Yes, pretty basic-looking mistakes that, from the outside, make many wonder how this got through. Analyzing the post-mortem, though, makes me think of the MV Dali crashing into the Francis Scott Key Bridge in Baltimore: the whole thing started with a single loose wire, which set off a cascading failure. CF's situation was similar in a few ways, though a bad query (and an .unwrap() in production code rather than test code) should have been a lot easier to spot.
Have any of the post-mortems addressed whether any of the code that led to Cloudflare's outage was generated by AI?
> And CF doesn't have the "...or people will die" safety criticality.
I disagree with that. Just because you can't point to people falling off a bridge into the water doesn't mean that outages of the web at this scale will not lead to fatalities.
OTOH...whether you describe it as regulations, an SLA, or otherwise - "150,000 ton freighter destroys a major bridge and kills people" is a far worse violation of expected behavior than "lots of web sites went down".
I see where people use CF, and I actually think that 'lots of websites went down' has the potential these days to, in aggregate, kill far more people than were killed by the Dali losing control of its helm. The Dali accident could also have been avoided by simply requiring ships with enough gross tonnage to damage the bridge to use mandatory tugs, and I'm not so sure there is a clean and effective solution for the kind of issues that CF can create.
They're more like 'the shipping industry' than they are like 'a single out of control vessel'. Keep in mind that half of the health care industry or more uses CF to protect their assets.