There are technical solutions to this. You update half your servers, and then if they die you just disconnect them from the network while you fix them and have your other unaffected servers take up the load. Now yes, this doesn’t get a fix out quickly, but if your update kills your entire system, you’re not going to get the fix out quickly anyway.
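For what it's worth, the "update half, pull the dead ones" idea can be sketched roughly like this. It's only an illustration and not how Cloudflare actually deploys anything: `staged_rollout`, `apply_update`, `is_healthy` and `disconnect` are made-up names standing in for whatever your deployment tooling provides.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    servers: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one server
    is_healthy: Callable[[str], bool],     # hypothetical: health check after the update
    disconnect: Callable[[str], None],     # hypothetical: removes a server from rotation
) -> Dict[str, List[str]]:
    """Update the first half of the fleet; touch the second half only if the first survives."""
    servers = list(servers)
    half = len(servers) // 2
    first_half, second_half = servers[:half], servers[half:]

    # Stage 1: update only the first half.
    for server in first_half:
        apply_update(server)

    failed = [s for s in first_half if not is_healthy(s)]
    if failed:
        # Pull the broken servers out of rotation and stop the rollout.
        # The second half never received the update and keeps serving.
        for server in failed:
            disconnect(server)
        return {
            "updated": [s for s in first_half if s not in failed],
            "failed": failed,
            "untouched": second_half,
        }

    # Stage 2: the first half is healthy, so update the rest.
    for server in second_half:
        apply_update(server)
    return {"updated": servers, "failed": [], "untouched": []}
```

Whether the untouched half can actually carry the full load is a separate capacity question that this sketch doesn't address.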
Comment on Cloudflare outage post mortem
floquant@lemmy.dbzer0.com 2 months ago
Their motivation is that that file has to change rapidly to respond to threats. If a new botnet pops up and starts generating a lot of malicious traffic, they can’t just let it run for a week.
echodot@feddit.uk 2 months ago
floquant@lemmy.dbzer0.com 2 months ago
Congratulations, now your “good” servers are dead from the extra load, and you also have a queue of shit to go through once you’re back up, making the problem worse. Running a terabit-scale proxy network isn’t exactly easy; the number of moving parts interacting with each other is insane. I highly suggest reading some of their postmortems; they’re usually really well written.
unexposedhazard@discuss.tchncs.de 2 months ago
How about an hour? 10 minutes? Would have prevented this. I very much doubt that their service is so unstable and flimsy that they need to respond to stuff on such short notice. It would be worthless to their customers if that were true.
SMillerNL@lemmy.world 2 months ago
5 minutes of uninterrupted DDoS traffic from a bot farm would be pretty bad.
ramble81@lemmy.zip 2 months ago
5 hours from an unintentional update is even worse.
SMillerNL@lemmy.world 2 months ago
It wasn’t an unintentional update though, it was an intentional update with a bug.
dafta@lemmy.blahaj.zone 2 months ago
Significantly better than several hours of most of the internet being down.
SMillerNL@lemmy.world 2 months ago
Maybe not updating bot mitigation fast enough would cause an even bigger outage. We don’t know from the outside.