Comment on Cloudflare outage post mortem
echodot@feddit.uk 1 day ago
There are technical solutions to this. You update half your servers, and if they die you disconnect them from the network while you fix them and have your other, unaffected servers take up the load. Now yes, this doesn’t get a fix out quickly, but if your update kills your entire system, you’re not going to get the fix out quickly anyway.
floquant@lemmy.dbzer0.com 1 day ago
Congratulations, now your “good” servers are dead from the extra load and you also have a queue of shit to go through once you’re back up, making the problem worse. Running a terabit-scale proxy network isn’t exactly easy; the number of moving parts interacting with each other is insane. I highly suggest reading some of their postmortems. They’re usually really well written.
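
A rough sketch of the arithmetic behind the extra-load point, with made-up numbers (these are not Cloudflare's figures, and `load_after_failover` is a hypothetical helper that assumes traffic is spread evenly over the surviving servers):

```python
def load_after_failover(total_servers: int, failed_servers: int, utilization: float) -> float:
    """Per-server utilization on the survivors after the failed servers'
    traffic is redistributed evenly. 1.0 means 100% of capacity."""
    survivors = total_servers - failed_servers
    if survivors <= 0:
        raise ValueError("no servers left to take the load")
    # Total work stays the same; it is just divided among fewer machines.
    return utilization * total_servers / survivors


# Hypothetical fleet: 100 servers, half of them pulled after a bad update.
for utilization in (0.40, 0.60):
    after = load_after_failover(100, 50, utilization)
    status = "overloaded" if after > 1.0 else "ok"
    print(f"{utilization:.0%} before -> {after:.0%} after ({status})")
```

Under these assumptions, pulling half the fleet doubles the load on what's left, so it only works if the servers were running below 50% utilization to begin with, and that is before counting any backlog of queued requests.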