A single DNS race condition brought AWS to its knees
Submitted 2 days ago by cm0002@lemmings.world to technology@lemmy.zip
https://go.theregister.com/feed/www.theregister.com/2025/10/23/amazon_outage_postmortem/
Submitted 2 days ago by cm0002@lemmings.world to technology@lemmy.zip
https://go.theregister.com/feed/www.theregister.com/2025/10/23/amazon_outage_postmortem/
artyom@piefed.social 1 day ago
Why did it take them an entire day to fix “a single DNS race condition”?
aeronmelon@lemmy.world 1 day ago
As with most IT troubleshooting,
Time spent applying the fix: 15 minutes.
Time spent identifying the problem then discovering where it is in the system: 15 hours.
pageflight@piefed.social 1 day ago
There’s a full postmortem from AWS. One piece that stands out to me:
That is, the load that resulted from the initial failure was not something the system was designed to handle, so it had cascading effects / required manual cleanup.
SteleTrovilo@beehaw.org 1 day ago
Because the “them” in your sentence is a rapidly decreasing number of professionals. lemmy.zip/post/51501102
artyom@piefed.social 1 day ago