Comment on No Google Cloud for 12 hours due to power outage.

Dave@lemmy.nz ⁨9⁩ ⁨months⁩ ago

But how does this happen? Surely Google has the ability to make highly available systems that are resistant to power going out at one of the three locations (as per the article).



jmcs@discuss.tchncs.de ⁨9⁩ ⁨months⁩ ago
That doesn’t help if they have software that assumes it can reach all sites. I remember a few years ago AWS had an EC2 outage in eu-central-1 because one of the Availability Zones went down, and the service that allocates instances threw a 500 when it failed to get that AZ’s capacity instead of just allocating the instances to the other 2 AZs.
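
To make that failure mode concrete, here is a rough Python sketch of an allocator that skips a zone whose capacity lookup fails instead of failing the whole request. This is not how AWS or Google actually implement it; `allocate`, `CapacityUnavailable`, `demo_capacity`, and the zone names are all made up for illustration.

```python
from typing import Callable, Dict, List


class CapacityUnavailable(Exception):
    """Hypothetical error: a zone's capacity service cannot be reached."""


def allocate(count: int, zones: List[str],
             get_capacity: Callable[[str], int]) -> Dict[str, int]:
    """Place `count` instances across zones, skipping unreachable ones."""
    placement: Dict[str, int] = {}
    remaining = count
    for zone in zones:
        if remaining == 0:
            break
        try:
            free = get_capacity(zone)
        except CapacityUnavailable:
            # The fragile version would surface an error here.
            # Instead, treat the zone as having no capacity and move on.
            continue
        take = min(free, remaining)
        if take > 0:
            placement[zone] = take
            remaining -= take
    if remaining > 0:
        # Only fail when the healthy zones really cannot fit the request.
        raise RuntimeError(f"could not place {remaining} of {count} instances")
    return placement


def demo_capacity(zone: str) -> int:
    """Hypothetical capacity lookup where zone-b is down."""
    if zone == "zone-b":
        raise CapacityUnavailable(zone)
    return {"zone-a": 3, "zone-c": 4}[zone]


print(allocate(5, ["zone-a", "zone-b", "zone-c"], demo_capacity))
```

The point is just that one unreachable zone becomes "zero capacity there" rather than an error for the whole request.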

- Dave@lemmy.nz ⁨9⁩ ⁨months⁩ ago
  I get how it’s possible, but this is Google. Surely they have decades of experience at keeping a website up no matter what happens!
  
  - Evil_incarnate@lemm.ee ⁨9⁩ ⁨months⁩ ago
    Companies are made up of people. Companies save money by firing the most expensive people, the most experienced. The ones left have a lot less experience.
    
  - 31ank@ani.social ⁨9⁩ ⁨months⁩ ago
    You could also say it’s AWS. They should have the experience too, but mistakes happen.
    
Orygin@sh.itjust.works ⁨9⁩ ⁨months⁩ ago
From the incident report, it seems the impact was limited to VMs in one DC in one region being stopped when the power was lost, plus some service degradation in the region.
So not that much impact. Of course resources in this DC would stop working, but the rest of the region was still working properly. If you built your infra in this region in a resilient manner, your services should not have been impacted that much.
