Comment on A single point of failure triggered the Amazon outage affecting millions
nova_ad_vitum@lemmy.ca 6 days ago
We don’t have to. It is entirely possible to engineer applications and services so that they’re not dependent on any one cloud service, while still using cloud services for IaaS. Netflix famously does this, and sure enough Netflix experienced no service interruptions during this latest outage despite having a large AWS presence.
DudeImMacGyver@kbin.earth 6 days ago
If we want a truly robust system, yeah, we kinda do. This sort of event is only one of the issues with allowing a single entity to control pretty much everything.
There are plenty of potential issues from a corrupt rogue corporation hijacking everything to attacks to internal fuck-ups like we just experienced. Sure, they can design a better cloud, but at the end of the day, it's still their cloud. The Internet needs to be less centralized, not more (and I don't just mean that purely in terms of infrastructure, though that is included of course).
nova_ad_vitum@lemmy.ca 6 days ago
What I’m advocating for is the opposite of “allowing one entity to control everything”.
en.wikipedia.org/wiki/Chaos_engineering#Chaos_Mon…
Read about it, dude. Netflix has a large presence in all major cloud providers (and they have their own data centers), but has a service whose uptime is NOT dependent on any one of those hosting environments. The proof is in the pudding: Netflix’s service did not go down in the recent AWS outage, nor in the last one.
DudeImMacGyver@kbin.earth 5 days ago
Yes, Netflix had their own infrastructure in addition to multiple other redundant cloud services for their CDNs. You're kind of proving my point?
nova_ad_vitum@lemmy.ca 5 days ago
How? Their reliability would exist without that. There’s nothing inherent to their own data center that makes their setup that much better. Having a distributed system across multiple cloud service providers means your actual chance of downtime (here I mean the inverse of uptime, i.e. 1 minus uptime) is their individual chances of downtime multiplied by each other, assuming their failures are independent of one another. In other words, they all have to go down at the same time for your service to fail. The catch is you have to use only commodity IaaS and PaaS, nothing proprietary to one CSP.
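To put rough numbers on the downtime arithmetic, here is a minimal Python sketch. The availability figures and the `combined_availability` helper are made up for illustration, and it assumes provider failures are fully independent and that failover between providers is perfect, which real deployments only approximate.

```python
from math import prod


def combined_availability(availabilities):
    """Availability of a service that stays up while at least one provider is up.

    Assumes each provider fails independently of the others and that
    traffic fails over perfectly (both are simplifying assumptions).
    """
    # The service is down only when every provider is down at once,
    # so the individual downtime probabilities multiply.
    combined_downtime = prod(1.0 - a for a in availabilities)
    return 1.0 - combined_downtime


# Hypothetical availability figures for three providers, not real SLA numbers.
providers = [0.999, 0.999, 0.9995]
print(f"combined availability: {combined_availability(providers):.10f}")
```

For example, two providers at 99% availability each have a 1% chance of being down, so the chance of both being down together is 0.01 × 0.01 = 0.0001, i.e. 99.99% combined availability. Correlated failures (shared DNS, shared upstream dependencies) would make the real figure worse than this.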
For smaller companies especially, in terms of pure reliability, there’s no reason to think that they would be better at running a high availability data center than Microsoft or AWS or Google.
Parallel distributed architectures give you the advantages of using public cloud (not having to physically manage your own data center) without the disadvantages (dependence on any one cloud vendor), while also potentially increasing your reliability beyond that of any one of your cloud vendors.