Comment on I hate Clouds - a personal perspective on why I think Clouds suck

<- View Parent
loudwhisper@infosec.pub ⁨4⁩ ⁨months⁩ ago

Of course the problem is solved, but that doesn’t mean that the solution is easy. Also, distributed protocols still need to work on top of a complicated network and with real-life constraints in terms of performances (to list a few). A bug, misconfiguration, oversight and you have a problem.

Just to make an example, I remember a Kafka cluster with 5 replicas completely shitting its pants for 6h to rebalance data during a planned maintenance where one node was brought offline. It caused one of the longest outages to date with the websites which relied on it offline. Was it our fault? Was it a misconfiguration? A bug? It doesn’t matter, it’s a complex system which was implemented and probably something was missed.

Technology is implemented by people, complexity increased the chances of mistakes, not sure this can be argued.

Making it harder to identify SPOF means you might miss your SPOF, and that means having liabilities, and having anyway scenarios where your system can crash, in addition for paying quite a lot to build a resilience that you don’t achieve.

A single instance with 2 failure scenarios (disk failure and network failure) - to make an example - is not more fragile than a distributed system with 20 failure scenarios. Failure scenarios and SPOF can have compensating controls and be mitigated successfully. A complex system where these can’t be fully identified can’t have compensating control and residual risk might be much harder. So yes, a single disk can fail more likely than 3 disks at once, but this doesn’t give the whole picture.

source
Sort:hotnewtop