Comment on I hate Clouds - a personal perspective on why I think Clouds suck
loudwhisper@infosec.pub 4 months ago
Complexity brings fragility. It’s not about doing the job right; it’s that “right” means having to deal with a level of complexity, with so many moving parts and configuration options, that the bar is set very high.
Also, I would argue that a large number of organizations don’t actually need the resilience that they pay a very high price for.
Tja@programming.dev 4 months ago
Complexity in this case should bring redundancy, not fragility. You are adding components in parallel, not in series, thus reducing fragility.
A RAID 5 array is more complex than a single drive, but it’s less fragile.
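The series-versus-parallel argument can be put into numbers. This is a minimal sketch, assuming component failures are independent (which real systems often violate) and using a made-up per-component availability of 0.99; the function names are hypothetical.

```python
from math import prod


def series_availability(parts):
    """Every part must work, so the availabilities multiply."""
    return prod(parts)


def parallel_availability(parts):
    """At least one part must work, so the unavailabilities multiply."""
    return 1 - prod(1 - a for a in parts)


# Hypothetical availability of one component, chosen only for illustration.
a = 0.99

series = series_availability([a, a, a])      # three components chained
parallel = parallel_availability([a, a, a])  # three redundant components

print(f"single:   {a:.6f}")
print(f"series:   {series:.6f}")
print(f"parallel: {parallel:.6f}")
```

Under these assumptions the redundant setup is more available than a single component, and the chained one is less. The model says nothing about shared dependencies or misconfiguration, which is where the disagreement in this thread lies.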
loudwhisper@infosec.pub 4 months ago
I wish it worked like that, but I don’t think it does. Connecting clouds means introducing many complex problems: data synchronization and avoiding split-brain scenarios, a far more complex network setup, stateful storage that needs to take into account all the quirks and peculiarities of all services across all clouds, service accounts and permissions that need to be granted and segregated for all of them, and much more. You may gain resilience in some areas, but you introduce a lot more things that can fail, be misconfigured or be compromised.
Plus, a complex setup makes it harder by definition to identify SPOFs, especially considering it’s very likely nobody in the workforce is going to be an expert in all the clouds in use.
To keep using your analogy of the disks, a single disk with a backup might be a better solution for many people, considering you otherwise might need a RAID controller that can fail and all the knowledge to handle and manage a RAID array properly, in addition to paying 4 or 5 times the storage. Obviously this is just to make a point; I don’t actually think that going from JBOD to RAID 5 adds complexity comparable to what going from single-cloud to multi-cloud architecture does.
Tja@programming.dev 4 months ago
Split-brain is easily solved: there are off-the-shelf solutions, and if you have some custom code you can use plenty of well-researched solutions, for instance Raft. Putting “Byzantine fault” into Google Scholar yields thousands of papers, if you want something fancier.
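The core idea that majority-based protocols such as Raft rely on to avoid split-brain can be sketched in a few lines. This is not Raft itself (there are no terms, logs or elections here), only the quorum arithmetic; the function names are hypothetical.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """A partition may act only if it reaches a strict majority of voters."""
    return reachable > cluster_size // 2


def partitions_with_quorum(partition_sizes):
    """Return the sizes of the partitions that are allowed to keep operating."""
    cluster_size = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, cluster_size)]


# A 5-node cluster split 3/2: only the 3-node side keeps a majority.
print(partitions_with_quorum([3, 2]))

# A 4-node cluster split 2/2: neither side has a majority, so nobody acts.
print(partitions_with_quorum([2, 2]))
```

Because two disjoint partitions cannot both hold a strict majority, at most one side can make progress. The price is that an even split leaves the whole cluster unable to act until the partition heals.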
The same goes for most of the problems you mentioned: they were an issue 10 years ago, but nowadays you can federate, abstract or outsource most of it.
Making it harder to identify SPOFs doesn’t increase fragility. If your whole system is a single instance, the SPOF is trivial to identify (it’s the whole thing), but the system is very brittle.
loudwhisper@infosec.pub 4 months ago
Of course the problem is solved, but that doesn’t mean that the solution is easy. Also, distributed protocols still need to work on top of a complicated network and within real-life constraints such as performance (to name one). One bug, misconfiguration or oversight, and you have a problem.
To give an example, I remember a Kafka cluster with 5 replicas completely shitting its pants for 6h to rebalance data during a planned maintenance in which one node was brought offline. It caused one of the longest outages to date, with the websites that relied on it going offline. Was it our fault? Was it a misconfiguration? A bug? It doesn’t matter: it’s a complex system that was implemented, and probably something was missed.
Technology is implemented by people, and complexity increases the chances of mistakes; I’m not sure this can be argued against.
Making it harder to identify SPOFs means you might miss your SPOF, and that means having liabilities and still having scenarios where your system can crash, in addition to paying quite a lot to build a resilience that you don’t achieve.
A single instance with 2 failure scenarios (disk failure and network failure), to give an example, is not more fragile than a distributed system with 20 failure scenarios. Failure scenarios and SPOFs can have compensating controls and be mitigated successfully. A complex system where these can’t be fully identified can’t have compensating controls for all of them, and the residual risk might be much harder to estimate. So yes, a single disk is more likely to fail than 3 disks at once, but this doesn’t give the whole picture.
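To make the point about identified versus missed failure scenarios concrete, here is a rough sketch. Every number in it is invented for illustration, scenarios are assumed independent, and a "mitigation factor" is a hypothetical way of modelling a compensating control (0.1 means the control stops 90% of occurrences, 1.0 means no control because the scenario was never identified).

```python
from math import prod


def outage_probability(scenarios):
    """Probability that at least one scenario causes an outage.

    Each scenario is a (probability, mitigation_factor) pair.
    Assumes the scenarios are independent of each other.
    """
    return 1 - prod(1 - p * factor for p, factor in scenarios)


# Simple system: 2 likely failure scenarios, both identified and mitigated.
simple = [(0.05, 0.1), (0.05, 0.1)]

# Complex system: 20 less likely scenarios, 5 of which were never identified
# and therefore have no compensating control (factor 1.0).
complex_system = [(0.01, 0.1)] * 15 + [(0.01, 1.0)] * 5

simple_p = outage_probability(simple)
complex_p = outage_probability(complex_system)

print(f"simple:  {simple_p:.4f}")
print(f"complex: {complex_p:.4f}")
```

With these made-up inputs the complex system comes out worse, driven almost entirely by the 5 unmitigated scenarios. Different inputs can flip the result, so the sketch only shows that the outcome depends on how many scenarios get identified, not which architecture wins in general.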