Comment on CrowdStrike downtime apparently caused by update that replaced a file with 42kb of zeroes
CeeBee_Eh@lemmy.world 5 months ago
That’s a bad analogy. CrowdStrike’s driver encountering an error isn’t the same as not having disk IO or a memory corruption. If CrowdStrike’s driver didn’t load at all, the system could still boot.
If the CrowdStrike driver itself encounters an error, there should absolutely be a process that lets the system recover gracefully. The issue is probably that CrowdStrike thought of their code as unable to crash, since they likely only ever tested it with good configs, and so never considered how their driver should fail gracefully.
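A rough sketch of what "graceful failure on a bad config" could look like: validate the new content file before using it, and if it fails, keep running on the last known-good version instead of crashing. Everything here is hypothetical: the `MAGIC` header, the function names, and the file layout are made up for illustration and are not CrowdStrike's actual format or code (which would be kernel-mode C, not Python).

```python
MAGIC = b"CFG1"  # hypothetical 4-byte header; not a real file format


class ConfigError(Exception):
    """Raised when a content/config file fails validation."""


def validate_config(data: bytes) -> bytes:
    """Return the payload if the file looks sane, otherwise raise ConfigError."""
    if len(data) == 0:
        raise ConfigError("empty file")
    if not any(data):
        # Every byte is zero, e.g. a file that was overwritten with zeroes.
        raise ConfigError("file is all zero bytes")
    if not data.startswith(MAGIC):
        raise ConfigError("bad magic header")
    return data[len(MAGIC):]


def load_with_fallback(new_data: bytes, last_known_good: bytes) -> tuple[bytes, bool]:
    """Try the new config; if it is invalid, keep the last known-good one.

    Returns (payload, used_fallback).
    """
    try:
        return validate_config(new_data), False
    except ConfigError:
        # Reject the bad update instead of crashing; keep the previous rules.
        return validate_config(last_known_good), True
```

For example, feeding it roughly 42 KB of zeroes falls back to the previous rules instead of taking the whole machine down:

```python
payload, used_fallback = load_with_fallback(bytes(42 * 1024), MAGIC + b"rules-v1")
# payload == b"rules-v1", used_fallback is True
```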
5C5C5C@programming.dev 5 months ago
I don’t doubt that in this case it’s both silly and unacceptable that their driver was having this catastrophic failure, and it was probably caused by systemic failure at the company, likely driven by hubris and/or cost-cutting measures.
That said, I wouldn’t take it as a given, more generally, that the system should be allowed to continue if the anti-virus doesn’t load properly.
For an enterprise business system, it’s entirely plausible that if a crucial anti-virus driver can’t load properly then the system itself may be compromised by malware, or at the very least the system may be unacceptably vulnerable to malware if it’s allowed to finish booting. At that point the risk of harm that may come from allowing the system to continue booting could outweigh the cost of demanding manual intervention.
In this specific case, given the scale and fallout of the failure, it probably would’ve been preferable to let the system continue booting to a point where it could receive a new update. All I’m saying is that, more generally, I’m not surprised that an OS treats an anti-virus driver failure as BSOD-worthy.
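The trade-off being argued here (fail closed and demand manual intervention, or fail open far enough to fetch a fix) could be sketched as a boot-time policy like the one below. This is a hypothetical illustration only: the crash counter, the `max_retries` threshold, and the action names are assumptions, not how Windows or CrowdStrike actually behave.

```python
from enum import Enum


class Action(Enum):
    LOAD_DRIVER = "load_driver"          # try the security driver as normal
    HALT = "halt"                        # fail closed: stop and wait for manual intervention
    BOOT_RESTRICTED = "boot_restricted"  # fail open, but only far enough to fetch an update


def decide_boot_action(consecutive_driver_crashes: int, fail_closed: bool,
                       max_retries: int = 3) -> Action:
    """Hypothetical policy for a security driver that keeps crashing at boot.

    consecutive_driver_crashes: crashes recorded on previous boots (assumed to be
        persisted somewhere the boot process can read).
    fail_closed: True if the organisation prefers downtime over running unprotected.
    """
    if consecutive_driver_crashes < max_retries:
        return Action.LOAD_DRIVER
    if fail_closed:
        return Action.HALT
    return Action.BOOT_RESTRICTED
```

Making `fail_closed` an explicit setting reflects the point above: neither choice is universally right, and the acceptable risk depends on the system being protected.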