The real pro tip is to put the core system and anything that eats up disk space on separate partitions, along with setting up alerting, log rotation, etc. And also to not have a single point of failure in general. Hard to say exactly what went wrong w/ Toyota, but they probably could have planned better for it in a general way.
Comment on All of Japan's Toyota Assembly Plants Shut Down for a Day Because Their Server Ran Out of Disk Space
Semi-Hemi-Demigod@kbin.social 1 year ago
Sysadmin pro tip: Keep a 1-10GB file of random data named DELETEME on your data drives. Then if this happens you can get some quick breathing room to fix things.
Also, set up alerts for disk space.
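A minimal sketch of the DELETEME idea in Python, assuming a POSIX-like host. The names `create_ballast` and `free_fraction` are made up for illustration and are not from any tool mentioned here. Random bytes are used so that a compressing or deduplicating filesystem is less likely to shrink the file to nothing.

```python
import os
import shutil

CHUNK = 1024 * 1024  # write 1 MiB at a time so memory use stays flat


def create_ballast(path: str, size_bytes: int) -> None:
    """Write size_bytes of random data to path (for example a DELETEME file)."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the blocks are really allocated on disk


def free_fraction(path: str) -> float:
    """Return the fraction of the filesystem holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Something like `create_ballast("/data/DELETEME", 5 * 1024**3)` would reserve 5 GiB (the path is hypothetical), and `free_fraction("/data")` gives a number an alert can compare against a threshold.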
dx1@lemmy.world 1 year ago
Maximilious@kbin.social 1 year ago
10GB is nothing in an enterprise datastore housing PBs of data. 10GB is nothing for my 40TB homelab!
Semi-Hemi-Demigod@kbin.social 1 year ago
It’s not going to bring the service back online, but it will keep a full disk from stopping you from doing other things. In some cases SSH won’t work with a full disk.
GhostlyPixel@lemmy.world 1 year ago
It’s all fun and games until tab autocomplete stops working because of disk space
TrenchcoatFullofBats@belfry.rip 1 year ago
The real apocalypse
model_tar_gz@lemmy.world 1 year ago
Tab complete in vim go lolllllooolol NO
idunnololz@lemmy.world 1 year ago
It’s nothing for my homework folder.
mohammed_alibi@lemmy.world 1 year ago
That’s an incredible collection of homework!
Lem453@lemmy.ca 1 year ago
Even better: a cron job every 5 minutes, and if the total remaining space falls below a set threshold, it auto-deletes the file and sends a message to the sysadmin.
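The cron idea above could look roughly like this in Python. It is only a sketch: `check_and_release`, the threshold, and the `notify` callback are hypothetical names, and the actual alert channel (mail, chat webhook, pager) is left to whatever the site already uses.

```python
import os
import shutil
from typing import Callable


def check_and_release(
    data_path: str,
    ballast_path: str,
    min_free_fraction: float,
    notify: Callable[[str], None],
    usage_fn=shutil.disk_usage,  # injectable so the check can be tested
) -> bool:
    """Delete the ballast file and notify if free space is below the threshold.

    Returns True if the low-space condition was hit, False otherwise.
    """
    usage = usage_fn(data_path)
    free = usage.free / usage.total
    if free >= min_free_fraction:
        return False  # plenty of room, nothing to do

    released = False
    if os.path.exists(ballast_path):
        os.remove(ballast_path)  # buys breathing room immediately
        released = True

    state = "released" if released else "already gone"
    notify(f"{data_path}: only {free:.1%} free; ballast {state}")
    return True
```

A crontab schedule of `*/5 * * * *` would run a script wrapping this every 5 minutes. Passing `print` as `notify` is enough for a dry run.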
Semi-Hemi-Demigod@kbin.social 1 year ago
Sends a message and gets the services ready for potential shutdown. Or implements a rate limit to keep the service available but degraded.
bug@lemmy.one 1 year ago
At that point just set the limit a few gig higher and don’t have the decoy file at all
gazter@aussie.zone 1 year ago
Also, if space starts decreasing much more rapidly than normal.
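One way to sketch that rate check in Python, assuming something already records `(unix_time, free_bytes)` samples. The function names and the `factor` of 3 are illustrative assumptions, not a recommendation.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, int]  # (unix_time_seconds, free_bytes)


def burn_rate(samples: Sequence[Sample]) -> Optional[float]:
    """Bytes consumed per second between the oldest and newest sample."""
    if len(samples) < 2:
        return None
    (t0, f0), (t1, f1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return None
    return (f0 - f1) / elapsed


def seconds_until_full(samples: Sequence[Sample]) -> Optional[float]:
    """Estimated seconds until free space hits zero, or None if not shrinking."""
    rate = burn_rate(samples)
    if rate is None or rate <= 0:
        return None
    return samples[-1][1] / rate


def is_abnormal(samples: Sequence[Sample], baseline_rate: float,
                factor: float = 3.0) -> bool:
    """True if space is being consumed more than factor times the usual rate."""
    rate = burn_rate(samples)
    return rate is not None and rate > baseline_rate * factor
```

This is a plain two-point estimate. A real monitor would probably smooth over more samples to avoid paging on a single large write.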
z00s@lemmy.world 1 year ago
Or make the file a little larger and wait until you’re up for a promotion…
mkhopper@lemmy.world 1 year ago
500GB maybe.
Dkarma@lemmy.world 1 year ago
The answer here is not storage; it is better alerting.
nickhammes@lemmy.world 1 year ago
Why not both? Alerting to find issues quickly, a bit of extra storage so you have more options available in case of an outage, and maybe some redundancy for good measure.
RupeThereItIs@lemmy.world 1 year ago
A system this critical is on a SAN; if you’re alerting properly, adding a bit more storage space is a 5-minute task.
It should also have a DR solution, yes.
nightwatch_admin@feddit.nl 1 year ago
A system this critical is on a hypervisor with tight storage “because deduplication” (I’m not making this up).
Agent641@lemmy.world 1 year ago
Yes, alert me when disk space is about to run out so I can ask for a massive raise and quit my job when they don’t give it to me.
Then when TSHTF they pay me to come back.
looz@sopuli.xyz 1 year ago
There are cases where a disk fills up quicker than one can reasonably react, even if alerts are in place. And sometimes the culprit is something you can’t just go and kill.
WhiskyTangoFoxtrot@lemmy.world 1 year ago
That’s what the Yakuza is for.
afraid_of_zombies@lemmy.world 1 year ago
Had an issue like that a few years back. A standalone device that was filling up quickly. The poorly designed device could only be flushed via USB sticks. I told them that they had to do it weekly. Guess what they didn’t do. Looking back, I should have made it alarm and flash once a week on a timer.