Comment on How to auto-reboot if CPU load too high?
agent_flounder@lemmy.world 8 months ago
Load average of 400???
You could install sysstat (or similar) and use the output from sar to watch for thresholds, then reboot if they are exceeded.
The upside of doing this is that you may also be able to narrow down exactly what is going on when this happens, since sar records stats for CPU, memory, disk, etc. You can go back after the fact and see whether it is just a CPU thing or more than that (unless the problem happens instantly rather than building up gradually, in which case there may be nothing recorded before the spike).
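A rough sketch of the watch-and-reboot idea, in Python. Note that this reads the load average directly with os.getloadavg() instead of parsing sar output, so it is a stand-in for the sar approach, not the same thing. The threshold, the number of consecutive samples, and the interval are made-up values you would need to tune for your machine, and the names should_reboot and watch are just for illustration. It assumes a systemd host and that the script runs as root.

```python
import os
import subprocess
import time

# All three values are assumptions; tune them for your own machine.
LOAD_THRESHOLD = 8.0   # 1-minute load average considered "too high"
CONSECUTIVE = 5        # how many high samples in a row before acting
INTERVAL = 60          # seconds to sleep between samples


def should_reboot(samples, threshold=LOAD_THRESHOLD, consecutive=CONSECUTIVE):
    """Return True if the last `consecutive` samples all exceed `threshold`."""
    if len(samples) < consecutive:
        return False
    return all(s > threshold for s in samples[-consecutive:])


def watch():
    """Sample the load average forever; reboot once it stays high."""
    samples = []
    while True:
        one_minute, _, _ = os.getloadavg()
        samples.append(one_minute)
        # Keep only the most recent window of samples.
        samples = samples[-CONSECUTIVE:]
        if should_reboot(samples):
            # Assumes systemd and root privileges.
            subprocess.run(["systemctl", "reboot"], check=False)
            return
        time.sleep(INTERVAL)
```

Calling watch() starts the loop. Requiring several high samples in a row avoids rebooting on a short spike, at the cost of reacting a few minutes later.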
PlutoniumAcid@lemmy.world 8 months ago
Thank you for these ideas, I will read up on sysstat and sar and give it a go.
Also smart to have the script always running, sleeping, rather than launching it at intervals.
I know all of this is a poor hack, and I must address the cause - but so far I have no clue what’s causing it. I’m running a bunch of Docker containers, so it is very likely that one of them is painting itself into a corner, but after a reboot there’s nothing left to see, so I am now starting by logging the top process. Your ideas might work better.
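For the "log the top process" part, something along these lines might do, as a sketch only. It assumes the procps version of ps found on most Linux distros, and the log path and the function names (parse_ps, snapshot, log_snapshot) are placeholders I made up.

```python
import subprocess
import time

# Assumes procps `ps` (Linux); --sort=-pcpu lists the heaviest CPU users first.
PS_CMD = ["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"]
LOG_PATH = "/var/log/top-process.log"  # hypothetical path, pick your own


def parse_ps(output, limit=3):
    """Parse `ps` output into (pid, cpu%, mem%, command) tuples, highest CPU first."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, cpu, mem, comm = parts
        rows.append((int(pid), float(cpu), float(mem), comm))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:limit]


def snapshot(limit=3):
    """Run ps once and return the top `limit` processes by CPU."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    return parse_ps(out, limit)


def log_snapshot(path=LOG_PATH, limit=3):
    """Append a timestamped line per top process to the log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        for pid, cpu, mem, comm in snapshot(limit):
            log.write(f"{stamp} pid={pid} cpu={cpu} mem={mem} cmd={comm}\n")
```

Calling log_snapshot() from the same sleeping loop every minute or so leaves a trail that survives the reboot. Since the suspects are containers, running docker stats --no-stream on the same schedule may point at the guilty container more directly than a process name does.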