Comment on Hardware Watchdogs & Auto Reboots in Proxmox
non_burglar@lemmy.world 1 week ago
Nice.
Other actions are possible with watchdog timers, especially with hypervisors. They can invoke a script or use an agent to kill a misbehaving process.
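To make the "invoke a script" idea concrete, here is a rough sketch of a software watchdog, not anything Proxmox ships. It assumes the monitored process touches a heartbeat file periodically; the function names, the file-based heartbeat, and the SIGTERM action are all hypothetical choices for illustration.

```python
import os
import signal
import time


def heartbeat_expired(last_beat: float, now: float, timeout: float) -> bool:
    """Return True when the last heartbeat is older than the timeout."""
    return (now - last_beat) > timeout


def check_once(heartbeat_file: str, timeout: float, on_expire, now=None) -> bool:
    """Check the heartbeat file once and call on_expire() if it is stale.

    Assumption: the monitored process updates the file's mtime while healthy.
    A missing file is treated as stale.
    """
    now = time.time() if now is None else now
    try:
        last_beat = os.path.getmtime(heartbeat_file)
    except FileNotFoundError:
        last_beat = 0.0
    expired = heartbeat_expired(last_beat, now, timeout)
    if expired:
        on_expire()
    return expired


def kill_process(pid: int) -> None:
    """Example action: ask the misbehaving process to terminate."""
    os.kill(pid, signal.SIGTERM)
```

You would call `check_once` from a loop or a timer, passing something like `lambda: kill_process(pid)` as the action. A hardware watchdog works the other way around (the system resets unless it keeps getting fed), but the timeout logic is the same idea.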
Ultimately, the best solution is not to need the timers at all, so finding the culprit within the client is ideal, though not always possible.
VMs hanging on memory often have incorrect caching policies; you may want to investigate that.
starkzarn@infosec.pub 1 week ago
You’re absolutely right! I’d point you back to my notion of cost-benefit analysis. Anything more than the 20 minutes that I’ve spent on analysis so far isn’t worth my time. If the VM falls over permanently, that was a risk, and my time savings have already been worth that risk. If I were looking at something like a production file server or domain controller, sure – I’d spend more time on it. Likely though, I’d just have engineered it better in the first place. Not every problem warrants a high-precision solution. 🙂