[deleted]
Check dmesg
Submitted 1 week ago by ZeDoTelhado@lemmy.world to selfhosted@lemmy.world
Will do after I get the chance
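A rough sketch of what checking dmesg could look like here. The function name `filter_disk_errors` is made up for illustration, and the pattern list is an assumption covering a few common kernel messages for disk and link trouble; it is not exhaustive.

```shell
# Keep only kernel log lines that commonly point at disk or SATA link trouble.
# The patterns are an illustrative assumption, not a complete list.
filter_disk_errors() {
    grep -iE 'i/o error|medium error|hard resetting link|failed command|exception emask'
}

# Usage (may need root where the kernel log is restricted):
#   dmesg -T | filter_disk_errors
```

If this prints nothing after a freeze and reboot, that does not prove much, since dmesg only shows the current boot's kernel log.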
Are you using ZFS? I had issues a few years ago where the ARC would compete with my VMs for RAM, and once the system ran out of RAM it would just completely freeze.
Not using ZFS at this point, but this sounds like a good thing to keep in mind
SpikesOtherDog@ani.social 1 week ago
Sounds like a bad drive, TBH. Not so much the platters as the electronics.
tal@lemmy.today 6 days ago
I’d suspect that too. Try just reading from the source drive or just writing to the destination drive and see which causes the problems. Could also be a corrupt filesystem; probably not a bad idea to try to fsck it.
IME, on a failing disk, you can get I/O blocking as the system retries, but it usually won’t freeze the system unless your swap partition/file is on that drive. Then, as soon as the kernel goes to pull something from swap on the failing drive, everything blocks. Might try swapoff -a before doing the rsync to disable swap.
I’ve never had it happen, but it is possible for heat to cause issues for hard drives. I don’t know if the firmware will slow down operation to keep temperature sane; rotational drives do normally have temperature sensors, so I’d think that it would. Could try aiming a fan at the things. I doubt that that’s it, though.
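One way the read-only and write-only tests above could be sketched, assuming GNU dd. The function names and the paths in the usage comments are placeholders, not anything from the actual setup. The write test writes a scratch file on the mounted destination filesystem, never the raw device, so existing data is left alone.

```shell
# Read a file or block device end to end and discard the data,
# so only the read path of the source drive is exercised.
read_only_test() {
    dd if="$1" of=/dev/null bs=1M
}

# Write N MiB of zeros to a scratch file on the destination filesystem
# and flush it to disk, so only the write path is exercised.
write_only_test() {
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=fsync
}

# Usage (paths are hypothetical placeholders):
#   swapon --show                                 # is any swap on the suspect drive?
#   swapoff -a                                    # as root: disable all swap first
#   read_only_test /mnt/source/some-large-file
#   write_only_test /mnt/dest/scratch.bin 4096    # 4 GiB scratch file, delete afterwards
```

If the freeze only shows up with one of the two, that narrows down which drive to look at.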
ZeDoTelhado@lemmy.world 6 days ago
The reason I suspected temps is that I very recently switched to a Define R6 case (got it second hand), and from the start I have been a bit suspicious of how it performs thermally (in terms of sound it is actually quite OK).
I do have a fan on the drives, but one of the drives still goes up to 40C (even with the front door open).
Also, when you mention fsck, what would be good options to use for checking the drive?
frongt@lemmy.zip 1 week ago
If both drives exhibit the behavior, I’d suspect the drive controller.
SpikesOtherDog@ani.social 6 days ago
True, but it’s not clear to me that both drives are exhibiting the behavior and it sounds more like a copy between two drives. I wouldn’t rule it out and do think it is a possibility, but in my professional experience drives fail much more frequently than controllers.
It makes sense to me to test the drives individually, in another system preferably, using a SMART long test, which is non-destructive. Next, test other drives in this system. If there are errors, try changing out the SATA cables, too. If you can shuffle the data off the drives, do so and then try running them through a secure erase in another system. A bad drive should fail the same way in another system.
My other reason for thinking it is probably not the controller is that 4TB is a very long time for a sustained transfer to fail on a flaky component. Also, there are no reports of other errors.
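For the SMART long test mentioned above, a minimal sketch using smartctl from smartmontools. `/dev/sdX` is a placeholder, the helper name `latest_selftest_ok` is made up, and the helper assumes the usual ATA self-test log layout where the newest entry is the line starting with `# 1`.

```shell
# Succeeds if the newest entry in a smartctl self-test log (read from stdin)
# finished without error. Assumes the ATA log layout, newest entry is "# 1".
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Usage (as root; /dev/sdX is a placeholder):
#   smartctl -t long /dev/sdX        # start the extended self-test (runs in the background)
#   smartctl -c /dev/sdX             # shows the estimated duration of the extended test
#   smartctl -l selftest /dev/sdX | latest_selftest_ok && echo "last self-test OK"
#   smartctl -a /dev/sdX             # full attribute dump, worth reading either way
```

The long test can take many hours on a multi-terabyte drive, so the log check only makes sense once it has finished.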
possiblylinux127@lemmy.zip 6 days ago
That would make a lot of sense if the boot drive is using the same controller
ZeDoTelhado@lemmy.world 6 days ago
I am also leaning in this direction. I just ordered a new 8TB drive, and will proceed with SMART long tests. When you talk about secure erase, are we talking about using dd with /dev/null?
SpikesOtherDog@ani.social 6 days ago
I’m suggesting either using the secure erase utility built into your EFI firmware, if available, or using hdparm to issue a secure erase.
grok.lsu.edu/article.aspx?articleid=16716
I suggest calling these utilities with no other drives connected.
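A hedged sketch of the hdparm route, for reference. `/dev/sdX` and the temporary password `p` are placeholders, and the helper name `drive_not_frozen` is made up. It assumes the Security section that `hdparm -I` prints for ATA drives. The erase wipes the whole drive, so triple-check the device name.

```shell
# Succeeds if `hdparm -I` output on stdin shows the drive's security state
# as "not frozen", which is required before a secure erase can be issued.
drive_not_frozen() {
    grep -qE 'not[[:space:]]+frozen'
}

# Usage (as root, DESTRUCTIVE; /dev/sdX and the password "p" are placeholders):
#   hdparm -I /dev/sdX | drive_not_frozen || echo "drive is frozen, do not proceed"
#   hdparm --user-master u --security-set-pass p /dev/sdX   # set a temporary password
#   hdparm --user-master u --security-erase p /dev/sdX      # erase the entire drive
```

Having no other drives connected, as suggested above, removes most of the risk of pointing this at the wrong device.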