Hi all,
I am 99% sure, based on the output I got on the console, that my new NVMe boot drive for TrueNAS is already failing, but I wanted to see if any folks here with more experience could confirm. Select output:
[3655, 622714] nvme nvme0: Device not ready; aborting initialization. CSTS=0X0
[3655,623601] nvme nvme0: Disabling device after reset failure -19
[3655,646704] I/O error, dev nvme0n1, sector 89179528 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[3655,646711] zio= pool=boot-pool vdev=dev/nvme0n1p3 error=5 type=2 offset=45119897600 size=4096 flags=3162240
[…]
[3691,246702] WARNING: Pool ‘boot-pool’ has encountered an uncorrectable I/O failure and has been suspended
The system would seemingly work fine for a variable amount of time after booting, and then suddenly I would no longer be able to log in through the WebUI or SSH. Lesson learned about not testing NVMe drives (though I’d love any advice on the procedure for doing so; the procedure for HDDs seems to be relatively settled: a SMART long test and then badblocks, but I think when I tried a SMART test on the NVMe it wasn’t supported).
Also, if anyone has had experiences (good or bad) with NVMe manufacturers, I’d love to hear them. This was a Patriot drive, and I suspect I should spring for a Samsung Pro or EVO if I want to have a better time the next go around, but curious if my gut is right on that. Thanks so much!
frongt@lemmy.zip 1 week ago
I don’t think Patriot is terrible.
As a first step, I’d just reseat it, and make sure it’s not overheating. Also check for system BIOS and drive firmware updates.
SSD SMART data can include drive health indicators, so I would check that too.
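If it helps, here is a rough Python sketch of what checking those health indicators could look like. It assumes smartmontools 7.0 or newer (for `smartctl --json`) and that the NVMe health fields appear under `nvme_smart_health_information_log` with the key names used below. The function names and the thresholds are just my own picks for illustration, not anything official.

```python
import json
import subprocess


def nvme_health_warnings(log: dict) -> list[str]:
    """Return warnings derived from an NVMe SMART/health log dictionary.

    The key names are assumed to match smartctl's JSON output; the
    thresholds below are illustrative choices, not vendor guidance.
    """
    warnings = []
    # Any non-zero critical_warning bitfield means the controller flagged a problem.
    if log.get("critical_warning", 0) != 0:
        warnings.append(f"critical_warning is set: {log['critical_warning']}")
    # Media errors count unrecovered data integrity errors.
    if log.get("media_errors", 0) > 0:
        warnings.append(f"media_errors: {log['media_errors']}")
    # Spare capacity at or below the drive's own threshold is a bad sign.
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare <= threshold:
        warnings.append(f"available_spare {spare}% <= threshold {threshold}%")
    # percentage_used is the drive's estimate of consumed rated endurance.
    if log.get("percentage_used", 0) >= 100:
        warnings.append(f"percentage_used: {log['percentage_used']}%")
    return warnings


def read_health_log(device: str = "/dev/nvme0") -> dict:
    """Run smartctl (needs root) and return the NVMe health log, or {} if absent."""
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout).get("nvme_smart_health_information_log", {})
```

Running `print(nvme_health_warnings(read_health_log("/dev/nvme0")))` as root should print an empty list for a drive that reports itself healthy. A drive that drops off the bus, like in the log above, may not answer at all, so it is worth running this right after a fresh boot.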
Unfortunately, if it doesn’t support SMART tests, the only way to test it would be a destructive write-read test.
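To make "destructive write-read test" concrete, here is a minimal Python sketch of the idea: fill the target with seeded pseudorandom data, then regenerate the same data and compare it against what reads back. The `write_read_test` name and the 1 MiB chunk size are my own choices. It needs Python 3.9+ for `randbytes`, and it does not bypass the page cache, so for a small test the read pass may be served from RAM rather than the drive. Treat it as an illustration of the technique, not a replacement for a dedicated tool like badblocks.

```python
import os
import random

CHUNK = 1024 * 1024  # write and verify in 1 MiB pieces


def write_read_test(path: str, total_bytes: int, seed: int = 0) -> list[int]:
    """Overwrite `path` with pseudorandom data, read it back, and compare.

    Returns the starting byte offsets of chunks that did not match.
    WARNING: this destroys whatever is stored at `path`.
    """
    # Write pass: the seed makes the data reproducible without storing a copy.
    rng = random.Random(seed)
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            n = min(CHUNK, total_bytes - written)
            f.write(rng.randbytes(n))
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and OS buffers

    # Verify pass: regenerate the same byte stream and compare chunk by chunk.
    rng = random.Random(seed)
    bad_offsets = []
    with open(path, "rb") as f:
        offset = 0
        while offset < total_bytes:
            n = min(CHUNK, total_bytes - offset)
            if f.read(n) != rng.randbytes(n):
                bad_offsets.append(offset)
            offset += n
    return bad_offsets
```

Try it on a scratch file first, for example `write_read_test("/tmp/scratch.bin", 64 * 1024 * 1024)`, which should return an empty list. Pointing it at a raw device such as `/dev/nvme0n1` wipes that device, so only do that on a drive with nothing you care about on it. A drive that fails the way the log above shows would most likely surface as an `OSError` during the write pass rather than as mismatched chunks.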