Comment on Incremental backups to optical media: tar, dar, or something else?

nibbler@discuss.tchncs.de 4 months ago

your first two points can be mitigated by using checksums. it's trivial to name the file after its checksum, but ugly. save checksums separately? save checksums in file metadata (exif)? this can be a bit tricky 🤣 I believe zfs already has the checksum, so the job would be to just compare lists.
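
For the "save checksums separately" option, a minimal sketch in Python could look like the following. It is only an illustration: the function names and the manifest file name `SHA256SUMS` are made up here, and SHA-256 is an assumed choice of hash, nothing the thread settled on.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> int:
    """Write one '<hash>  <relative path>' line per file under root.

    Returns the number of files listed. The manifest itself is skipped
    if it happens to live inside root.
    """
    count = 0
    with manifest.open("w", encoding="utf-8") as out:
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            if path.resolve() == manifest.resolve():
                continue
            rel = path.relative_to(root).as_posix()
            out.write(f"{sha256_of(path)}  {rel}\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: list everything under the current directory.
    listed = write_manifest(Path("."), Path("SHA256SUMS"))
    print(f"listed {listed} files")
```

The lines use the same hash, two spaces, path layout that GNU `sha256sum` prints, so on the restore side `sha256sum -c SHA256SUMS` should be able to verify the files without any custom tooling.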

restoring is just as easy, but creation gets more complicated and thus more prone to errors

traches@sh.itjust.works 4 months ago
I’ve been thinking through how I’d write this. With so many files it’s probably worth using sqlite, and then I can match them up by joining on the hash. Deletions and new files can be found with different join conditions. I found a tool called ‘hashdeep’ that can checksum everything, though for incremental runs I’ll probably skip hashing if the size, times, and filename haven’t changed. I’m thinking nushell for the plumbing? It runs everywhere, though they have breaking changes frequently. Maybe rust?
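
A rough sketch of that plan, written in Python rather than nushell or rust purely for illustration. The table layout, the function names `scan` and `diff`, and the choice of SHA-256 are all assumptions, and "size, times, and filename" is narrowed here to relative path, size, and mtime. Each run records a snapshot table; hashing is skipped when the previous snapshot has the same path, size, and mtime, and new or deleted content falls out of outer joins on the hash.

```python
import hashlib
import os
import sqlite3
from pathlib import Path
from typing import List, Optional, Tuple


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(db: sqlite3.Connection, root: Path, table: str,
         previous: Optional[str] = None) -> None:
    """Record path, size, mtime and hash of every file under root.

    Table names are interpolated into the SQL, so they must be trusted
    identifiers chosen by the script, never user input.
    """
    db.execute(
        f"CREATE TABLE {table} "
        "(path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, hash TEXT)"
    )
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = Path(dirpath) / name
            rel = full.relative_to(root).as_posix()
            st = full.stat()
            cached = None
            if previous is not None:
                # Reuse the old hash when path, size and mtime all match.
                cached = db.execute(
                    f"SELECT hash FROM {previous} "
                    "WHERE path = ? AND size = ? AND mtime_ns = ?",
                    (rel, st.st_size, st.st_mtime_ns),
                ).fetchone()
            digest = cached[0] if cached else file_hash(full)
            db.execute(
                f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
                (rel, st.st_size, st.st_mtime_ns, digest),
            )
    db.commit()


def diff(db: sqlite3.Connection, old: str, new: str) -> Tuple[List[str], List[str]]:
    """Return (added, removed) paths, matching files by content hash."""
    added = db.execute(
        f"SELECT n.path FROM {new} AS n LEFT JOIN {old} AS o "
        "ON o.hash = n.hash WHERE o.hash IS NULL ORDER BY n.path"
    ).fetchall()
    removed = db.execute(
        f"SELECT o.path FROM {old} AS o LEFT JOIN {new} AS n "
        "ON n.hash = o.hash WHERE n.hash IS NULL ORDER BY o.path"
    ).fetchall()
    return [row[0] for row in added], [row[0] for row in removed]
```

Because the join is on the hash rather than the path, a file that was only renamed or moved shows up in neither list, so it would not need to be written to a new disc. The trade-off of the mtime shortcut is that a file changed without its size or mtime changing would keep its old hash.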

ZFS checksums are done at the block level, and after compression and encryption. I don’t think they’re meant for this purpose.

- nibbler@discuss.tchncs.de 4 months ago
  never heard of nushell, but sounds interesting… but it’s not default anywhere yet. I’d go for bash, perl or maybe python? your comments on zfs make a lot of sense, and invalidate my respective thoughts :D
  
  - traches@sh.itjust.works 4 months ago
    I only looked at how zfs tracks checksums because of your suggestion! Hashing 2TB will take a minute, would be nice to avoid.
    
    Nushell is neat, I’m using it as my login shell. Good for this kind of data-wrangling but also a pre-1.0 moving target.
    