Comment

Comment on Incremental backups to optical media: tar, dar, or something else?

You can’t really easily locate where the last version of the file is located on an append-only media without writing the index in a footer somewhere, and even then if you’re trying to pull an older version you’d still need to traverse the whole media.

That said, you use ZFS, so you can literally just zfs send it. ZFS will already know everything that needs to be known, so it’ll be a perfect incremental. But you’d definitely need to restore the entire dataset to pull anything out of it, reapply every incremental one by one, and if just one is unreadable the whole pool is unrecoverable, but so would the tar incrementals. But it’ll be as perfect and efficient as possible, as ZFS knows the exact change set it needs to bundle up. It’s unidirectional, so that’s why you can just zfs send into a file and burn it to a CD.

Since ZFS can easily tell you the difference between two snapshots, it also wouldn’t be too hard to make a Python script that writes the full new version of changed files and catalogs what file and what version is on which disc, for a more random access pattern.

But really for Blurays I think I’d just do it the old fashioned way and classify it to fit on a disc and label it with what’s on it, and if I update it make a v2 of it on the next disc.

source

Sort:hotnew top

traches@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
Ohhh boy, after so many people are suggesting I do simple files directly on the disks I went back and rethought some things. I think I’m landing on a solution that does everything and doesn’t require me to manually manage all these files:

fd (and any number of other programs) can produce lists of files that have been modified since a given date.

fpart can produce lists of files that add up to a given size.

xorrisofs can accept lists of files to add to an iso

Image

So if I fd a list of new files (or don’t for the first backup), pipe them into fpart to chunk them up, and then pass these lists into xorrisofs to create ISOs, I’ve solved almost every problem.

The disks have plain files and folders on them, no special software is needed to read them. My wife could connect a drive, pop the disk in, and the photos would be right there organized by folder.

Incremental updates can be accomplished by keeping track of whenever the last backup was.

The fpart lists are also a greppable index; I can use them to find particular files easily.

Downsides:

Change detection is naive. Just mtime. Good enough?

Renames will still produce new copies. Solution: don’t rename files. They’re organized well enough, stop messing with it.

Deletions will be disregarded.

There isn’t much rhyme or reason to how fpart splits up files. The first backup will be a bit chaotic. I don’t think I really care.

Honestly those downsides look quite tolerable given the benefits. Is there some software that will produce and track a checksum database?

Off to do some testing to make sure these things work like I think they do!
source
- nibbler@discuss.tchncs.de ⁨3⁩ ⁨months⁩ ago
  your first two points can be mitigated by using checksums. trivial to name the file after it’s checksum, but ugly. save checksums separately? safe checksums in file metadata (exit)? this can be a bit tricky 🤣 I believe zfs already has the checksum, so the job would be to just compare lists.
  
  restoring is as easy, creation gets more complicated and thus prone to errors
  
  source
  - traches@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
    I’ve been thinking through how I’d write this. With so many files it’s probably worth using sqlite, and then I can match them up by joining on the hash. Deletions and new files can be found with different join conditions. I found a tool called ‘hashdeep’ that can checksum everything, though for incremental runs I’ll probably skip hashing if the size, times, and filename haven’t changed. I’m thinking nushell for the plumbing? It runs everywhere, though they have breaking changes frequently. Maybe rust?
    
    ZFS checksums are done at the block level, and after compression and encryption. I don’t think they’re meant for this purpose.
    
    source
    nibbler@discuss.tchncs.de ⁨3⁩ ⁨months⁩ ago
    never heard of nushell, but sounds interesting… but it’s not default anyhwhere yet. I’d go for bash, perl or maybe python? your comments on zfs make a lot of sense, and invalidate my respective thoughts :D
    
    source
    -> View More Comments
traches@sh.itjust.works ⁨3⁩ ⁨months⁩ ago
Woah, that’s cool! I didn’t know you just zfs send anywhere. I suppose I’d have to split it up manually with split or something to get 50gb chunks?

Dar has dar_manager which you can use to create a database of snapshots and slices that you can use to locate individual files, but honestly if I’m using this backup it’ll almost certainly be a full restore after some cataclysm.

source