
This sort of thing scares me. It's why I started running consistency checks on my important archives (like my photo library), which I keep backed up in multiple places. We tend to think that in a digital world bits are just bits and do not get corrupted — which is decidedly untrue.

I wrote my own consistency checker, as I wasn't happy with what was out there. I wanted it to be simple, and maintainable in the long term (>10 years horizon). See https://github.com/jwr/ccheck if you need something like this. I now update my checksums regularly and check for corruption.
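For anyone who wants to see the basic idea, here is a minimal sketch of a checksum manifest in Python. It is not how ccheck works internally; the function names (`file_sha256`, `build_manifest`, `verify`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the tree under root with a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "new": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A "changed" entry only says the bytes differ from the manifest. Telling a deliberate edit from silent corruption still takes a human, or extra metadata such as modification times.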



It used to scare me, too. Then I changed my attitude to the one presented by Stanley Kubrick's famous movie character and stopped worrying. What was the worst thing that would happen if I lost it all? It turns out, for most files, it wouldn't matter much. So basically I identified just a couple of crucial files and I keep their encrypted copies everywhere. As for the rest, I don't care that much, three copies (on my main machine, external drive, and Hetzner) are enough.


I do care very much about my family photo archive. I have photos on paper and glass that are over 120 years old, and I'm afraid digitization has made us care far too little about the longevity of our data.


I take yearly backups onto Blu-Ray M-DISC discs. They're ceramic instead of organic, so they're not susceptible to the same kind of oxidation issues most regular optical media have. I make a few copies of the last couple of years of important documents and images (I get some overlap) and store those at a few different locations, usually with other family members I trust.

The amount of important documents and images that I really care about is only a small percentage of all my data. Most of the two-year combinations fit on a single 25GB disc, but it's not terribly expensive to get a 50GB or 100GB disc if needed.

Of course, there's a chance I won't be able to find a Blu-Ray reader in 20-30 years but I imagine there will be some other way to transfer over this dataset when that time comes.

As to the durability of these M-DISCs, I have three 25GB discs whose durability I've been testing over the last several years. One sits outside, unprotected, usually somewhere around the patio table or on a cart under some shade. Another kicks around on my desk unprotected and gets moved around a lot. Finally, a third one sits in the same disc case as the actual data I'm trying to preserve. All of the discs have the exact same data. Every now and then I compare them. The outside one has definitely had a bit of corruption but is still mostly readable. The desk one has a couple of files that it does some retries on (the seek time to the file is higher than expected) but has no corruption. The one in the sleeve is practically perfect after several years.
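A comparison like this can be scripted. The following is a minimal sketch, assuming each disc's contents are mounted or copied to a directory; the labels and the helper names `tree_digests` and `disagreements` are made up for illustration. It hashes every file in each copy and reports which copies are missing a file or differ from the majority.

```python
import hashlib
from collections import Counter
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each relative file path under root to the SHA-256 of its contents.

    Reads whole files into memory, which is fine for a sketch but not for
    very large files.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def disagreements(copies: dict[str, Path]) -> dict[str, list[str]]:
    """For each file, list the copies that lack it or differ from the majority."""
    digests = {label: tree_digests(root) for label, root in copies.items()}
    all_files = set().union(*(d.keys() for d in digests.values()))
    report = {}
    for name in sorted(all_files):
        seen = {label: d.get(name) for label, d in digests.items()}
        present = [h for h in seen.values() if h is not None]
        # With an even split the "majority" is arbitrary; three copies avoid that.
        majority, _ = Counter(present).most_common(1)[0]
        odd = sorted(label for label, h in seen.items() if h != majority)
        if odd:
            report[name] = odd
    return report
```

A file that a damaged disc cannot read at all will raise `OSError` here rather than show up in the report, so a real script would want to catch that and record it.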


Photos from 120 years ago survive, but photos from 50 years ago are already fading. Color photos tend to not last unless they were printed just right (I have no idea how to know which chemistry was right - though by now everyone knows).


How many of those paper photos survived vs how many were not made in the first place thinking you could not store the money that well in the first place?


> We tend to think that in a digital world bits are just bits and do not get corrupted — which is decidedly untrue.

That it’s not true is pretty much the reason why ZFS was created, though lots of people still don’t want to hear it, including companies (APFS only does copy-on-write and checksumming for metadata, for instance).


Agreed. I’m researching changing my unraid server over to ZFS as soon as possible. Looks very doable although my hope is that the developers support it out-of-the-box sometime soon.


Still, using rsync with checksumming enabled can get your Photos library messed up if you sync (or just copy) to a file system other than APFS or HFS+. Be very careful.


How is that possible?


I read that some of the key information is in attributes unique to Apple's file systems. I can't find the source, but that was my conclusion after trying to figure out what went wrong in the Time-Machine/Synology/Rsync-via-webdav system I set up for my father some years ago.

If you rsync ext4 files to exfat, you also have issues, but those are very clearly reported when you attempt to do so.


It makes sense that you lose some information going from a POSIX filesystem to exFAT, but from HFS+ to, let's say, XFS? Both can have extended attributes... hmm.

Hold on, though, let me do some research. Now you've hooked me ;)

EDIT: OK, got it: both filesystems need xattrs enabled, and you need to run rsync with the -X or --xattrs option. That should fix the problems.
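Before syncing, it can help to know which files carry extended attributes at all. Here is a minimal sketch in Python; `files_with_xattrs` and its `list_xattrs` parameter are hypothetical names for illustration. It defaults to `os.listxattr`, which the standard library only provides on Linux, so on other platforms a lister function has to be passed in.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Optional


def files_with_xattrs(
    root: Path,
    list_xattrs: Optional[Callable[[str], Iterable[str]]] = None,
) -> dict[str, list[str]]:
    """Map relative file paths under root to the xattr names they carry."""
    if list_xattrs is None:
        if not hasattr(os, "listxattr"):
            raise NotImplementedError("os.listxattr is not available here")
        list_xattrs = os.listxattr
    found = {}
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        try:
            names = sorted(list_xattrs(str(p)))
        except OSError:
            # File system without xattr support, or an unreadable file.
            continue
        if names:
            found[p.relative_to(root).as_posix()] = names
    return found
```

Running this on both source and destination after an `rsync -X` and comparing the two results is one way to check that the attribute names actually survived the copy.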


Extended attributes are messy. E.g. the equivalent on Windows (ADS, not EA, which also exist separately) can store arbitrary amounts of data, and I'd expect resource forks on macOS to be similar. Meanwhile Linux only supports 64 KB of data for them, though most Linux-native file systems have an even lower limit of one block. And then there's Solaris, where extended attributes are actually an FS namespace.

So even if both file systems support extended attributes, that does not mean you can actually preserve them.


>Meanwhile Linux only supports 64 KB

Yes, absolutely true, thanks for the addition:

- Linux: max 64 KB (some filesystems just 4 KB)

- HFS+: max 128 KB, but macOS theoretically supports up to 64 MB!

- FreeBSD: max 64 MB with UFS2 or ZFS

- Windows: 65536 bytes with NTFS

So from BSD (macOS) to BSD should/could work.

Truly messy stuff :)
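To make the consequence concrete, here is a small sketch that flags attribute values too large for a given destination. The limits are copied from the list above and are not independently verified; `XATTR_LIMITS` and `oversized` are made-up names for illustration.

```python
# Per-value size limits in bytes, taken from the list in this comment
# (treat them as approximate, not authoritative).
XATTR_LIMITS = {
    "linux": 64 * 1024,
    "hfs+": 128 * 1024,
    "freebsd": 64 * 1024 * 1024,
    "ntfs": 65536,
}


def oversized(xattrs: dict[str, bytes], destination: str) -> list[str]:
    """Return the names of attributes whose values exceed the destination's limit."""
    limit = XATTR_LIMITS[destination]
    return sorted(name for name, value in xattrs.items() if len(value) > limit)
```

For example, a 100 KB attribute value would be reported for "linux" and "ntfs" but not for "hfs+" or "freebsd".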


So build your own NAS on BSD+ZFS and use rsync with xattrs? Not a bad solution, theoretically...


Keep in mind that your mounted NFS needs to support xattrs too :)

mount -t nfs -o user_xattr

I just found out that (only?) the Solaris/illumos NFS server supports that, and the Linux client does since kernel 5.9.

With all that information... I honestly would just make a dump of my FS and back up the dump. Terrible, I know.


This is the advantage of the "newfangled" backup tools which don't just copy files from A to B and can do (but not always do!) a much better job at a) not caring about your backup destination's file system b) not losing data that's more complicated than "name and contents".


I was hoping to hook someone, looking forward to what you find :)



