Swapping symlinks is possible. Using a database (SQLite specifically) has other benefits, like deduplication, easier backups, storing compressed content, storing content hashes, etc.
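For reference, the swap itself is tiny. A rough sketch with made-up paths, relying on rename(2) being atomic:

  mkdir -p releases
  cp -r build-output releases/v2
  # Prepare a new symlink, then rename it over the live one;
  # rename(2) is atomic, so readers never see a broken link.
  ln -s releases/v2 current.tmp
  mv -T current.tmp current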


SQLite isn't necessarily easier to back up than a filesystem. I've got a fairly large (~60 GB) SQLite database that I'm somewhat eager to migrate off of. If I'd stuck with a pure filesystem, backing up only the changeset would be trivial, but with SQLite I have to store a new copy of the entire database.

I've tried various solutions like Litestream and even xdelta3 (which generates patches in excess of the actual changeset size), but I haven't found a solution I'm confident in other than backing up complete snapshots.
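For context, the xdelta3 attempt was roughly this (file names invented):

  # Encode a binary delta between two snapshots...
  xdelta3 -e -s backup-old.db backup-new.db changes.xd3
  # ...and apply it at restore time.
  xdelta3 -d -s backup-old.db changes.xd3 backup-restored.db
  # The delta often comes out bigger than the logical changeset,
  # because sqlite moves pages around and the diff is byte-level.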


You might like the new sqlite3_rsync command https://www.sqlite.org/rsync.html


Yeah, that looks ideal for this exact problem, because it lets you stream a snapshot backup of a SQLite database over SSH without needing to first create a duplicate copy using .backup or VACUUM. My notes here: https://til.simonwillison.net/sqlite/compile-sqlite3-rsync
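Basic usage is a single command (paths here are placeholders; the binary needs to be installed on both machines):

  # Copies only the pages that differ, over SSH, while the
  # origin database stays live.
  sqlite3_rsync /data/app.db user@backuphost:/backups/app.db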


Maybe that tool just doesn't fit my use-case, but I'm not sure how you'd use it to do incremental backups? I store all of my backups in S3 Glacier for the cheap storage, so there's nothing for me to rsync onto.

I can see how you'd use it for replication though.


If you want incremental backups to S3 I recommend Litestream.
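The basic shape (bucket name made up):

  # Continuously ship WAL segments to S3; incremental by design.
  litestream replicate /data/app.db s3://my-backup-bucket/app.db
  # Restore from the replica later:
  litestream restore -o /data/restored.db s3://my-backup-bucket/app.db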


What do you do about compaction?


zstd, though that only shaves off a few GB total.


Oh, I mean like with the VACUUM command. As the databases get larger, it can become unwieldy.
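Concretely, something like (hypothetical file names):

  # In-place compaction; transient disk usage can approach
  # twice the database size while it runs.
  sqlite3 app.db 'VACUUM;'
  # Since 3.27 you can also write the compacted copy to a fresh
  # file instead, which doubles as a backup:
  sqlite3 app.db "VACUUM INTO 'compacted.db';"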


I just tried it; it only recovered a few MB.


Aha, so not much need. I've always avoided SQLite for larger databases because of the extra space needed to allow compaction; maybe that's not a real problem in most applications, though.


You could also employ a different filesystem like ZFS or btrfs in tandem with the symlink-swapping strategy to get things like deduplication. Or, once you have deduplication at the filesystem level, just construct a complete new duplicate of the folder to represent the new version and use the same symlink rename to swap the old for the new, and poof -- atomic changes and versioning with deduplication, all while continuing to use standard filesystem paradigms and tools.
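On btrfs (or XFS) that "complete duplicate" can be a reflink copy, which is nearly free. A sketch with invented paths:

  # Shared-extent copy: instant, and only changed files consume new space.
  cp -a --reflink=always releases/v1 releases/v2
  # ...apply the changes inside releases/v2...
  # Plain rename() can't replace a non-empty directory, so the
  # atomic swap still goes through the symlink:
  ln -s releases/v2 current.tmp
  mv -T current.tmp current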


Deduplication can be achieved the same way as in SQLite, by keeping files indexed by their SHA-256 hash. There are also filesystems that provide transparent compression.

Seeing as you need some kind of layer between the web and SQLite anyway, you might as well keep a layer between the web and the filesystem, which nets you most or all of the same benefits.
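A minimal sketch of such a layer on a plain filesystem (layout invented for illustration; bash for the substring expansion):

  # Store each blob once, keyed by its hash; public paths just link to it.
  h=$(sha256sum upload.bin | cut -d' ' -f1)
  mkdir -p "objects/${h:0:2}"
  [ -e "objects/${h:0:2}/$h" ] || mv upload.bin "objects/${h:0:2}/$h"
  ln -sfn "../objects/${h:0:2}/$h" "files/report.pdf"
  # Transparent compression can come from the filesystem itself,
  # e.g. btrfs mounted with -o compress=zstd.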


All of this is easily done on a filesystem too. I would assume the choice is a performance tradeoff rather than a feature one?



