Swapping symlinks is possible. Using a database (SQLite specifically) has other benefits, like deduplication, easier backups, storing compressed content, storing content hashes, etc.
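For reference, the swap itself is tiny. A rough sketch with made-up paths, relying on rename(2) being atomic:

  mkdir -p releases
  cp -r build-output releases/v2
  # Prepare a new symlink, then rename it over the live one;
  # rename(2) is atomic, so readers never see a broken link.
  ln -s releases/v2 current.tmp
  mv -T current.tmp current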


SQLite isn't necessarily easier to back up than a filesystem. I've got a fairly large (~60 GB) SQLite database that I'm somewhat eager to migrate off of. If I'd stuck with a pure filesystem, backing up only the changeset would be trivial, but with SQLite I have to store a new copy of the entire database.

I've tried various solutions like Litestream and even xdelta3 (which generates patches in excess of the actual changeset size), but I haven't found a solution I'm confident in other than backing up complete snapshots.
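For context, the xdelta3 attempt was roughly this (file names invented):

  # Encode a binary delta between two snapshots...
  xdelta3 -e -s backup-old.db backup-new.db changes.xd3
  # ...and apply it at restore time.
  xdelta3 -d -s backup-old.db changes.xd3 backup-restored.db
  # The delta often comes out bigger than the logical changeset,
  # because sqlite moves pages around and the diff is byte-level.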


You might like the new sqlite3_rsync command https://www.sqlite.org/rsync.html


Yeah, that looks ideal for this exact problem, because it lets you stream a snapshot backup of a SQLite database over SSH without needing to first create a duplicate copy using .backup or VACUUM. My notes here: https://til.simonwillison.net/sqlite/compile-sqlite3-rsync
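Basic usage is a single command (paths here are placeholders; the binary needs to be installed on both machines):

  # Copies only the pages that differ, over SSH, while the
  # origin database stays live.
  sqlite3_rsync /data/app.db user@backuphost:/backups/app.db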


Maybe that tool just doesn't fit my use-case, but I'm not sure how you'd use it to do incremental backups? I store all of my backups in S3 Glacier for the cheap storage, so there's nothing for me to rsync onto.

I can see how you'd use it for replication though.


If you want incremental backups to S3 I recommend Litestream.
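The basic shape (bucket name made up):

  # Continuously ship WAL segments to S3; incremental by design.
  litestream replicate /data/app.db s3://my-backup-bucket/app.db
  # Restore from the replica later:
  litestream restore -o /data/restored.db s3://my-backup-bucket/app.db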


What do you do about compaction?


zstd, though that only shaves off a few GB total.


Oh, I mean like with the VACUUM command. As the databases get larger, it can become unwieldy.
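Concretely, something like (hypothetical file names):

  # In-place compaction; transient disk usage can approach
  # twice the database size while it runs.
  sqlite3 app.db 'VACUUM;'
  # Since 3.27 you can also write the compacted copy to a fresh
  # file instead, which doubles as a backup:
  sqlite3 app.db "VACUUM INTO 'compacted.db';"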


I just tried it; it only recovered a few MB.


Aha, so not much need. I've always avoided SQLite for larger databases because of the extra space needed to allow compaction; maybe that's not a real problem in most applications, though.


You could also employ a different filesystem like ZFS or btrfs in tandem with the symlink-swapping strategy to get things like deduplication. Or, once you have deduplication at the filesystem level, just construct a complete new duplicate of the folder to represent the new version and use the same symlink rename to swap the old for the new, and poof -- atomic changes and versioning with deduplication, all while continuing to use standard filesystem paradigms and tools.
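On btrfs (or XFS) that "complete duplicate" can be a reflink copy, which is nearly free. A sketch with invented paths:

  # Shared-extent copy: instant, and only changed files consume new space.
  cp -a --reflink=always releases/v1 releases/v2
  # ...apply the changes inside releases/v2...
  # Plain rename() can't replace a non-empty directory, so the
  # atomic swap still goes through the symlink:
  ln -s releases/v2 current.tmp
  mv -T current.tmp current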


Deduplication can be achieved the same way as in SQLite, by keeping files indexed by their SHA-256 hash. There are also filesystems that provide transparent compression.

Seeing as you need some kind of layer between the web and SQLite anyway, you might as well keep a layer between the web and the filesystem, which nets you most or all of the same benefits.
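A minimal sketch of such a layer on a plain filesystem (layout invented for illustration; bash for the substring expansion):

  # Store each blob once, keyed by its hash; public paths just link to it.
  h=$(sha256sum upload.bin | cut -d' ' -f1)
  mkdir -p "objects/${h:0:2}"
  [ -e "objects/${h:0:2}/$h" ] || mv upload.bin "objects/${h:0:2}/$h"
  ln -sfn "../objects/${h:0:2}/$h" "files/report.pdf"
  # Transparent compression can come from the filesystem itself,
  # e.g. btrfs mounted with -o compress=zstd.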


All of this is easily done on a filesystem too. I would assume the choice is a performance tradeoff rather than a feature one?



