This is another cool example of a toy database that is again very small:

> The database size for one warehouse is approximately 100 MB (we experiment with five warehouses for a total size of 500MB).

It is not surprising that when your database basically fits in RAM, serializing on one writer is worth doing, because it plainly reduces contention. In that case a DB engine gains essentially nothing from multi-writer transactions. In many systems with a large database, a large part of a write (the vast majority of write latency) comes from reading the index down to the point where you plan to write. If that tree is in RAM, there is almost no work here, and with multiple writers you instead pay overhead to keep that tree consistent.
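To put rough numbers on the index-read argument, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the function names, the B-tree fanout, and the latency figures (0.1 µs per level held in RAM, 100 µs per level read from disk) are made-up round numbers, not measurements from the paper or from any real engine.

```python
def tree_height(num_keys, fanout):
    """Number of B-tree levels needed to index num_keys with the given fanout."""
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return height


def index_read_cost_us(num_keys, fanout, cached_levels, ram_us=0.1, disk_us=100.0):
    """Estimated time (microseconds) to walk the index down to the write point.

    cached_levels is how many of the top levels are resident in RAM.
    ram_us and disk_us are illustrative per-level latencies, not measured values.
    """
    height = tree_height(num_keys, fanout)
    cached = min(height, cached_levels)
    return cached * ram_us + (height - cached) * disk_us


# Same tree, two scenarios: fully in RAM vs. only the top two levels cached.
in_ram = index_read_cost_us(10**9, fanout=100, cached_levels=5)
mostly_on_disk = index_read_cost_us(10**9, fanout=100, cached_levels=2)
print(in_ram, mostly_on_disk)
```

Under these assumed numbers the traversal is a rounding error when the tree is resident and dominates the write when it is not, which is the regime where overlapping several writers' reads could pay off.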

I'm not suggesting that these results are useless. They are useful for people whose databases are small, because they are meaningfully better than RocksDB/LevelDB, which implicitly assume that your database is a *lot* bigger than RAM.



> RocksDB/LevelDB which implicitly assume that your database is a lot bigger than RAM.

Where are you getting that assumption from? LevelDB was built to be used in Google Chrome, not for multi-TB DBs. RocksDB was optimized specifically for in-memory workloads.


I worked with the Bigtable folks at Google. LevelDB's design is ripped straight from Bigtable, which was designed with that assumption in mind. I'm also pretty sure it was not designed specifically for Google Chrome's use case: it was written to be a general key-value storage engine based on Bigtable, and Google Chrome was the first customer.

RocksDB is Facebook's offshoot of LevelDB, basically keeping the core architecture of the storage engine (but multithreading it), and is used internally at Facebook as the backing store for many of their database systems. I have never heard from anyone that RocksDB was optimized for in-memory workloads at all, and I think most benchmarks can conclusively say the opposite: both of those DB engines are pretty bad for workloads that fit in memory.


I think we've gone off on a tangent. At any rate, both LevelDB and RocksDB are still single-writer, so whatever the point was seems to have been lost along the way.


I've used RocksDB for an in-memory K/V store of ~600GB in size and it worked really well. Not saying it's the best choice out there, but it did the job for us. In particular, because our dataset was always growing and we needed the option to fall back to disk, RocksDB was a good fit.

Was a PITA to optimise though; tons of options and little insight into which ones work.


I am using the same rough model, and I'm running it very successfully on a 1.5 TB DB on a Raspberry Pi.

Pretty much all storage libraries written in the past couple of decades use a single writer. Note that a single writer doesn't mean a single transaction at a time. Merging transactions is easy and highly profitable, after all.
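For anyone wondering what merging transactions under a single writer can look like, here is a minimal sketch of the idea (often called group commit). It is a toy, not how any particular library does it: `SingleWriterStore`, `submit`, and the `commits` counter are hypothetical names, the "database" is a dict, and the durable flush is only simulated by incrementing a counter.

```python
import queue
import threading


class SingleWriterStore:
    """Toy key-value store: many submitters, one writer thread, merged commits."""

    def __init__(self):
        self.data = {}
        self.commits = 0  # simulated durable flushes (stand-in for fsync)
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def submit(self, writes):
        """Queue one transaction (dict of key -> value); the Event is set on commit."""
        done = threading.Event()
        self._pending.put((writes, done))
        return done

    def close(self):
        """Ask the writer to stop after committing everything queued so far."""
        self._pending.put(None)
        self._writer.join()

    def _run(self):
        while True:
            item = self._pending.get()  # block until at least one transaction
            if item is None:
                return
            batch, stop = [item], False
            # Merge: also take every transaction that is already waiting.
            while True:
                try:
                    nxt = self._pending.get_nowait()
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            # Apply the batch in submission order, then pay for one flush.
            for writes, _ in batch:
                self.data.update(writes)
            self.commits += 1
            for _, done in batch:
                done.set()
            if stop:
                return


store = SingleWriterStore()
events = [store.submit({f"k{i}": i}) for i in range(100)]
for e in events:
    e.wait()
store.close()
print(len(store.data), "keys committed in", store.commits, "flushes")
```

Only the writer thread touches the data, so there is nothing to lock, and however many transactions pile up while a flush is in progress share the cost of the next one. How many get merged per flush depends on timing, so the flush count varies from run to run.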



