IMHO, databases could be added to the list.
It’s one of the most complex systems one can develop, and you end up learning about multiple areas along the way: OS internals, compilers, distributed systems, data structures, parallelism, etc.
The problem with writing a database (and maybe a few of these other projects) is that, given the enormous compute/IO capabilities of a modern machine, it's quite possible to implement it completely wrong and never really know.
That is, a toy database might be enough to handle some simple storage/retrieval problems, but be full of hidden O(n^2) or worse logic that would fall down hard under even fairly light usage in the "real world".
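To make the trap concrete, here's a minimal sketch (all names invented for illustration) of two key-value stores with identical behavior on small inputs. The first hides a linear scan in every operation, so a bulk load of n keys quietly costs O(n^2); the second uses a hash index and stays O(1) per operation:

```python
class ToyDB:
    """Append-only list of (key, value) pairs. Works fine in small tests,
    but every lookup is a full scan, and put() rebuilds the whole list."""
    def __init__(self):
        self.rows = []

    def put(self, key, value):
        # Overwrite = linear scan + append: n inserts degrade to O(n^2).
        self.rows = [(k, v) for k, v in self.rows if k != key]
        self.rows.append((key, value))

    def get(self, key):
        for k, v in self.rows:  # O(n) scan on every read
            if k == key:
                return v
        return None


class IndexedDB:
    """Same interface backed by a hash index: O(1) average per operation."""
    def __init__(self):
        self.index = {}

    def put(self, key, value):
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)
```

Both pass the same tests on a handful of keys; only one survives a few million.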
Reminds me of my own text editor, written in Applesoft BASIC when I was in middle school. It worked for its intended purpose (editing small assembly files), but was really quite terrible all things considered. I remember it being quite slow to save/restore, and it was only really capable of editing files of a few hundred lines before it started breaking BASIC's memory allocation schemes. In other words, I didn't really learn any of the data-structure finesse needed to implement a "real" text editor with line wrap, etc.
Worse, I remember trying to read the code a few years later, and while it fit on two printed pages, it was 100% unreadable.
(For those who don't know, Applesoft's speed was influenced by "formatting", if you will. It encouraged using line numbers only for control flow, and the long list of CALL/PEEK/POKE magic numbers required a handy cheat sheet of what each address did.)
> It’s one of the most complex systems one can develop
The same could probably be said about the internal combustion engine, but it might soon be replaced by electric batteries, which provide a much more elegant solution.
I believe that "unbundled" databases, such as Crux[1], can become the electric batteries of the database world by making a lot of the current complexity irrelevant.
1) the single-writer principle of the transaction log means there's no need for any transactional locking
2) the separation of reads and writes allows for elegant horizontal read-scaling without coordination/consensus
3) pluggable storage backends implemented as simple Clojure protocols (as the sibling comment mentions), which eliminates a large number of performance and durability concerns
4) combining schema-on-read with entity-attribute-value indexing means there's no need to interpret and support a user-defined schema
5) Datalog is simpler to implement and use than the full SQL standard or alternative graph query languages
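As an illustration of points 4 and 5, here's a tiny sketch (in Python rather than Clojure, and emphatically not Crux's actual API) of how entity-attribute-value triples plus a pattern matcher give you schema-on-read queries: new attributes need no schema change, and the query engine is a few lines:

```python
# Illustrative EAV store: every fact is an (entity, attribute, value) triple.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "follows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "age", 42),  # a new attribute needs no schema change
]

def match(pattern, triples):
    """Datalog-style pattern match: strings starting with '?' are variables.
    Returns one bindings dict per matching triple."""
    results = []
    for triple in triples:
        bindings = {}
        for p, t in zip(pattern, triple):
            if isinstance(p, str) and p.startswith("?"):
                bindings[p] = t
            elif p != t:
                break  # constant mismatch: try the next triple
        else:
            results.append(bindings)
    return results

# "Who does alice follow?"
print(match(("alice", "follows", "?x"), triples))  # [{'?x': 'bob'}]
```

A real Datalog engine adds joins across patterns and rules, but the core stays far smaller than an SQL planner.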
SQL certainly provides a lot of bells and whistles, but Crux has the advantage of consistent in-process queries (i.e. the "database as a value"), which means you can efficiently combine custom code with multiple queries to achieve a much larger range of possibilities, such as graph algorithms like bidirectional BFS [1].
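For reference, bidirectional BFS is the kind of algorithm that's awkward to express in a query language but trivial in-process: search from both endpoints and stop when the frontiers meet. A minimal sketch over an undirected adjacency dict (names and graph invented for illustration):

```python
from collections import deque

def bidirectional_bfs(graph, start, goal):
    """Return the shortest path length (edge count) between start and goal,
    or None if unreachable. Expands the smaller frontier at each step."""
    if start == goal:
        return 0
    dist_s, dist_g = {start: 0}, {goal: 0}
    q_s, q_g = deque([start]), deque([goal])
    while q_s and q_g:
        # Always grow the cheaper side: keeps both frontiers small.
        if len(q_s) <= len(q_g):
            q, dist, other = q_s, dist_s, dist_g
        else:
            q, dist, other = q_g, dist_g, dist_s
        for _ in range(len(q)):
            node = q.popleft()
            for nbr in graph.get(node, ()):
                if nbr in other:  # the two frontiers met
                    return dist[node] + 1 + other[nbr]
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    q.append(nbr)
    return None

# Undirected chain a - b - c - d:
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(bidirectional_bfs(graph, "a", "d"))  # 3
```

Each side explores roughly half the search depth, which is the whole point versus plain BFS.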
I'm building a relational language (http://tablam.org) that could be considered an in-memory kind of database.
It's certainly challenging.
Just look at joins. You have (at least) two nested-loop join algorithms, then sort-merge and hash joins, and then the cross, left, right, inner, and outer variants. All of them need small, subtle tricks to perform well. (In theory you can build everything on top of CROSS. But that gets very wasteful very fast!)
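The contrast is easy to see in a sketch. Here's the same inner join done as a nested loop and as a hash join (table contents and names are made up for illustration); building it on CROSS would mean materializing every pair first and filtering afterward:

```python
users = [(1, "alice"), (2, "bob"), (3, "carol")]   # (user_id, name)
orders = [(10, 1), (11, 1), (12, 3)]               # (order_id, user_id)

def nested_loop_join(left, right):
    """O(n*m): compare every pair. Simple, and fine for tiny inputs."""
    return [(u, o) for u in left for o in right if u[0] == o[1]]

def hash_join(left, right):
    """O(n+m): build a hash table on one side, probe with the other."""
    index = {}
    for u in left:
        index.setdefault(u[0], []).append(u)
    return [(u, o) for o in right for u in index.get(o[1], [])]

assert sorted(nested_loop_join(users, orders)) == sorted(hash_join(users, orders))
```

Then left/right/full outer variants are each a small twist on these loops (emit unmatched rows padded with nulls), and sort-merge earns its keep when the inputs are already ordered.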
The caveat there is that you can write a fairly simple NoSQL database in an afternoon. What I like about the other projects is that the barrier to even a minimal version is a bit higher. I think that leads to more opportunity to work your creative muscles.
Though if you add some constraints, like requiring JDBC compatibility, then it becomes interesting.