IMHO, databases could be added to the list.
It’s one of the most complex systems one can develop, and you end up learning about multiple areas along the way: OS internals, compilers, distributed systems, data structures, parallelism, etc.
The problem with writing a database (and maybe a few of these other projects) is that, given the enormous compute/IO capabilities of a modern machine, it's quite possible to implement it completely wrong and never really know.
That is, a toy database might be enough to handle some simple storage/retrieval problems, but be full of hidden O(n^2) or worse logic that would fall down hard under even fairly light usage in the "real world".
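To make the trap concrete, here's a minimal sketch (all names invented for illustration) of two key-value stores with identical behavior on small inputs. The first hides a linear scan in every operation, so a bulk load of n keys quietly costs O(n^2); the second uses a hash index and stays O(1) per operation:

```python
class ToyDB:
    """Append-only list of (key, value) pairs. Works fine in small tests,
    but every lookup is a full scan, and put() rebuilds the whole list."""
    def __init__(self):
        self.rows = []

    def put(self, key, value):
        # Overwrite = linear scan + append: n inserts degrade to O(n^2).
        self.rows = [(k, v) for k, v in self.rows if k != key]
        self.rows.append((key, value))

    def get(self, key):
        for k, v in self.rows:  # O(n) scan on every read
            if k == key:
                return v
        return None


class IndexedDB:
    """Same interface backed by a hash index: O(1) average per operation."""
    def __init__(self):
        self.index = {}

    def put(self, key, value):
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)
```

Both pass the same tests on a handful of keys; only one survives a few million.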
Reminds me of my own text editor, written in Applesoft BASIC when I was in middle school. It worked for its intended purpose (editing small assembly files), but was really quite terrible all things considered. I remember it being quite slow to save/restore, and it was only really capable of editing files of a few hundred lines before it started breaking BASIC's memory allocation schemes. In other words, I didn't really learn any of the data-structure finesse needed to implement a "real" text editor with line wrap, etc.
Worse, I remember trying to read the code a few years later, and while it fit on two printed pages, it was 100% unreadable.
(For those who don't know, Applesoft's speed was influenced by "formatting", if you will. It encouraged using line numbers only for control flow, and the long list of CALL/PEEK/POKE magic numbers required a handy cheat sheet of what each address did.)
> It’s one of the most complex systems one can develop
The same could probably be said about the internal combustion engine, but it might soon be replaced by electric batteries, which provide a much more elegant solution.
I believe that "unbundled" databases, such as Crux[1], can become the electric batteries of the database world by making a lot of the current complexity irrelevant.
1) the single-writer principle of the transaction log means there's no need for any transactional locking
2) the separation of reads and writes allows for elegant horizontal read-scaling without coordination/consensus
3) pluggable storage backends implemented as simple Clojure protocols (as the sibling comment mentions), which eliminates a large number of performance and durability concerns
4) combining schema-on-read with entity-attribute-value indexing means there's no need to interpret and support a user-defined schema
5) Datalog is simpler to implement and use than the full SQL standard or alternative graph query languages
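As an illustration of points 4 and 5, here's a tiny sketch (in Python rather than Clojure, and emphatically not Crux's actual API) of how entity-attribute-value triples plus a pattern matcher give you schema-on-read queries: new attributes need no schema change, and the query engine is a few lines:

```python
# Illustrative EAV store: every fact is an (entity, attribute, value) triple.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "follows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "age", 42),  # a new attribute needs no schema change
]

def match(pattern, triples):
    """Datalog-style pattern match: strings starting with '?' are variables.
    Returns one bindings dict per matching triple."""
    results = []
    for triple in triples:
        bindings = {}
        for p, t in zip(pattern, triple):
            if isinstance(p, str) and p.startswith("?"):
                bindings[p] = t
            elif p != t:
                break  # constant mismatch: try the next triple
        else:
            results.append(bindings)
    return results

# "Who does alice follow?"
print(match(("alice", "follows", "?x"), triples))  # [{'?x': 'bob'}]
```

A real Datalog engine adds joins across patterns and rules, but the core stays far smaller than an SQL planner.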
SQL certainly provides a lot of bells and whistles, but Crux has the advantage of consistent in-process queries (i.e. the "database as a value"), which means you can efficiently combine custom code with multiple queries to achieve a much larger range of possibilities, such as graph algorithms like bidirectional BFS [1].
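For reference, bidirectional BFS is the kind of algorithm that's awkward to express in a query language but trivial in-process: search from both endpoints and stop when the frontiers meet. A minimal sketch over an undirected adjacency dict (names and graph invented for illustration):

```python
from collections import deque

def bidirectional_bfs(graph, start, goal):
    """Return the shortest path length (edge count) between start and goal,
    or None if unreachable. Expands the smaller frontier at each step."""
    if start == goal:
        return 0
    dist_s, dist_g = {start: 0}, {goal: 0}
    q_s, q_g = deque([start]), deque([goal])
    while q_s and q_g:
        # Always grow the cheaper side: keeps both frontiers small.
        if len(q_s) <= len(q_g):
            q, dist, other = q_s, dist_s, dist_g
        else:
            q, dist, other = q_g, dist_g, dist_s
        for _ in range(len(q)):
            node = q.popleft()
            for nbr in graph.get(node, ()):
                if nbr in other:  # the two frontiers met
                    return dist[node] + 1 + other[nbr]
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    q.append(nbr)
    return None

# Undirected chain a - b - c - d:
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(bidirectional_bfs(graph, "a", "d"))  # 3
```

Each side explores roughly half the search depth, which is the whole point versus plain BFS.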
I'm building a relational language (http://tablam.org) that could be considered an in-memory kind of database.
It's certainly challenging.
Just look at joins. You have (at least) two nested-loop join algorithms, then sort-merge and hash joins, and then the cross, left, right, inner, and outer variants. All of them need small, subtle tricks to perform well. (In theory you can build everything on top of CROSS. But that gets very wasteful very fast!)
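The contrast is easy to see in a sketch. Here's the same inner join done as a nested loop and as a hash join (table contents and names are made up for illustration); building it on CROSS would mean materializing every pair first and filtering afterward:

```python
users = [(1, "alice"), (2, "bob"), (3, "carol")]   # (user_id, name)
orders = [(10, 1), (11, 1), (12, 3)]               # (order_id, user_id)

def nested_loop_join(left, right):
    """O(n*m): compare every pair. Simple, and fine for tiny inputs."""
    return [(u, o) for u in left for o in right if u[0] == o[1]]

def hash_join(left, right):
    """O(n+m): build a hash table on one side, probe with the other."""
    index = {}
    for u in left:
        index.setdefault(u[0], []).append(u)
    return [(u, o) for o in right for u in index.get(o[1], [])]

assert sorted(nested_loop_join(users, orders)) == sorted(hash_join(users, orders))
```

Then left/right/full outer variants are each a small twist on these loops (emit unmatched rows padded with nulls), and sort-merge earns its keep when the inputs are already ordered.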
The caveat there is that you can write a fairly simple NoSQL database in an afternoon. What I like about the other projects is that the barrier to even a minimal version is a bit higher. I think that leads to more opportunity to work your creative muscles.
Though if you add some constraints, like requiring JDBC compatibility, then it becomes interesting.