During a centralisation of public school local servers to a data centre, I created a consolidated library enquiry system. It served over 2,000 libraries, had 330 million titles, and had about a million users. It was efficient enough to run off my laptop, if need be.
AFAIK it was one of the top five biggest library systems in the world at the time.
I was asked to add some features that would have been too difficult in the old distributed system. Things like reading competitions, recommended reading lists by age, etc…
I watched the effect of these changes — which took me mere days of effort to implement — and the combined result was that students read about a million additional books they would not have otherwise.
I’ve had a far greater effect on the literacy of our state than any educator by orders of magnitude and hardly anyone in the department of education even knows my name!
This was the project that made me realise how huge the effort-to-effect ratio can be when computers are involved…
The legacy back-end system being migrated was Clipper + dBase III on DOS, which is reminiscent of COBOL.
The part I added was built with ASP.NET 2.0 on top of Microsoft SQL Server 2005, and was eventually upgraded to 4.0 and 2008 respectively.
The only magic sauce was the use of SQLCLR to embed a few small snippets of C# code into the SQL Server database engine. This allowed the full-text indexing to be specialised for the high-level data partitioning. Without this, searches would have taken up to ten seconds. With the custom search, the p90 response time was about 15 milliseconds! I believe PostgreSQL is the only other popular database engine out there that allows this level of fine-tuned custom indexing.
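For anyone who hasn't touched SQLCLR: you compile a small C# assembly, load it into the database, and expose its static methods as ordinary T-SQL functions. A minimal, hypothetical sketch of the wiring (all names invented here, not the project's actual code):

    -- Enable CLR integration (off by default).
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
    GO

    -- Load the compiled C# assembly into the database.
    CREATE ASSEMBLY SearchHelpers
    FROM 'C:\deploy\SearchHelpers.dll'
    WITH PERMISSION_SET = SAFE;
    GO

    -- Expose an assumed static method (roughly: bool MatchTitle(string, string))
    -- as a scalar function that queries can call like any built-in.
    CREATE FUNCTION dbo.MatchTitle(@title nvarchar(400), @query nvarchar(200))
    RETURNS bit
    AS EXTERNAL NAME SearchHelpers.[SearchHelpers.TitleSearch].MatchTitle;
    GO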
p90 for a full-text search on 330 million documents was 15ms?
I know you can tune the hell out of search performance, but that seems a bit too insane for what looks like a relatively unspecialized setup (Standard DB).
Not likely the full book, just title, author, and a few other low-cardinality fields, I'm sure. Also not likely 330 million unique volumes, but the total count across all the libraries. This is within reach of a single database with proper indexing.
I simply added the "library id" as a prefix to almost every table's primary key. Every lookup specified it with an equality filter, so essentially it was thousands of standalone libraries in a single schema.
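In schema terms that looks roughly like the sketch below (table and column names are illustrative, not the real schema):

    -- Every table leads its clustered primary key with LibraryId, so rows
    -- are physically grouped per library.
    CREATE TABLE dbo.Title (
        LibraryId int           NOT NULL,
        TitleId   int           NOT NULL,
        Title     nvarchar(400) NOT NULL,
        Author    nvarchar(200) NULL,
        CONSTRAINT PK_Title PRIMARY KEY CLUSTERED (LibraryId, TitleId)
    );

    -- Every lookup supplies the prefix with an equality predicate, so an
    -- index seek only ever touches that one library's slice of the data.
    DECLARE @LibraryId int = 42, @SearchTerm nvarchar(200) = N'potter';
    SELECT TitleId, Title, Author
    FROM   dbo.Title
    WHERE  LibraryId = @LibraryId
      AND  Title LIKE N'%' + @SearchTerm + N'%';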
One hiccup was that when the cardinality estimator got confused, it would occasionally ignore the partition prefix and do a full scan somewhere, inflating the rows it had to examine by a factor of roughly 2,000! This caused dramatic slowdowns at random, and the DB engine would often cache the inefficient query plan, keeping things slow until it got rebooted.
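One blunt workaround for that kind of plan-cache poisoning (not necessarily what was done here) is to exempt the sensitive statement from plan caching altogether:

    -- OPTION (RECOMPILE) builds a fresh plan on every execution, so a plan
    -- compiled for one atypical parameter value can never be cached and then
    -- reused for everyone else.
    DECLARE @LibraryId int = 42, @SearchTerm nvarchar(200) = N'potter';
    SELECT TitleId, Title
    FROM   dbo.Title
    WHERE  LibraryId = @LibraryId
      AND  Title LIKE N'%' + @SearchTerm + N'%'
    OPTION (RECOMPILE);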
This is a very deep rabbit hole to go down. For example, many large cloud vendors have an Iron Rule that relational databases must never be used, because they're concerned precisely about this issue occurring, except at a vastly greater scale.
I could have used actual database partitioning, but I discovered it had undesirable side-effects for some cross-library queries. However, for typical queries this would have "enforced" the use of the partitioning key, side-stepping the problem the cloud vendors have.
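For reference, native partitioning would look something like the sketch below (toy boundary values only; partitioning strictly per library would need a boundary value for each of the ~2,000 library ids):

    -- Partition function/scheme splitting rows by LibraryId range.
    CREATE PARTITION FUNCTION pfLibrary (int)
        AS RANGE RIGHT FOR VALUES (500, 1000, 1500);

    CREATE PARTITION SCHEME psLibrary
        AS PARTITION pfLibrary ALL TO ([PRIMARY]);

    -- The partitioning column must be part of the clustered key.
    CREATE TABLE dbo.TitlePartitioned (
        LibraryId int           NOT NULL,
        TitleId   int           NOT NULL,
        Title     nvarchar(400) NOT NULL,
        CONSTRAINT PK_TitlePartitioned
            PRIMARY KEY CLUSTERED (LibraryId, TitleId)
    ) ON psLibrary (LibraryId);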
Modern versions of SQL Server have all sorts of features to correct or avoid inefficient query plans. E.g.: Query Store can track the "good" and "bad" versions of each plan for a query and then, after sufficient samples, start enforcing the good one. That would have been useful back in 2007 but wasn't available, so I spent about a month doing much the same thing by hand.
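On a current version (SQL Server 2016+), the Query Store route looks roughly like this; the id values are obviously placeholders:

    -- Inspect the plans Query Store has captured for each query.
    SELECT q.query_id, p.plan_id, rs.avg_duration
    FROM   sys.query_store_query q
    JOIN   sys.query_store_plan p ON p.query_id = q.query_id
    JOIN   sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
    ORDER BY rs.avg_duration DESC;

    -- Pin the known-good plan (the one that seeks on the LibraryId prefix)
    -- so the optimizer can no longer flip to the full scan.
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;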
This makes the performance a lot more understandable if you're only searching in a single library. I assume that cuts out >99.9% of those 330 million documents.
> had a far greater effect on the literacy of our state than any educator by orders of magnitude
Nice work, but check your ego mate. Seems your growth hacking would have had zero result if those kids couldn't read to start with, so you could share some credit ;-)
Maybe it wasn't meant that way. If they hadn't been there, then someone else would have been. You can be on the crest of a wave and not be responsible for its power.
By that logic, the people who farm the trees that make the books have more impact than anyone before them, unless you want to consider the people that make the tools, or feed the farmers, etc etc.
> This was the project that made me realise how huge the effort-to-effect ratio can be when computers are involved
I love Steve Jobs' metaphor: computers as a bicycle of the mind [0]. Unfortunately, a lot of effort is concentrated on problems that scale to billions of people. There's a lack of attention to problems that would have a big effect for a relatively small number of people. It's a shame, because they're a blast to work on.
They really are. I think the most rewarding software I ever wrote was my first paid gig, where I automated lap swim scheduling for my local swim club. Took me maybe an hour, got paid more money than I'd make in two days as a lifeguard, and they were thanking ME for it. Turned out I had saved a volunteer upwards of an hour every week. With a shitty little JavaScript program.
Pick one that scratches one of your itches, and get started. Release early, keep iterations small. It's easier to keep working on something you actively use.