During a centralisation of public school local servers to a data centre, I created a consolidated library enquiry system. It served over 2,000 libraries, had 330 million titles, and had about a million users. It was efficient enough to run off my laptop, if need be.
AFAIK it was one of the top five biggest library systems in the world at the time.
I was asked to add some features that would have been too difficult in the old distributed system. Things like reading competitions, recommended reading lists by age, etc…
I watched the effect of these changes — which took me mere days of effort to implement — and the combined result was that students read about a million additional books they would not have otherwise.
I’ve had a far greater effect on the literacy of our state than any educator by orders of magnitude and hardly anyone in the department of education even knows my name!
This was the project that made me realise how huge the effort-to-effect ratio can be when computers are involved…
The legacy back-end system being migrated was Clipper + dBase III on DOS, which is reminiscent of COBOL.
The part I added was built with ASP.NET 2.0 on top of Microsoft SQL Server 2005, and was eventually upgraded to 4.0 and 2008 respectively.
The only magic sauce was the use of SQLCLR to embed a few small snippets of C# code into the SQL Server database engine. This allowed the full-text indexing to be specialised for the high-level data partitioning. Without this, searches would have taken up to ten seconds. With the custom search, the p90 response time was about 15 milliseconds! I believe PostgreSQL is the only other popular database engine out there that allows this level of fine-tuned custom indexing.
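For anyone who hasn't touched SQLCLR: you compile a small C# assembly, load it into the database, and expose its static methods as ordinary T-SQL functions. A minimal, hypothetical sketch of the wiring (all names invented here, not the project's actual code):

    -- Enable CLR integration (off by default).
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
    GO

    -- Load the compiled C# assembly into the database.
    CREATE ASSEMBLY SearchHelpers
    FROM 'C:\deploy\SearchHelpers.dll'
    WITH PERMISSION_SET = SAFE;
    GO

    -- Expose an assumed static method (roughly: bool MatchTitle(string, string))
    -- as a scalar function that queries can call like any built-in.
    CREATE FUNCTION dbo.MatchTitle(@title nvarchar(400), @query nvarchar(200))
    RETURNS bit
    AS EXTERNAL NAME SearchHelpers.[SearchHelpers.TitleSearch].MatchTitle;
    GO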
p90 for a full-text search on 330 million documents was 15ms?
I know you can tune the hell out of search performance, but that seems a bit too insane for what looks like a relatively unspecialized setup (Standard DB).
Not likely the full book, just title, author, and a few other low-cardinality fields, I'm sure. Also not likely 330 million unique volumes, but the total count across all the libraries. This is within reach of a single database with proper indexing.
I simply added the "library id" as a prefix to almost every table's primary key. Every lookup specified it with an equality filter, so essentially it was thousands of standalone libraries in a single schema.
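In schema terms that looks roughly like the sketch below (table and column names are illustrative, not the real schema):

    -- Every table leads its clustered primary key with LibraryId, so rows
    -- are physically grouped per library.
    CREATE TABLE dbo.Title (
        LibraryId int           NOT NULL,
        TitleId   int           NOT NULL,
        Title     nvarchar(400) NOT NULL,
        Author    nvarchar(200) NULL,
        CONSTRAINT PK_Title PRIMARY KEY CLUSTERED (LibraryId, TitleId)
    );

    -- Every lookup supplies the prefix with an equality predicate, so an
    -- index seek only ever touches that one library's slice of the data.
    DECLARE @LibraryId int = 42, @SearchTerm nvarchar(200) = N'potter';
    SELECT TitleId, Title, Author
    FROM   dbo.Title
    WHERE  LibraryId = @LibraryId
      AND  Title LIKE N'%' + @SearchTerm + N'%';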
One hiccup was that when the cardinality estimator got confused, it would occasionally ignore the partition prefix and do a full scan somewhere, inflating the rows it had to examine by a factor of roughly 2,000! This caused dramatic slowdowns at random, and the DB engine would often cache the inefficient query plan, keeping things slow until it got rebooted.
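One blunt workaround for that kind of plan-cache poisoning (not necessarily what was done here) is to exempt the sensitive statement from plan caching altogether:

    -- OPTION (RECOMPILE) builds a fresh plan on every execution, so a plan
    -- compiled for one atypical parameter value can never be cached and then
    -- reused for everyone else.
    DECLARE @LibraryId int = 42, @SearchTerm nvarchar(200) = N'potter';
    SELECT TitleId, Title
    FROM   dbo.Title
    WHERE  LibraryId = @LibraryId
      AND  Title LIKE N'%' + @SearchTerm + N'%'
    OPTION (RECOMPILE);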
This is a very deep rabbit hole to go down. For example, many large cloud vendors have an Iron Rule that relational databases must never be used, because they're concerned precisely about this issue occurring, except at a vastly greater scale.
I could have used actual database partitioning, but I discovered it had undesirable side-effects for some cross-library queries. However, for typical queries this would have "enforced" the use of the partitioning key, side-stepping the problem the cloud vendors have.
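For reference, native partitioning would look something like the sketch below (toy boundary values only; partitioning strictly per library would need a boundary value for each of the ~2,000 library ids):

    -- Partition function/scheme splitting rows by LibraryId range.
    CREATE PARTITION FUNCTION pfLibrary (int)
        AS RANGE RIGHT FOR VALUES (500, 1000, 1500);

    CREATE PARTITION SCHEME psLibrary
        AS PARTITION pfLibrary ALL TO ([PRIMARY]);

    -- The partitioning column must be part of the clustered key.
    CREATE TABLE dbo.TitlePartitioned (
        LibraryId int           NOT NULL,
        TitleId   int           NOT NULL,
        Title     nvarchar(400) NOT NULL,
        CONSTRAINT PK_TitlePartitioned
            PRIMARY KEY CLUSTERED (LibraryId, TitleId)
    ) ON psLibrary (LibraryId);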
Modern versions of SQL Server have all sorts of features to correct or avoid inefficient query plans. E.g.: Query Store can track the "good" and "bad" versions of each plan for a query and then, after sufficient samples, start enforcing the good one. That would have been useful back in 2007 but wasn't available, so I spent about a month doing much the same thing by hand.
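On a current version (SQL Server 2016+), the Query Store route looks roughly like this; the id values are obviously placeholders:

    -- Inspect the plans Query Store has captured for each query.
    SELECT q.query_id, p.plan_id, rs.avg_duration
    FROM   sys.query_store_query q
    JOIN   sys.query_store_plan p ON p.query_id = q.query_id
    JOIN   sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
    ORDER BY rs.avg_duration DESC;

    -- Pin the known-good plan (the one that seeks on the LibraryId prefix)
    -- so the optimizer can no longer flip to the full scan.
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;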
This makes the performance a lot more understandable if you're only searching in a single library. I assume that cuts out >99.9% of those 330 million documents.
> had a far greater effect on the literacy of our state than any educator by orders of magnitude
Nice work, but check your ego mate. Seems your growth hacking would have had zero result if those kids couldn't read to start with, so you could share some credit ;-)
Maybe it wasn't meant that way. If they hadn't been there, then someone else would have been. You can be on the crest of a wave and not be responsible for its power.
By that logic, the people who farm the trees that make the books have more impact than anyone before them, unless you want to consider the people that make the tools, or feed the farmers, etc etc.
> This was the project that made me realise how huge the effort-to-effect ratio can be when computers are involved
I love Steve Jobs' metaphor: computers as a bicycle of the mind [0]. Unfortunately, a lot of effort is concentrated on problems that scale to billions of people. There's a lack of attention to problems that would have a big effect for a relatively small number of people. It's a shame, because they're a blast to work on.
They really are. I think the most rewarding software I ever wrote was my first paid gig, where I automated lap swim scheduling for my local swim club. Took me maybe an hour, got paid more money than I'd make in two days as a lifeguard, and they were thanking ME for it. Turned out I had saved a volunteer upwards of an hour every week. With a shitty little JavaScript program.
Pick one that scratches one of your itches, and get started. Release early, keep iterations small. It's easier to keep working on something you actively use.