not OP, but the HyperDex paper is interesting along these lines: https://www.cs.cornell.edu/people/egs/papers/hyperdex-sigcom...

the central thesis being that, past some point, spreading each task across more machines actually increases contention. having every request hit every machine runs that one query really fast, since it uses the resources of a large number of machines, but running a lot of queries in parallel that way produces long queues and lots of network traffic.
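To put rough numbers on that, here's a back-of-the-envelope sketch. The figures (1000 queries/sec, 16 machines) and the function name are made up for illustration, and it assumes load spreads evenly, which real workloads won't do.

```python
def per_machine_load(queries_per_sec: float, machines: int, shards_touched: int) -> float:
    """Average sub-requests per second each machine has to serve,
    assuming every query fans out to `shards_touched` machines and
    the work is spread evenly (an idealization)."""
    return queries_per_sec * shards_touched / machines


# Hypothetical cluster: 16 machines, 1000 queries/sec arriving.
full_fanout = per_machine_load(1000, machines=16, shards_touched=16)
targeted = per_machine_load(1000, machines=16, shards_touched=2)

print(full_fanout)  # every machine sees every query
print(targeted)     # each machine sees only a fraction of the queries
```

With full fan-out, adding machines never lowers the per-machine request rate, it only shrinks the work per request. Routing each query to a couple of shards is what actually cuts the queue at each machine.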

in a way this is sort of what microservices do explicitly, but you can also partition the data implicitly by mapping it into a multidimensional hyperspace, so that queries only hit certain shards in the cluster. If some shards are particularly loaded up, you can increase the resources of just those shards.
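Roughly how that kind of partitioning can work, as a toy sketch and not HyperDex's actual code: each searchable attribute is an axis, each value hashes to a coordinate on its axis, and a query that pins some attributes only needs the cells matching those coordinates. The attribute names, `CELLS_PER_DIM`, and all function names here are my own assumptions.

```python
import hashlib
from itertools import product

DIMS = ("first_name", "last_name", "phone")  # hypothetical searchable attributes
CELLS_PER_DIM = 4  # hypothetical: each axis cut into 4 slices, 4**3 = 64 cells


def coord(value: str) -> int:
    """Hash one attribute value to a slice index along its axis.
    sha256 is used so the result is stable across processes."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % CELLS_PER_DIM


def cell_for(obj: dict) -> tuple:
    """The single cell (think: shard) where an object is stored."""
    return tuple(coord(obj[d]) for d in DIMS)


def cells_for_query(query: dict) -> list:
    """Cells a search must contact: one fixed coordinate per attribute
    the query specifies, every slice for attributes it leaves open."""
    axes = [[coord(query[d])] if d in query else range(CELLS_PER_DIM) for d in DIMS]
    return list(product(*axes))


alice = {"first_name": "Alice", "last_name": "Smith", "phone": "555-0100"}
print(len(cells_for_query({})))                                             # no attributes pinned
print(len(cells_for_query({"last_name": "Smith"})))                         # one attribute pinned
print(len(cells_for_query({"last_name": "Smith", "phone": "555-0100"})))   # two attributes pinned
```

The more attributes a query pins down, the fewer cells it has to touch, and the cell holding a matching object is always among them. In a real system cells would be assigned to servers, so a hot region can be given more hardware without touching the rest.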

I think you could probably do the same thing in a lot of databases that use sharding, but the paper does a good job of outlining the issue and the tension between one fast request and good aggregate throughput. And this was 2012, which was really before NoSQL caught fire, and maybe before some of those NoSQL systems had matured around sharding etc.