Hot keys, scalability, and the Zipf distribution

mjb · on Feb 8, 2023

This is the second in a series I've slowly been working on about the "laws of physics" of database scalability. The first is here: https://brooker.co.za/blog/2023/01/30/nosql.html

Obviously, scalability isn't always the most important thing, and in many cases it isn't important at all. On the other hand, there are a lot of assertions of the form "X scales", "Y does not scale", etc. I think there's a gap in accessible content that looks at whether these statements are true, and the underlying principles that make them true or false.

fizwhiz · on Feb 8, 2023

How do you reconcile this with the importance of crosstalk and contention as espoused by the USL? https://wso2.com/blog/research/scalability-modeling-using-un...

paulmd · on Feb 8, 2023

not OP but the hyperdex paper is interesting along these lines: https://www.cs.cornell.edu/people/egs/papers/hyperdex-sigcom...

the central thesis being that actually at some point scaling tasks across multiple machines increases contention. having every request hit every machine runs that one query real fast since it's using the resources of a large number of machines, but running a lot of queries in parallel produces long queues and lots of network traffic.

in a way this is sort of what microservices do explicitly, but, you can partition the data implicitly into hyperdimensional spaces and then queries will only hit certain shards in the cluster. If there are shards that are particularly loaded up, you can increase the resources of those particular shards.

I think you could probably do the same thing in a lot of databases that use sharding, but, it does a good job of outlining the issue and the tension of one fast request vs good aggregate throughput. And this was 2012 which really was before noSQL caught fire and maybe before the maturity of some of those nosql systems around sharding/etc.

mjb · on Feb 8, 2023

The universal scalability law has this term κp(p-1), which suggests that interactions are O(p^2) for systems of p components, and those interactions happen at some relative frequency κ. Right?

So why might we expect coherency to cost O(p^2)? It sure does if the data that we're trying to keep coherent is everywhere, but that's not true in typical sharded systems. It also does if we're trying to do something like atomic commitment across all nodes, but again it's not clear that's what databases actually do. It also raises the question of how we calculate κ, and how the work we're trying to get done translates into a particular value of κ.

As a conceptual model, I quite like the USL. But it doesn't seem as universal as it claims, and I haven't read anything that helps with parameter selection.

So instead we can take a step back and pick another parameter (call it ⍺), which is a random variable whose distribution is the number of shards that a database needs to coordinate over to meet its consistency and isolation goals. Then, for N requests, total work done is proportional to E[⍺] * N. Why might we believe E[⍺] is O(N) too? It could be if we're trying to be serializable and most transactions go 'everywhere'. On the other hand, with key-value accesses or weak consistency E[⍺] could be O(1). With index structures, it could be O(log N). Or whatever.

Anyway, I'm not sure that makes sense, but it does seem like the USL makes some untested assumptions about the ways systems coordinate, which makes it less useful.

xrayarx · on Feb 8, 2023

This is part of a longer series on databases and how to make them scale. Author works at amazon on databases according to his blog.

Quote about the series:

This is a bit of an introduction to a long series of posts I've been writing about what, fundamentally, it is that makes databases scale. The whole series is going to take me a long time, but hopefully there's something here folks will enjoy.

tayo42 · on Feb 9, 2023

Interesting, I think i dealt with this at a previous job.

How do you deal with it? Your not going to just have one shard handle one key if you want to keep scaling.