This is pretty awesome! But I'm not a fan of the RNG involved. Is there an appro...

Hnrobert42 · on June 24, 2021

I was barely hanging on to a lot of the math notation, but the author argues the advantage of the randomization is that on average the performance will be constant across all datasets. That is, once you develop a sense for how the system performs, you shouldn’t be surprised by wild performance swings.

If that is not a valuable feature for you, then I believe you could use any clustering hash function you want, including a deterministic one.

bruce343434 · on June 24, 2021

The randomization is to prevent bias in the sampling, as they illustrate with the grid. They then overlay random "grid shapes" to roughly approximate a sphere. I can't prove it right now, but I'm betting that if you extended this approach to infinitely many hashes you would eventually get a perfect circular falloff.
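
A rough way to check that bet numerically, as a sketch only: hash two points with many random square grids and count how often they share a cell. Everything here is an assumption for illustration, not the article's code. The name `collision_rate` is made up, and the random rotation is my addition. With random shifts alone, the falloff is a product of per-axis triangles, so it is not rotationally symmetric; adding a random rotation makes the averaged collision rate depend only on the distance between the points.

```python
import math
import random


def _cell(pt, theta, ox, oy, width):
    # Rotate the point by -theta, shift it, then snap to a grid cell.
    c, s = math.cos(theta), math.sin(theta)
    x = c * pt[0] + s * pt[1] + ox
    y = -s * pt[0] + c * pt[1] + oy
    return (math.floor(x / width), math.floor(y / width))


def collision_rate(p, q, width=1.0, n_hashes=20000, seed=0):
    """Fraction of random grids in which p and q land in the same cell."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_hashes):
        # A square grid repeats every 90 degrees, so [0, pi/2) covers all rotations.
        theta = rng.uniform(0.0, math.pi / 2)
        ox = rng.uniform(0.0, width)
        oy = rng.uniform(0.0, width)
        if _cell(p, theta, ox, oy, width) == _cell(q, theta, ox, oy, width):
            hits += 1
    return hits / n_hashes


if __name__ == "__main__":
    d = 0.5
    along_axis = collision_rate((0.0, 0.0), (d, 0.0))
    diagonal = collision_rate((0.0, 0.0), (d / math.sqrt(2), d / math.sqrt(2)))
    print(along_axis, diagonal)  # the two rates should be close to each other
```

Two pairs of points at the same distance but in different directions should collide at about the same rate, which is what "circular" means here. The falloff with distance is smooth but is not a Gaussian.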

So, the randomization is not the only way to prevent bias: you could also avoid the bias in the first place by cutting to the chase and using a circular falloff directly. But what I'm asking is how one would implement such a thing.
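
For reference, the brute-force version of a direct circular falloff is easy to write down. This is a sketch under my own assumptions: the name `cone_score` and the linear "cone" falloff are illustrative choices, and the result is an unnormalized score, not a proper density. It visits every stored point on every query, which is exactly the cost the hashing approach avoids.

```python
import math


def cone_score(query, points, radius=1.0):
    """Average of a radially symmetric weight over all stored points.

    Each point contributes 1 at distance 0, falling linearly to 0 at `radius`.
    """
    total = 0.0
    for px, py in points:
        r = math.hypot(query[0] - px, query[1] - py)
        total += max(0.0, 1.0 - r / radius)
    return total / len(points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    print(cone_score((0.0, 0.0), pts, radius=2.0))
```

The open part of the question is how to get this same falloff without the per-query scan over all points.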