Yeah, you've pretty much got it. A common alternative to masking bits off of the...

yason · on May 29, 2012

If the hash function is any good, simply bitwise-and enough bits from the hash and use that as the index. (It's a modulo too, but simply modulo(2^x).)

Using 2's powers generally fits nicely with programming on binary computers. It also has the nice quality of producing number of slots that are always 2's powers and you can shove those naturally for example into a single memory page.

viggity · on May 29, 2012

iirc, there is a major advantage to using a prime number of slots that prevents the biasing

hythloday · on May 29, 2012

It's to protect against bad hashing algorithms:

http://www.concentric.net/~Ttwang/tech/primehash.htm

Jabbles · on May 29, 2012

Does the modulo operation add a negligible cost to these (very cheap) hashing functions?

i.e. if you're hashing a 10 character (non-unicode) word with FNV-1a you're using 10 multiplications and 10 xors. Adding a modulo (by a prime†) operation in there could feasibly double the time taken.

†Not 2...

viggity · on May 29, 2012

The modolo function is pretty much required unless you have 2^32 bytes of RAM available for each hashtable. Modolo is also ridiculously cheap, especially compared to the hashing function.

Jabbles · on May 29, 2012

Certainly, if you are hashing a large amount of data then a single modulo operation isn't going to cost a lot. But if you're just hashing an English word, possibly even a URL, I'm not so sure.

You could shift or mask the result (as described above) to use a table size of any power of 2 to avoid the modulo.

mseebach · on May 29, 2012

Doing constant modulo on an integer is very fast - we're talking a few CPU operations.

haberman · on May 29, 2012

It won't be a compile-time constant unless your hash table is incapable of resizing. And modulo arithmetic on arbitrary values is slow (20-94 cycles of latency on Sandy Bridge).

mseebach · on May 30, 2012

Hmm, good point. Is that also true for JIT compilers?

dchest · on May 29, 2012

Yes, but bitwise AND is faster.

sukuriant · on May 29, 2012

Cross that bridge when you get there. If you're really finding that your hash function is the slowest piece of code in a tightly bound for-loop, and it's not all the collisions you're having (from having a bad hashing function), then look into alternatives. Before that, you're doing premature optimization.

dchest · on May 29, 2012

Please read the whole thread before replying with the "premature optimization" thing. We are talking about hash table optimizations; this commenter -- http://news.ycombinator.com/item?id=4037320 -- is asking about the cost of modulo operation.

terangdom · on May 30, 2012

IIrc changing a modulo to an and roughly doubled the performance of my hashtable.