I would expect the table-based dispatch to be handled quite well by branch prediction, and surely the cache misses from those nested loops would have a much worse impact. Even if a few extra instructions have to run per pixel, it's still going to be quicker than a fetch from main memory.

All the examples, in each language, could be rewritten as cache-oblivious algorithms to optimise cache usage, which would speed them all up. See https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
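
To make that concrete, here is a minimal sketch of the cache-oblivious idea in Go. All of the names (Image, shade, process) are hypothetical and not from the article: instead of picking a tile size tuned to one cache, you recursively split the wider dimension of the region until a block is small, which keeps the working set friendly to every level of the cache hierarchy without any tuning.

    package main

    import "fmt"

    const cutoff = 32 // below this edge length, just loop over the block

    type Image struct {
        Pix    []uint32
        Stride int // pixels per row; row-major layout assumed
    }

    // shade is a stand-in for whatever per-pixel work the examples do.
    func shade(p uint32) uint32 { return p ^ 0x00FFFFFF }

    // process recursively splits the rectangle [x0,x1) x [y0,y1) along its
    // wider dimension until a block is small, then processes it densely.
    func process(img *Image, x0, y0, x1, y1 int) {
        w, h := x1-x0, y1-y0
        switch {
        case w <= cutoff && h <= cutoff:
            for y := y0; y < y1; y++ {
                row := img.Pix[y*img.Stride : (y+1)*img.Stride]
                for x := x0; x < x1; x++ {
                    row[x] = shade(row[x])
                }
            }
        case w >= h: // split the wider dimension to keep blocks roughly square
            mid := x0 + w/2
            process(img, x0, y0, mid, y1)
            process(img, mid, y0, x1, y1)
        default:
            mid := y0 + h/2
            process(img, x0, y0, x1, mid)
            process(img, x0, mid, x1, y1)
        }
    }

    func main() {
        img := &Image{Pix: make([]uint32, 1024*1024), Stride: 1024}
        process(img, 0, 0, 1024, 1024)
        fmt.Println(img.Pix[0]) // keep the work observable
    }

For a pure row-major sweep this buys little over a plain row loop; the recursive split pays off when the access pattern is strided or the per-pixel work touches other buffers.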



In general, I tend to agree there are a lot of people who have picked up the idea that "vtables are always bad and slow" and overestimate the actual overhead.

But I have actually benchmarked this before, and it is possible to have a function body so small (for example, a single slice index lookup returning a register-sized value like a machine word) that the function call overhead can dominate even so.
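
For what it's worth, the shape of that benchmark looks roughly like this in Go (Getter, sliceGetter, and the benchmark names are made up for illustration; an interface method call is Go's closest analogue to vtable dispatch). With a body this tiny, the indirect call and the lost inlining are a large fraction of the total:

    package tinycall

    import "testing"

    // Getter is a hypothetical interface whose only method is a tiny lookup.
    type Getter interface{ Get(i int) uint64 }

    type sliceGetter struct{ s []uint64 }

    // The entire function body: one bounds-checked slice index and a return.
    func (g sliceGetter) Get(i int) uint64 { return g.s[i] }

    var sink uint64 // accumulate into a package variable so the loop isn't elided

    func BenchmarkDirect(b *testing.B) {
        g := sliceGetter{s: make([]uint64, 1024)}
        for i := 0; i < b.N; i++ {
            sink += g.Get(i & 1023) // concrete type: cheap call, likely inlined
        }
    }

    func BenchmarkInterface(b *testing.B) {
        var g Getter = sliceGetter{s: make([]uint64, 1024)}
        for i := 0; i < b.N; i++ {
            sink += g.Get(i & 1023) // dynamic dispatch: indirect call, no inlining
        }
    }

Run it with go test -bench . and the gap between the two shrinks as you put more real work inside Get.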

(Languages like Erlang and Go that emphasize concurrency see a constant low-level stream of forum posts from people who do an even more extreme version of this: they try to "parallelize" the task of adding a list of integers together, replacing a highly pipelineable integer add, which can effectively cost less than one cycle per add, with spawning a new execution context, sending over the integers to add, adding them in the new context, and then synchronizing on sending the result back. Then they wonder why Erlang/Go/threading in general sucks so much, because the new program is literally hundreds of times slower than the original.)
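
In Go terms the anti-pattern looks roughly like this (a sketch with hypothetical names, and the "parallel" version is deliberately naive): sumLoop is the plain loop, while sumNaive spawns a goroutine per tiny chunk and synchronizes on a channel for every partial sum, so the coordination cost dwarfs the adds it replaced.

    package main

    import "fmt"

    // sumLoop: the boring version; the adds pipeline at roughly a cycle or less each.
    func sumLoop(xs []int) int {
        total := 0
        for _, x := range xs {
            total += x
        }
        return total
    }

    // sumNaive: spawn a new execution context per tiny chunk, do the adds over
    // there, and synchronize on a channel send/receive to get each partial back.
    func sumNaive(xs []int) int {
        const chunk = 2 // absurdly small on purpose, to match the anti-pattern
        partials := make(chan int)
        n := 0
        for i := 0; i < len(xs); i += chunk {
            end := i + chunk
            if end > len(xs) {
                end = len(xs)
            }
            n++
            go func(part []int) {
                s := 0
                for _, v := range part {
                    s += v
                }
                partials <- s
            }(xs[i:end])
        }
        total := 0
        for ; n > 0; n-- {
            total += <-partials
        }
        return total
    }

    func main() {
        xs := make([]int, 100_000)
        for i := range xs {
            xs[i] = i
        }
        fmt.Println(sumLoop(xs), sumNaive(xs)) // same answer, wildly different cost
    }

Chunked sensibly (one goroutine per core, each given a big contiguous slice), the parallel version can win; it's the per-element coordination that kills it.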

But it is true this amortizes away fairly quickly, because the overhead isn't that large. Even one of the larger random number generators, like the Mersenne Twister, already goes a long way toward dominating the function call overhead. I don't even begin to worry about function call overhead unless I can see I'm trying to do several million calls per second, and generally you can't do several million per second on a single core anyway: the function bodies themselves are too large and doing too much stuff, so even if the call overhead were zero it would still be impossible for that particular program.
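
As a back-of-envelope check (the numbers below are assumptions, not measurements): if an indirect call costs on the order of a few nanoseconds and the body does even a microsecond of real work, dispatch is a rounding error.

    package main

    import "fmt"

    func main() {
        const callOverheadNS = 3.0 // assumed cost of an indirect call, ballpark only
        const bodyNS = 1000.0      // assumed cost of the function body itself
        fraction := callOverheadNS / (callOverheadNS + bodyNS)
        fmt.Printf("dispatch is about %.2f%% of the total\n", fraction*100)
    }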



