Hacker News

I know you're going to reply with "BUT MY PREPROCESSOR", but template specialization is a big win and improvement (see qsort vs std::sort).


I have used the preprocessor to avoid this sort of slowdown in the past in a binary search function:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

The performance gain comes not from eliminating the function call overhead, but from enabling conditional move instructions to be used in the comparator, which eliminates a pipeline hazard on each loop iteration. There is some gain from eliminating the call overhead, but it is tiny compared to eliminating the pipeline hazard.
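
The shape of a cmov-friendly loop can be sketched like this (an illustrative lower-bound search, not the actual OpenZFS code):

```c
#include <stddef.h>

/* Illustrative sketch, not the OpenZFS code: the loop updates the base
 * pointer through a conditional expression instead of a taken/not-taken
 * branch, which optimizing compilers typically lower to a cmov. */
static const int *lower_bound(const int *base, size_t n, int key)
{
    while (n > 1) {
        size_t half = n / 2;
        /* data dependency instead of a control dependency */
        base = (base[half - 1] < key) ? base + half : base;
        n -= half;
    }
    return base;
}
```

A mispredicted branch flushes the pipeline; the conditional move trades that for a short data dependency, which is usually the better deal in a search loop whose branch is essentially unpredictable.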

That said, C++ has its weaknesses too, particularly in its typical data structures, its excessive use of dynamic memory allocation and its exception handling. I gave an example here:

https://news.ycombinator.com/item?id=43827857

Honestly, I think these weaknesses are more severe than qsort being unable to inline the comparator.


A comparator can be inlined just fine in C. See here where the full example is folded to a constant: https://godbolt.org/z/bnsvGjrje

It does not work if the compiler cannot see into the function, but the same is true in C++.
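
For instance (a minimal sketch of the idea, assuming the comparator and the call site are in the same translation unit and optimization is enabled):

```c
#include <stdlib.h>

/* With the comparator's body visible in the same translation unit, an
 * optimizing compiler can resolve the function-pointer call inside
 * bsearch() and inline the comparison. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int index_of_key(const int *arr, size_t n, int key)
{
    int *p = bsearch(&key, arr, n, sizeof(int), cmp_int);
    return p ? (int)(p - arr) : -1;  /* index of key, or -1 if absent */
}
```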


That does not directly show the comparator being inlined, since everything was folded into a constant, although I suppose it must have been inlined for that folding to happen. Neat.

Edit: It sort of works for the bsearch() standard library function:

https://godbolt.org/z/3vEYrscof

However, it optimized the binary search into a linear search. I wanted to see it implement a binary search, so I tried with a bigger array:

https://godbolt.org/z/rjbev3xGM

Now it calls bsearch instead of inlining the comparator.


With optimization enabled, it really does inline it, even with an array of unknown size: https://godbolt.org/z/sK3nK34Y4

That's not the most general case, but it's better than I expected.


Nice catch. I had goofed by omitting optimization when checking this from an iPad.

That said, this brings me back to my original reason for checking: the generated code does not use a cmov instruction to eliminate the unnecessary branch in the loop, so it is probably slower than a binary search that does:

https://en.algorithmica.org/hpc/data-structures/binary-searc...

That had been the entire motivation behind this commit to OpenZFS:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

It should be possible to adapt this to benchmark the inlined bsearch() against an implementation designed to encourage the compiler to emit a conditional move, to see which is faster:

https://github.com/scandum/binary_search
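
A correctness harness for such a comparison might look like this (illustrative, not taken from the linked repository; a real benchmark would additionally time both variants over a large array):

```c
#include <stddef.h>

/* Branchy lower bound: each iteration takes a data-dependent branch. */
static const int *search_branchy(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return a + lo;
}

/* Branchless lower bound: the conditional expression is a cmov candidate,
 * replacing a possibly mispredicted branch with a data dependency. */
static const int *search_branchless(const int *a, size_t n, int key)
{
    while (n > 1) {
        size_t half = n / 2;
        a = (a[half - 1] < key) ? a + half : a;
        n -= half;
    }
    return a;
}
```

For any key no greater than the last element, both return the same pointer, so the timing comparison is apples to apples.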

My guess is that the cmov version will win. I assume this merits a bug report, although I suspect improving it is a low priority, much like my last report in this area:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001



