It got me thinking as well, so I ran some experiments and found that the main difference is the RNG algorithm: C's standard library uses a slower one (which is also thread safe, and that butchered OpenMP performance). For a more apples-to-apples comparison, take a look at the latest update to crb.c, which uses an xor128 RNG. Rust is still a little faster (especially when going multithreaded), but the gap is not as large as the README suggests; I still need to find some time to update it.
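For anyone following along, here is a minimal sketch of what an xor128 generator with caller-owned state can look like. It follows Marsaglia's published xorshift128 recurrence and is not copied from crb.c; the names `xor128_state`, `xor128_seed`, `xor128_next` and `xor128_float` are made up for illustration. Because the state lives in a struct the caller owns, each thread can keep its own copy and no locking is needed.

```c
#include <stdint.h>

/* Hypothetical state struct: one instance per thread, so no shared state. */
typedef struct { uint32_t x, y, z, w; } xor128_state;

/* Seed the generator. y, z, w use Marsaglia's default constants, so the
   state is never all zero even if the caller passes seed == 0. */
static void xor128_seed(xor128_state *s, uint32_t seed) {
    s->x = 123456789u ^ seed;
    s->y = 362436069u;
    s->z = 521288629u;
    s->w = 88675123u;
}

/* One step of the xorshift128 recurrence; returns 32 pseudo-random bits. */
static uint32_t xor128_next(xor128_state *s) {
    uint32_t t = s->x ^ (s->x << 11);
    s->x = s->y;
    s->y = s->z;
    s->z = s->w;
    s->w = s->w ^ (s->w >> 19) ^ t ^ (t >> 8);
    return s->w;
}

/* Map the top 24 bits to a float in [0, 1). 24 bits fit a float mantissa
   exactly, so the result can never round up to 1.0f. */
static float xor128_float(xor128_state *s) {
    return (float)(xor128_next(s) >> 8) * (1.0f / 16777216.0f);
}
```

Usage is just `xor128_state rng; xor128_seed(&rng, 42); float r = xor128_float(&rng);` wherever the code previously called `rand()`.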
FWIW, I looked at some of the quicker C/Rust examples, without much other analysis:
crb-vec-omp //I added some #pragma omp to crb-vec
executable size:
18k
time:
real 0m3.630s
valgrind:
==17703== HEAP SUMMARY:
==17703== in use at exit: 7,408 bytes in 15 blocks
==17703== total heap usage: 20 allocs, 5 frees, 14,790,856 bytes allocated
rsrb_alt_mt.rs
executable size:
426k
time:
real 0m1.630s
valgrind:
==7221== HEAP SUMMARY:
==7221== in use at exit: 43,120 bytes in 216 blocks
==7221== total heap usage: 256 allocs, 40 frees, 11,113,784 bytes allocated
Because we have a number of tiny single-CPU VMs out there (which would also benefit from a performant language), I gave it a shot there:
:~# time ./rsrb_alt_mt
./rsrb_alt_mt: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by ./rsrb_alt_mt)
real 0m0.002s
user 0m0.002s
sys 0m0.000s
:~# time ./crb-vec-omp
real 0m24.234s
user 0m24.160s
sys 0m0.035s
So the Rust binary doesn't run on CentOS 7.5: it requires GLIBC_2.18, which the libc on that box doesn't provide (and no, I'm not going to edit the binary). That is an insta-deal breaker for us.
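The error above is a symbol-version mismatch rather than anything specific to the benchmark: the binary asks for GLIBC_2.18 and the host's libc is older (CentOS 7 ships glibc 2.17, as far as I know). A small sketch of checking this on a target box before copying a binary over is below. `gnu_get_libc_version()` is the real glibc call; `version_at_least` and `host_has_glibc_2_18` are hypothetical helper names, and the 2.18 threshold is taken from the error message.

```c
#include <stdio.h>

#ifdef __GLIBC__
#include <gnu/libc-version.h>   /* gnu_get_libc_version(), glibc only */
#endif

/* Returns 1 if the "major.minor" string `have` is at least major.minor,
   0 otherwise (including when the string cannot be parsed). */
int version_at_least(const char *have, int major, int minor) {
    int hmaj = 0, hmin = 0;
    if (sscanf(have, "%d.%d", &hmaj, &hmin) != 2) {
        return 0;
    }
    return hmaj > major || (hmaj == major && hmin >= minor);
}

/* Returns 1 if the running glibc is at least 2.18, 0 if it is older,
   and -1 if this was not compiled against glibc at all. */
int host_has_glibc_2_18(void) {
#ifdef __GLIBC__
    return version_at_least(gnu_get_libc_version(), 2, 18);
#else
    return -1;
#endif
}
```

On a host reporting 2.17 the second function would return 0, which matches the failure shown above.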
FWIW, I took a stab at replacing a bunch of => with . and ran it:
time ./crb-vec-omp
real 0m0.764s
which is more than twice as fast as the Rust example, but it didn't produce the right output, so the timing doesn't count for much yet...
If you get bored, would you mind taking a stab at adding parallel and modifying crb-vec.c (https://github.com/niofis/raybench)? I definitely think you might be on to something here.
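To make the request a bit more concrete, here is a rough sketch of the shape I'd expect a parallel version to take: an OpenMP `parallel for` over image rows, with each row owning its own RNG state so nothing is shared between threads. This is not the actual crb-vec.c code; `rng_t`, `rng_seed_for_row` and `fill_rows` are hypothetical names, and the "pixel" is just one random sample standing in for the real tracing work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-row generator state (xorshift128 style). */
typedef struct { uint32_t x, y, z, w; } rng_t;

static uint32_t rng_next(rng_t *s) {
    uint32_t t = s->x ^ (s->x << 11);
    s->x = s->y;
    s->y = s->z;
    s->z = s->w;
    s->w = s->w ^ (s->w >> 19) ^ t ^ (t >> 8);
    return s->w;
}

/* Derive a distinct, non-zero state from the row index, so the output
   does not depend on which thread happens to pick up which row. */
static void rng_seed_for_row(rng_t *s, uint32_t row) {
    s->x = 123456789u ^ (row * 2654435761u);
    s->y = 362436069u;
    s->z = 521288629u;
    s->w = 88675123u;
}

/* Fills out[width * height] with one sample in [0, 1) per pixel.
   Build with -fopenmp to run rows in parallel; without the flag the
   pragma is ignored and the loop runs serially with the same result. */
void fill_rows(float *out, int width, int height) {
    #pragma omp parallel for schedule(dynamic)
    for (int row = 0; row < height; row++) {
        rng_t rng;  /* private to this iteration, hence to one thread */
        rng_seed_for_row(&rng, (uint32_t)row);
        for (int col = 0; col < width; col++) {
            out[(size_t)row * (size_t)width + (size_t)col] =
                (float)(rng_next(&rng) >> 8) * (1.0f / 16777216.0f);
        }
    }
}
```

Seeding per row rather than per thread keeps the image reproducible across thread counts, which makes it easier to tell a real speedup from a run that is fast because it computed the wrong thing.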