Note that the Java impl is creating objects all over the place in the inner loop - madness!
I'm sure a "mechanical translation of [the] C" version would improve things for the Java ver as well. If we removed startup costs (the class file validation, etc) I'd expect it to be on par with C.
The startup costs are fixed (they don't grow with the number of loop iterations), and for such a small program they could not reasonably explain more than 1s of the 20s gap between C and Java. Also, I don't think the cause is "allocating objects in an inner loop": Java's allocations are super cheap (bump allocations), and that's only if escape analysis doesn't keep the objects on the stack in the first place.
That said, after examining this benchmark further, I don't think it's very good, since the sequences returned by the random number generators are not controlled for (each implementation uses its own standard library RNG with its own seed, so the sequences will vary from language to language). This likely causes more loop iterations in some implementations, but considering the loop termination condition, the theoretical distribution of RNG outputs, and the trivial work done in the loop body, I doubt that the delta in loop iterations can explain any significant portion of the gap. Rather, I think the gap is simply a difference in the performance of the RNGs themselves: C and Rust use a poor man's RNG (xorshift), which performs very well for this exercise but is not a good general-purpose RNG (and standard library RNGs are optimized for the general case). When I rewrote the Go version, switching to the xorshift implementation made the most significant impact (15s), although I'm not 100% sure that the output of the RNG isn't just causing it to run the RNG less frequently. I opened a ticket against the project: https://github.com/niofis/raybench/issues/15.
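For anyone who hasn't seen xorshift, here's a rough sketch of the idea in Python. To be clear about the assumptions: this is the classic 32-bit Marsaglia variant (shifts of 13, 17, 5), which may not be the exact variant the raybench C and Rust versions use, and the class and method names (`XorShift32`, `next_u32`, `next_float`) are made up for illustration. The point is just that the whole generator is three shift-and-xor steps, and that a fixed seed gives every implementation the same sequence.

```python
MASK32 = 0xFFFFFFFF  # Python ints are unbounded, so emulate 32-bit wraparound


class XorShift32:
    """Illustrative 32-bit xorshift generator (Marsaglia's 13/17/5 shifts)."""

    def __init__(self, seed=2463534242):
        # A zero state would stay zero forever, so reject it up front.
        if seed & MASK32 == 0:
            raise ValueError("seed must be non-zero")
        self.state = seed & MASK32

    def next_u32(self):
        x = self.state
        x ^= (x << 13) & MASK32  # left shifts must be masked back to 32 bits
        x ^= x >> 17
        x ^= (x << 5) & MASK32
        self.state = x
        return x

    def next_float(self):
        # Scale to [0, 1); the state never exceeds 2**32 - 1.
        return self.next_u32() / 4294967296.0


if __name__ == "__main__":
    # Two generators with the same seed produce the same sequence, which is
    # the property a cross-language benchmark would need to hold fixed.
    a, b = XorShift32(42), XorShift32(42)
    print([a.next_u32() for _ in range(3)] == [b.next_u32() for _ in range(3)])
```

If every port used something like this with an agreed seed, the number of loop iterations would be identical across languages, and any remaining gap couldn't be blamed on the RNG sequence.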