The thing that strikes me as particularly weird about this is the use of numpy there. In other languages, they're using native code. In Python they're reaching out to numpy, which is a great library, but not awesome inside a hot loop unless you're keeping the operation you're carrying out within numpy itself. This means, right in that hot loop, they're doing a lot of translating of numbers between Python representation and native C representation.
Cutting numpy out by making the line "t = [0.0]*l" gets it down to 17059 ms, without attempting any other optimisations. Using that plus pypy (so you get the JIT that you have with javascript/v8) gets it down to 958 ms.
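For anyone who wants to see that translation cost in isolation, here's a minimal sketch. It is not the article's benchmark: the array size and the function names `bump_elementwise` and `bump_vectorised` are made up for illustration. It runs the same element-by-element loop over a plain list and over a numpy array, then does the same work as one whole-array numpy operation. Timings will vary by machine, so I'm not quoting any.

```python
import timeit

import numpy as np


def bump_elementwise(seq):
    # Works on a list or a numpy array. On a numpy array, every seq[i]
    # read wraps a raw C double in a numpy.float64 object, and every
    # write unwraps it again: one round trip per element.
    for i in range(len(seq)):
        seq[i] = seq[i] + 1.0
    return seq


def bump_vectorised(arr):
    # The whole loop happens inside numpy, so nothing is converted
    # to Python objects along the way.
    arr += 1.0
    return arr


n = 100_000  # hypothetical size, not taken from the article

cases = [
    ("list, element by element", bump_elementwise, [0.0] * n),
    ("numpy, element by element", bump_elementwise, np.zeros(n)),
    ("numpy, whole array at once", bump_vectorised, np.zeros(n)),
]

for label, fn, data in cases:
    secs = timeit.timeit(lambda: fn(data), number=3)
    print(f"{label:28s} {secs * 1000 / 3:8.2f} ms per run")

# Indexing a numpy array hands back a numpy scalar, not a plain float.
print(type(np.zeros(1)[0]), type([0.0][0]))
```

The point is only the relative ordering you see on your own machine: the per-element numpy loop pays for the conversion on every access, while the list version and the whole-array version each stay on one side of the Python/C boundary.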
To show the cost of those translations, let's stick with numpy. If we switch that inner loop to:
It speeds us up from 47552 ms to 28251 ms, cutting about 40% off the execution time. That's still doing two hops back and forth, though. If you cut it down to a single line it's even faster at 18458 ms, bringing execution time down to a bit over a third of the original example. Pypy isn't able to help here at all; this is sort of a pathological case for it.
edit: I'll add, I'm not that good with numpy, rarely use it myself. Not sure if it's possible to do that inner loop all within numpy somehow. I imagine that'd be a lot faster still.
So rather than spend 10-20 minutes reading about numpy, the author wrote 3 other implementations...?
The fact that they ran the C code without the optimisation flags and compared it that way makes me think Javascript was what they actually wanted to write this one in anyway.
The author, by the looks of it from the article, is a uni student. Rather than straw-manning this into a language war, we should laud the fact that they managed to write the same thing in a number of different languages to begin with.
Negative again. A series of array operations which are individually idiomatic numpy like this will run very, very fast in numba, as it can coalesce the ops into a single pass through memory. Numpy can't do this and has to pass through the array for each individual array operation. There's nothing wrong with straight numpy, but if you want it compiled-C fast for the whole ensemble of array ops, you need a JIT.
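A rough sketch of the difference, using a made-up expression rather than anything from the article. `chained` is the idiomatic numpy version, where each operation walks the array and produces a temporary. `fused` is the single-pass loop shape a JIT is aiming for. The names and the expression are hypothetical, and I've left the numba decorator off so this runs with numpy alone; in plain CPython the `fused` loop is slow, and it's only there to show the shape.

```python
import numpy as np


def chained(a, b):
    # Three separate passes over memory, each allocating a new array.
    doubled = a * 2.0          # pass 1
    shifted = doubled + b      # pass 2
    return np.sqrt(shifted)    # pass 3


def fused(a, b):
    # One pass: each element is read, fully computed, and written once.
    # Assumption: with numba installed you would decorate this function
    # with @numba.njit to get it compiled; undecorated, it runs as an
    # ordinary (slow) Python loop.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = (a[i] * 2.0 + b[i]) ** 0.5
    return out


a = np.arange(5, dtype=np.float64)
b = np.ones(5)

# Both versions compute the same values; only the memory traffic differs.
print(np.allclose(chained(a, b), fused(a, b)))
```

For small arrays the extra passes hardly matter. The gap shows up once the arrays stop fitting in cache and every pass means another trip through main memory.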