
The thing that strikes me as particularly weird about this is the use of numpy there. In other languages, they're using native code. In python they're reaching out to numpy, which is a great library, but not awesome inside a hot loop unless you're keeping the operation you're carrying out within numpy itself. This means, right in that hot loop, they're doing a lot of translating of numbers between python representation and native c representation.

Cutting numpy out by making the line "t = [0.0]*l" gets it down to 17059 ms, without attempting any other optimisations. Using that plus pypy (so you get the JIT that you have with javascript/v8) gets it down to 958 ms.
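For reference, the hot loop under discussion presumably looks something like this; the sizes here are hypothetical (and deliberately much smaller than whatever the article used), reconstructed from the operations quoted elsewhere in the thread:

```python
import time

l = 10_000       # hypothetical array length
iterations = 100  # hypothetical outer-loop count; the article's real sizes were far larger

# Plain Python list instead of np.zeros(l): every t[i] access stays a native
# Python float, so there's no per-element conversion to/from a numpy scalar.
t = [0.0] * l

start = time.perf_counter()
for j in range(iterations):
    for i in range(l):
        t[i] += 0.02 * j
        t[i] *= 0.03 * j
        t[i] -= 0.04 * j
        t[i] /= 0.05 * (j + 1)
elapsed = (time.perf_counter() - start) * 1000
print(f"{elapsed:.0f} ms")
```

Swapping the list for a numpy array (and nothing else) turns each of those four augmented assignments into a read of a numpy scalar, a Python-side arithmetic op, and a write back, which is where the translation cost comes from.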

To show the cost of those translations while sticking with numpy: if we switch that inner loop to

        temp_t_i = t[i]
        temp_t_i += 0.02 * j
        temp_t_i *= 0.03 * j
        temp_t_i -= 0.04 * j
        temp_t_i /= 0.05 * (j+1)
        t[i] = temp_t_i
It speeds us up from 47552 ms to 28251 ms, almost halving the execution time. That's still doing two hops back and forth, though: one read from the numpy array and one write back per iteration. If you cut it down to a single line it's even faster at 18458 ms, cutting execution time to about a third of the original example. Pypy isn't able to help here at all; this is something of a pathological case for it.
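The single-line version mentioned would presumably be something like the following sketch (sizes hypothetical): one read of `t[i]` and one write of `t[i]` per iteration, so two Python-to-C translations instead of eight.

```python
import numpy as np

l = 10_000        # hypothetical sizes, scaled down to run quickly
t = np.zeros(l)
for j in range(100):
    for i in range(l):
        # Same four operations fused into one expression: a single numpy
        # scalar read and a single write back per element.
        t[i] = ((t[i] + 0.02 * j) * (0.03 * j) - 0.04 * j) / (0.05 * (j + 1))
```

Because the fused expression applies the same operations in the same order, it produces bit-identical results to the step-by-step version; only the number of boundary crossings changes.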

edit: I'll add, I'm not that good with numpy, rarely use it myself. Not sure if it's possible to do that inner loop all within numpy somehow. I imagine that'd be a lot faster still.



Yeah, an all-numpy version runs in less than 1ms on my M1 air

  import numpy as np
  l = 10_000
  t = np.zeros(l, dtype=np.float32)
  j = np.arange(l)
  t += 0.02 * j
  t *= 0.03 * j
  t -= 0.04 * j
  t /= 0.05 * (j + 1)


So rather than spend 10-20 minutes reading about numpy, the author wrote 3 other implementations...?

The fact that they ran the C code without the optimisation flags and compared it that way makes me think Javascript was what they actually wanted to write this one in anyway.


The author, by the looks of the article, is a uni student. Rather than straw-manning this into a language war, we should laud the fact that they managed to write the same thing in a number of different languages in the first place.


This is the problem that Numba was designed to solve.

https://numba.readthedocs.io/en/stable/user/5minguide.html


No. The problem in this post can be vectorized (== expressed as array ops) with idiomatic numpy, making it very fast (see sibling comment).

Numba is designed to speed up code where looping is unavoidable, i.e. the code can't be (easily) expressed as array ops.


Negative again: a series of array operations which are individually idiomatic numpy, like this one, will run very fast in numba, as it can coalesce the ops into a single pass through memory. Numpy can't do this and has to pass through the array once for each individual array operation. There's nothing wrong with straight numpy, but if you want it compiled-C fast for the whole ensemble of array ops, you need a JIT.
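To make the "single pass through memory" point concrete, here's a plain-Python sketch (hypothetical data, one fixed outer-loop value) contrasting four separate whole-array passes, which is what chained numpy expressions do, with one fused pass, which is what a JIT like numba can compile the chain into:

```python
l = 10_000
j = 7  # one fixed outer-iteration value, for illustration only

# Four passes: each operation traverses the whole array before the next starts
# (the access pattern of chained numpy array expressions).
t = [1.0] * l
for i in range(l):      # pass 1
    t[i] += 0.02 * j
for i in range(l):      # pass 2
    t[i] *= 0.03 * j
for i in range(l):      # pass 3
    t[i] -= 0.04 * j
for i in range(l):      # pass 4
    t[i] /= 0.05 * (j + 1)

# One fused pass: each element is touched once (what the JIT coalesces to).
u = [1.0] * l
for i in range(l):
    u[i] = ((u[i] + 0.02 * j) * (0.03 * j) - 0.04 * j) / (0.05 * (j + 1))

assert t == u  # identical results, a quarter of the memory traffic
```

The arithmetic is applied in the same order in both versions, so the results match exactly; only the number of passes over the array differs.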


Ah, I was under the impression that using array OPs inside a numba njit gave worse performance. Has this changed, or is my memory tricking me?

Do you have a source for this? I have not seen it in the numba docs.


Numba lowers the ops to their LLVM IR equivalents, so LLVM's optimiser can coalesce and eliminate operations as part of its optimisation passes.

If you want to look at the sort of things compilers do though, take a look at “common subexpression elimination”



