You're wrong. The people building the models don't write CUDA kernels. The peopl...

HarHarVeryFunny · on July 16, 2024

How much of a performance difference is there between writing a kernel in a high-level language/framework like PyTorch (torch.compile) or Triton and hand-optimizing it? Are you writing kernels in PTX?

What's your opinion on the future of writing optimized GPU code/kernels? How long before compilers are as good as or better than (most) humans writing hand-optimized PTX?

throwaway81523 · on July 16, 2024

The CUDA version of LCZero was around 2x or 3x faster than the TensorFlow(?) version, iirc.

throwaway81523 · on July 16, 2024

Heh, I'm in the wrong business then. Interesting. It used to be that game programmers spent lots of time optimizing non-ML CUDA code. They didn't make anything like 500k at that time. I wonder what the ML industry has done to game development, or for that matter to scientific programming. Wow.