
This is absolutely the case; TPUs scale very well: https://github.com/google/maxtext


The repo mentions a Karpathy tweet from Jan 2023. Andrej has since released llm.c, and the same model trains about 32x faster on the same NVIDIA hardware mentioned in that tweet. I don't think the performance estimate the repo used (based on that early tweet) accurately reflects the performance of the NVIDIA hardware itself.
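
A rough sketch of why that matters for the comparison (all throughput numbers below are hypothetical placeholders, not figures from MaxText, the tweet, or llm.c): if a TPU-vs-GPU estimate is anchored to GPU code that was later sped up ~32x on the same hardware, the implied hardware gap shrinks by roughly that factor.

    # Hypothetical illustration (Python): a software-only speedup on the same GPUs
    # changes the implied hardware-vs-hardware ratio.
    gpu_baseline_tps = 1.0   # normalized throughput of the tweet-era training code (placeholder)
    tpu_tps = 8.0            # hypothetical TPU throughput used in such a comparison (placeholder)
    llmc_speedup = 32.0      # software speedup reported on the same NVIDIA hardware

    old_ratio = tpu_tps / gpu_baseline_tps
    new_ratio = tpu_tps / (gpu_baseline_tps * llmc_speedup)
    print(f"implied TPU advantage vs. old GPU code: {old_ratio:.1f}x")
    print(f"implied TPU advantage vs. llm.c on GPU: {new_ratio:.2f}x")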



