
The mistake this article makes is assuming LLM scaling is one thing. It's not.

RL is spiky: it produces narrow improvements on specific capabilities. The labs aren't making the model generically smarter, they're using RL to fill in holes in the model's capabilities. In reality we don't have one scaling curve, we have thousands of them. We're seeing diminishing returns in "top line smarts," but the floor is being raised across a wide variety of areas that people who don't heavily eval models for a living might not notice.
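
To make the "many curves, not one" point concrete, here's a toy sketch. The capability names and scores are hypothetical, not real eval data; the point is only that a single top-line average barely moves while targeted RL lifts the previously weak capabilities a lot.

    # Illustrative sketch only: hypothetical per-capability eval scores, not real data.
    # Averaging many capability scores into one "top line" number hides large
    # improvements on previously weak capabilities (the kind of hole RL targets).

    before = {
        "general_reasoning": 0.82,
        "coding": 0.78,
        "tool_use": 0.35,              # a weak spot
        "long_horizon_agentic": 0.30,  # another weak spot
        "math_word_problems": 0.80,
    }

    # Hypothetical post-RL scores: narrow, targeted gains on the weak spots,
    # near-flat everywhere else.
    after = {
        "general_reasoning": 0.83,
        "coding": 0.80,
        "tool_use": 0.70,
        "long_horizon_agentic": 0.62,
        "math_word_problems": 0.81,
    }

    def top_line(scores):
        # One aggregate number: this is the "top line smarts" view.
        return sum(scores.values()) / len(scores)

    print(f"top-line before: {top_line(before):.2f}")
    print(f"top-line after:  {top_line(after):.2f}")

    # Per-capability deltas: the many-scaling-curves view.
    for name in before:
        print(f"{name:24s} {after[name] - before[name]:+.2f}")

The aggregate moves by a few points while tool_use and long_horizon_agentic roughly double, which is exactly the gap between how a casual user and someone who evals models for a living would perceive the same release.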


