>Test-time compute/RL on LLMs:
>It will not meaningfully generalize beyond domains with easy verification.
To me, this is the biggest question mark. If you could get good generalized "thinking" from just training on math/code problems with verifiers, that would be a huge deal. So far, generalization seems to be limited. Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns? If the latter, is that fixable?
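For concreteness, the reward signal in these "easy verification" domains is typically just a programmatic check against a known answer. A minimal sketch of what such a verifier might look like (the function name, the `\boxed{}` convention, and the exact-match check are illustrative assumptions on my part, not any particular lab's pipeline):

```python
import re

def math_verifier(model_output: str, ground_truth: str) -> float:
    """Illustrative binary reward: 1.0 if the model's final boxed
    answer exactly matches the known answer, else 0.0. Real pipelines
    are more elaborate, but the reward stays this cheap and automatic."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```

Math and code admit this kind of cheap, exact check; "good generalized thinking" has no analogous verifier, which is exactly where the generalization question bites.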
> Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns?
"Thinking" isn't a singular thing. Humans learn to think in layer upon layer of understandig the world, physical, social and abstract, all at many different levels.
Embodiment will allow them to use RL on the physical world, and this in combination with access to not only means of communication but also interacting in ways where there is skin in the game, will help them navigate social and digital spaces.