It's absolutely no longer about the data. We produce millions of new humans a year who wind up better at reasoning than these models, yet they don't need to read the entire contents of the Internet to do it.
A relatively localized, limited lived experience apparently conveys a lot that LLM input does not - there's an architecture problem (or a compute constraint).
AI having societally useful impact is 100% about the data and overall training process (and robotics...), of which raw compute is a relatively trivial and fungible part.
No amount of Reddit posts and H200s will result in a model that can cure cancer, drive high-throughput waste filtering, or do precision agriculture.