No: soon the wide wild world itself becomes training data. And for much more than just an LLM. LLM plus reinforcement learning is where the capacity of our in silico children will engender much parental anxiety.
However, I think the most cost-effective way to train for the real world is to train in a simulated physical world first. I would assume that Boston Dynamics does exactly that, and I would expect integrated vision-action-language models to be trained that way first too.
That's what everyone in robotics is doing these days.
You take a bunch of mo-cap data and simulate it with your robot body. Then you do as much testing as you can with the real robot and feed the observed behavior back into the model for fine-tuning.
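To make that loop concrete, here is a toy sketch of my own, not anyone's actual pipeline. Everything in it is a made-up stand-in: the "mo-cap clip" is a sine wave, the "robot body" is a single joint, the "policy" is one controller gain, and the sim-to-real gap is just a weaker actuator. Real systems use physics simulators and RL rather than hill climbing, but the three stages have the same shape.

```python
import math
import random


def mocap_clip(steps=200):
    """Hypothetical stand-in for a mo-cap clip: one target joint angle (rad) per step."""
    return [0.5 * math.sin(2 * math.pi * t / 50) for t in range(steps)]


def tracking_error(gain, clip, actuator_efficiency):
    """Roll out a proportional controller on a one-joint body; return mean squared error.

    actuator_efficiency is the only thing that differs between "sim" and "real"
    in this toy. Actual sim-to-real gaps are far messier.
    """
    angle = 0.0
    total = 0.0
    for target in clip:
        command = gain * (target - angle)
        angle += actuator_efficiency * command
        total += (target - angle) * (target - angle)
    return total / len(clip)


def hill_climb(gain, evaluate, iterations, rng, sigma=0.1):
    """Keep a random perturbation of the gain only if it lowers the error."""
    best_gain, best_error = gain, evaluate(gain)
    for _ in range(iterations):
        candidate = min(2.0, max(0.0, best_gain + rng.gauss(0.0, sigma)))
        error = evaluate(candidate)
        if error < best_error:
            best_gain, best_error = candidate, error
    return best_gain, best_error


rng = random.Random(0)
clip = mocap_clip()

# Stage 1: lots of cheap rollouts in simulation (ideal actuator).
initial_sim_error = tracking_error(0.1, clip, 1.0)
sim_gain, sim_error = hill_climb(
    0.1, lambda g: tracking_error(g, clip, 1.0), 200, rng
)

# Stage 2: put the sim-trained policy on the "real" robot (weaker actuator).
real_error_before = tracking_error(sim_gain, clip, 0.6)

# Stage 3: fine-tune with a small budget of expensive real-robot rollouts.
real_gain, real_error_after = hill_climb(
    sim_gain, lambda g: tracking_error(g, clip, 0.6), 20, rng
)

print(f"sim error:  {initial_sim_error:.5f} -> {sim_error:.5f}")
print(f"real error: {real_error_before:.5f} -> {real_error_after:.5f}")
```

The point of the split is the rollout budget: 200 tries in simulation cost nothing, while the 20 real-robot tries only have to close whatever gap the simulator left.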
Unitree gives an example of the simulation versus what the robot can actually do in their latest video.