That's why the IMO results were so notable: they were one of those moments where new models demonstrably did something they had previously been unable to do.
I can't fathom why more people aren't talking about the IMO story. Apparently the model they used is not just an LLM; some reinforcement learning (RL) is involved too. If a model wins gold at the IMO, is it still merely a "statistical parrot"?
The same thing has also been achieved by a Google DeepMind team and by at least one group of independent researchers using publicly available models and careful prompting tricks.