The other commenter is more articulate, but you simply can't draw the conclusion from this paper that reasoning models don't work well. The authors trained tiny models and showed that those tiny models don't reason well. Big surprise! Meanwhile, every other piece of evidence available shows that reasoning models are more reliable on sophisticated problems. Here are just a few examples:
"Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques."
- https://arcprize.org/leaderboard
- https://aider.chat/docs/leaderboards/
- https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...
Surely the IMO problems weren't "within the bounds" of Gemini's training data, yet it solved them anyway.