A competitive Geoguessr player clearly got there through copious internet searching and memorization. So comparing knowledge retained in the trained model to knowledge retained in the brain feels surprisingly fair.
Conversely, the model sharing, "I found the photo by crawling Instagram and used an email MCP to ask the user where they took it. It's in Austria," is unimpressive.
So independent of whether it actually improves performance, the cheating/not-cheating question is really a question about what we consider the cohesive essence of the model.
For example, RAG against a comprehensive local filesystem would also feel like cheating to me, like a human geoguessing in a library filled with encyclopedias. But the fact that vanilla o3 is impressive suggests I have an opaque (and poorly informed) notion of the model boundary, where it's a legitimate victory only if the model was born with that knowledge baked in.
What's your take on man vs. machine? If AI already beats Master-level players, it seems certain that it will soon beat the Geoguessr world champion too. Will people still derive pleasure from playing it, like with chess?
I encourage everyone to try Geoguessr! I love it.
I'm seeing a lot of comments saying that because the o3 model used web search in 2 of 5 rounds, the test was unfair and the results invalid.
To determine if that's true, I re-ran the two rounds where o3 used search, and I've updated the post with the results.
Bottom line: It changed nothing. The guesses were nearly identical. You can verify the GPS coordinates in the post.
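If you want to check the coordinates yourself, here's a minimal sketch of how you could compare two guesses by great-circle distance. The coordinates below are placeholders for illustration, not the actual guesses from the post:

```python
# Minimal sketch: compare two GeoGuessr guesses by great-circle distance.
# The coordinates below are placeholders, NOT the actual guesses from the post.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Hypothetical example: guess from the search run vs. guess from the re-run.
with_search = (47.2692, 11.4041)     # placeholder coordinates near Innsbruck, Austria
without_search = (47.2580, 11.3960)  # placeholder re-run guess

print(f"Guesses differ by {haversine_km(*with_search, *without_search):.2f} km")
```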
Here's an example of why it didn't matter. In the Austria round, check out how the model identifies the city based on the mountain in the background:
https://cdn.jsdelivr.net/gh/sampatt/media@main/posts/2025-04...
It already has so much information that it doesn't need the search.
Would search ever be useful? Of course it would. But in this particular case, it was irrelevant.