My favorite geometric proof of an inequality is the one I read on Terry Tao's blog. Interestingly, it's not presented as a geometric proof, but it is very much one: if you have two vectors x, y, you just shrink the longer one and grow the shorter one until they have the same norm, without changing either side of the inequality. Then you expand ||x - y||^2 >= 0 and ||x + y||^2 >= 0 to see -||x||^2 - ||y||^2 <= 2<x, y> <= ||x||^2 + ||y||^2, and since ||x|| = ||y|| you get the result.
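In symbols, here's a sketch of that argument for a real inner product space (this is my reconstruction of the steps described above, not a quote from the blog post):

```latex
% Assume x, y \neq 0. Both sides of |\langle x, y \rangle| \le \|x\|\,\|y\|
% are unchanged if we rescale x \mapsto \lambda x, \; y \mapsto y/\lambda.
% Choosing \lambda = \sqrt{\|y\| / \|x\|} makes the two norms equal:
%   \|\lambda x\| = \sqrt{\|x\|\,\|y\|} = \|y/\lambda\|.
% So we may assume \|x\| = \|y\|. Then expanding squares:
\begin{aligned}
0 \le \|x - y\|^2 &= \|x\|^2 - 2\langle x, y \rangle + \|y\|^2
  &&\Longrightarrow\quad \langle x, y \rangle \le \tfrac{1}{2}\bigl(\|x\|^2 + \|y\|^2\bigr), \\
0 \le \|x + y\|^2 &= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2
  &&\Longrightarrow\quad -\tfrac{1}{2}\bigl(\|x\|^2 + \|y\|^2\bigr) \le \langle x, y \rangle.
\end{aligned}
% With \|x\| = \|y\|, the bound \tfrac{1}{2}(\|x\|^2 + \|y\|^2) equals \|x\|\,\|y\|,
% giving |\langle x, y \rangle| \le \|x\|\,\|y\|.
```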
I don't quite think they cheat at math olympiads, but obviously there are blind spots for the unspectacular tasks. That being said, Mississippi is both a good and a bad question to ask. On the one hand, it's "the bare minimum" to require; on the other hand, is it really a feat? Most models can write a piece of code that would compute that. If you show me a task I'm not designed to solve (like counting the number of i's in a text), the smart thing is actually to write a program to count them (which LLMs can do).
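For concreteness, the "write a program" route is trivial in plain Python (this is just an illustration, not any particular model's tool-use output):

```python
# Counting letters by code rather than by token-level "looking",
# which is exactly the task LLM tokenization makes unreliable.
word = "Mississippi"
print(word.count("i"))  # 4
print(word.count("s"))  # 4
```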
The best way to measure intelligence is probably to see whether a model knows its own strengths and weaknesses and deals with them efficiently. That ability is what an eval should really be testing.
100% agree about this too (also a professional mathematician). To mathematicians who have not been trained on such problems, these will typically look very hard, especially the more recent olympiad problems (as opposed to problems from, e.g., 30 years ago). Basically these problems have become more about mastering a very impressive list of techniques than they were at their inception (and participants prepare more and more for them). On the other hand, research mathematics has also become more and more technical, but the techniques are very different, so the correlation between olympiads and research is probably smaller than it once was.