My favorite geometric proof of an inequality is the one I read on Terry Tao's blog. Interestingly, it's not presented as a geometric proof, but it is very much one: if you have two vectors x, y, you just shrink the longer one and grow the shorter one until they have the same norm, without changing either side of the inequality. Then you expand ||x - y||^2 >= 0 and ||x + y||^2 >= 0 to see -||x||^2 - ||y||^2 <= 2<x, y> <= ||x||^2 + ||y||^2, and since ||x|| = ||y|| you get the result.
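In symbols, here's a sketch of that argument for a real inner product space (this is my reconstruction of the steps described above, not a quote from the blog post):

```latex
% Assume x, y \neq 0. Both sides of |\langle x, y \rangle| \le \|x\|\,\|y\|
% are unchanged if we rescale x \mapsto \lambda x, \; y \mapsto y/\lambda.
% Choosing \lambda = \sqrt{\|y\| / \|x\|} makes the two norms equal:
%   \|\lambda x\| = \sqrt{\|x\|\,\|y\|} = \|y/\lambda\|.
% So we may assume \|x\| = \|y\|. Then expanding squares:
\begin{aligned}
0 \le \|x - y\|^2 &= \|x\|^2 - 2\langle x, y \rangle + \|y\|^2
  &&\Longrightarrow\quad \langle x, y \rangle \le \tfrac{1}{2}\bigl(\|x\|^2 + \|y\|^2\bigr), \\
0 \le \|x + y\|^2 &= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2
  &&\Longrightarrow\quad -\tfrac{1}{2}\bigl(\|x\|^2 + \|y\|^2\bigr) \le \langle x, y \rangle.
\end{aligned}
% With \|x\| = \|y\|, the bound \tfrac{1}{2}(\|x\|^2 + \|y\|^2) equals \|x\|\,\|y\|,
% giving |\langle x, y \rangle| \le \|x\|\,\|y\|.
```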
I don't quite think they cheat at math olympiads, but obviously there are blind spots for the unspectacular tasks. That being said, Mississippi is both a good and a bad question to ask. On the one hand, it's "the bare minimum" to require; on the other hand, is it really a feat? Most models can write a piece of code that would compute that. If you show me a task I'm not designed to solve (like counting the number of i's in a text), the smart thing is actually to write a program to count them (which LLMs can do).
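For concreteness, the "write a program" route is trivial in plain Python (this is just an illustration, not any particular model's tool-use output):

```python
# Counting letters by code rather than by token-level "looking",
# which is exactly the task LLM tokenization makes unreliable.
word = "Mississippi"
print(word.count("i"))  # 4
print(word.count("s"))  # 4
```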
The best way to measure intelligence is probably to see whether a model knows its own strengths and weaknesses and deals with them efficiently. That ability is what an eval should really be testing.
100% agree about this too (also a professional mathematician). To mathematicians who have not been trained on such problems, these will typically look very hard, especially the more recent olympiad problems (as opposed to problems from, e.g., 30 years ago). Basically these problems have become more about mastering a very impressive list of techniques than they were at their inception (and participants prepare more and more for them). On the other hand, research mathematics has also become more and more technical, but the techniques are very different, so the correlation between olympiads and research is probably smaller than it once was.