

e1g on Jan 1, 2025 | on: 30% drop in O1-preview accuracy when Putnam proble...

I'm not skilled enough in math to do a rigorous evaluation, so it was a quick check. Terence Tao is skilled enough, and he describes O1's math ability as "...roughly on par with a mediocre, but not completely incompetent graduate student" (good discussion at https://news.ycombinator.com/item?id=41540902), and the next iteration, O3, just got 25% on his brand new Frontier Math test. Seeing LLMs as useless is banal, but downplaying their rate of improvement is self-sabotage.

fumeux_fume on Jan 1, 2025

> "...roughly on par with a mediocre, but not completely incompetent graduate student"

Let it sink in how vague and almost meaningless that statement is.

pizza on Jan 1, 2025

What types of questions are you hoping to answer for that to be considered a vague statement?
