Hacker News
famouswaffles on Jan 23, 2025 | on: Results of "Humanity's Last Exam" benchmark publis...
There is, actually. It's a bit buried: Section C.2 of the paper (page 24).
R1 is still the best. o1 drops a little (8.9).