
The bar plot is wrong, but the numbers are correct. Looks like they had a dummy plot and only updated the numbers, never the bars, maybe to prevent leaks?

Screenshot of the blog plot: https://imgur.com/a/HAxIIdC



Haha, even with that, it says 4o does worse with 2 passes than with 1.

Edit: Never mind, I just noticed the first one is SWE-bench and the second is aider.


Those are different benchmarks


I see it now on the website. The screenshot cut off the header for the first benchmark, so it looked like it was just comparing 1-pass and 2-pass.


Yes, sorry, I couldn't fit everything in the screenshot.



