
I do the same with a small math problem and so far only Qwen3 got it right (tested all thinking models). So your mileage may vary, as they say!

