Claude 3 Opus is reporting superior metrics, particularly in its coding ability, and in the LLM Arena it is statistically tied with GPT-4.


When it comes to LLMs, metrics are misleading and easy to game. Actually talking to it and running it through novel tasks that require the ability to reason quickly shows that it is not on par with GPT-4. As in, it can't solve step-by-step what GPT-4 can one-shot.


This was exactly my experience. I have very complex prompts that I test on new models, and none of the models I've tried performs as well as GPT-4 (Claude 3 Opus included).


It's a bit better at writing jokes. GPT is stiff and unfunny, which is why the Twitter spambots using it to generate text are so obvious.