Claude 3 Opus is reporting superior metrics, particularly in its coding ability,...

int_19h · on March 18, 2024

When it comes to LLMs, metrics are misleading and easy to game. Actually talking to it and running it through novel tasks that require ability to reason very quickly demonstrates that it is not on par with GPT-4. As in, it can't solve things step-by-step that GPT-4 can one-shot.

FloorEgg · on March 18, 2024

This was exactly my experience. I have very complex prompts and I test them on new models and nothing performs as well as GPT-4 that I've tried (Claude 3 Opus included)

astrange · on March 18, 2024

It's a bit better at writing jokes. GPT is stiff and unfunny - which is why the twitter spambots using it to generate text are so obvious.