It’s suspicious that despite being trained on audio tokens in addition to text and image tokens, GPT-4o performs almost exactly the same as GPT-4.
GPT-4o could be a half-baked GPT-5: OpenAI may have stopped training early once it reached performance comparable to GPT-4, leaving more loss still to be driven down.
Or maybe there’s a performance ceiling that all models are converging to, but I think that’s less likely.