It’s suspicious that despite being trained on audio tokens in addition to text and image tokens, GPT-4o performs almost exactly the same as GPT-4.
GPT-4o could be a half-baked GPT-5: OpenAI may have stopped training early once it reached performance comparable to GPT-4, leaving more loss still to be driven down.
Or maybe there’s a performance ceiling that all models are converging to, but I think that’s less likely.