So if not exponential, what would you call adding voice and image recognition, function calling, greatly increased token-generation speed, reduced cost, and massive context-window increases, then shortly after combining all of that in a truly multimodal model that is even faster and cheaper while adding emotional range and singing in… checks notes …14 months?! Not to mention creating and improving an API, mobile apps, a marketplace, and now a desktop app. OpenAI ships, and they are doing so in a way that makes a lot of business sense (continue to deliver while reducing cost). Even if they didn't have another flagship model in their back pocket I'd be happy with this rate of improvement, but they are obviously about to launch another one given the teasers Mira keeps dropping.
All of that is awesome and makes for a better product. But it's also primarily an engineering effort. What matters here is an increase in intelligence, and we're not seeing that, aside from very minor capability increases.
We'll see if they have another flagship model ready to launch. I seriously doubt it. I suspect this was supposed to be called GPT-5, or at the very least GPT-4.5, but it can't meet expectations, so they can't use those names.
Isn't one of the reasons for the Omni model that text-based learning has a limited supply of source material? If it's just as good at audio, that opens up a whole other set of data, and an interesting UX for users.
I believe you're right. You can easily transcribe audio, but the quality of the resulting text data is subpar, to say the least. People are very messy when they speak and rely on the interlocutor to fill in the gaps. Training a model to understand all of the nuances of spoken dialogue opens that source of data up. What they demoed today is a model that to some degree understands tone, emotion, and, surprisingly, a bit of humour. It's hard to get much of that from text, so it makes sense that audio is the key to it. Visual understanding of video is also promising, especially for cause and effect, and subsequently for reasoning.