Few people are talking about it, but... what do you think about the over-the-top enthusiasm?
To me, it sounds like TikTok TTS; it's a bit uncomfortable to listen to. I've been working with TTS models, and they can produce much more natural-sounding language, so it is clearly a stylistic choice.
I like having that degree of expressiveness available as an option, although it would be really irritating if I were trying to use it to study academic coursework or something.
But if it's one in a range of possible stylistic flourishes and personalities, I think it's a plus.
Looks like their TTS component is separate from the model. I just tried 4o, and there is a list of voices to select from. If they had really only allowed that one voice or burned it into the model, that would probably have made the model faster, but I think it would have been a blunder.
So what do you think?