>The most impressive part is that the voice uses the right feelings and tonal language during the presentation.
Consequences of audio2audio (rather than audio > text, then text > audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning, among other things. And you can interrupt it freely now!
ElevenLabs isn’t remotely close to how good this voice sounds. I’ve used it extensively before and it just isn’t natural. This voice from OpenAI, and even the one ChatGPT has been using, is natural.
When did you last use it? I used it a few weeks ago to create a fake podcast as a side project, and it sounded pretty good with their highest-end model and the settings cranked up.
My point isn’t necessarily whether ElevenLabs is good or bad; it’s the difference between its text-to-voice and voice-to-voice generations. The latter is incredibly expressive and just shows how much is lacking in our ability to encode inflection in text.
I did see that, though my interpretation is that breathing is included in its voice tokenizer which helps it understand emotions in speech (the AI can generate breath sounds after all). Other sounds, like bird songs or engine noises, may not work - but I could be wrong.
I suspect that, like images and video, their audio system is or will become more general purpose. For example, it can generate the sound of coins falling onto a table.
Allegedly Google Assistant can do the "humming" one, but I have never gotten it to work. I wish it would, because sometimes I have a song stuck in my head that I know is sampled from another song.
I asked it to make a bird noise; instead, it told me in words what a bird sounds like. True audio-to-audio should be able to produce any noise: a trombone, traffic, a crashing sea, anything. Maybe there is a better prompt for this, but it did not seem like it.