
>The most impressive part is that the voice uses the right feelings and tonal language during the presentation.

Consequences of audio-to-audio (rather than audio > text, then text > audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning, amongst other things. And you can interrupt it freely now!
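To make the distinction concrete, here is a minimal Python sketch of the two designs. Every function in it is a placeholder of my own, not any real OpenAI or ElevenLabs API:

    # All of these are stand-in stubs, not real APIs.
    def speech_to_text(audio: bytes) -> str:
        return "transcribed words"       # tone, pauses and emphasis are discarded here

    def llm_reply(text: str) -> str:
        return "reply text"              # the language model only ever sees flat text

    def text_to_speech(text: str) -> bytes:
        return b"synthesized audio"      # output inflection has to be invented from scratch

    def cascaded_reply(audio_in: bytes) -> bytes:
        # audio > text > text > audio: prosody is lost at the first hop
        return text_to_speech(llm_reply(speech_to_text(audio_in)))

    def speech_model(audio: bytes) -> bytes:
        return b"reply audio"            # stand-in for a single end-to-end audio model

    def audio_to_audio_reply(audio_in: bytes) -> bytes:
        # one model maps input audio directly to output audio,
        # so it can both hear and reproduce inflection
        return speech_model(audio_in)

The point is just that the cascaded version can never recover what speech_to_text throws away, which is why voice-to-voice feels so much more expressive.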



Anyone who has used elevenlabs for voice generation has found this to be the case. Voice to voice seems like magic.


ElevenLabs isn’t remotely close to how good this voice sounds. I’ve tried to use it extensively before, and it just isn’t natural. This voice from OpenAI, and even the one ChatGPT has been using, is natural.


When did you last use it? I used it a few weeks ago to create a fake podcast as a side project, and it sounded pretty good with their highest-end model and the tunings cranked up.


About 3 months ago for that exact use case.


My point isn’t necessarily about ElevenLabs being good or bad; it’s the difference between its text-to-voice and voice-to-voice generations. The latter is incredibly expressive, and it just shows how much is lacking in our ability to encode inflection in text.


However, this looks like it only works with speech - i.e. you can't ask it, "What's the tune I'm humming?" or "Why is my car making this noise?"

I could be wrong but I haven't seen any non-speech demos.


Fwiw, the live demo[0] included different kinds of breathing, and getting feedback on it.

[0]: https://youtu.be/DQacCB9tDaw?t=557


What about the breath analysis?


I did see that, though my interpretation is that breathing is included in its voice tokenizer, which helps it understand emotions in speech (the AI can generate breath sounds, after all). Other sounds, like bird songs or engine noises, may not work - but I could be wrong.


I suspect that, like images and video, their audio system is (or will become) more general-purpose. For example, it can generate the sound of coins falling onto a table.


Allegedly Google Assistant can do the "humming" one, but I have never gotten it to work. I wish it would, because sometimes I have a song stuck in my head that I know is sampled from another song.


I asked it to make a bird noise; instead, it told me in words what a bird sounds like. True audio-to-audio should be able to produce any noise: a trombone, traffic, a crashing sea, anything. Maybe there is a better prompt for it, but it did not seem like it.


The new voice mode has not rolled out yet. It's rolling out to Plus users in the next couple of weeks.

Also, it's possible this is trained mostly on speech.



