Despite lots of Internet talk about text to speech, there's still no really amazing TTS that you can pay money for and use. They all sound like text to speech.
They play 4 clips, 2 of them human, 2 of them AI generated. Can you tell which ones are which?
And the kicker: this works in real time (AFAIR), runs entirely on the CPU without touching the GPU, and generates pitch-correct speech (for Japanese). It's not even funny how far ahead they are. And you can buy it right now.
AFAIK they use some sort of hybrid method: a bunch of custom modeling/DSP code (they've been doing speech synthesis for over a decade) wrapped around a neural network. One mistake essentially all of the Western TTS models seem to make is relying on a neural network alone, without augmenting it with non-neural-network code, which (from what I can see) is the secret sauce for making a TTS that is both fast and good-sounding.
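To make the hybrid idea concrete, here's a toy sketch (my own illustration, not their actual architecture): a small neural model would predict cheap per-frame parameters like pitch and gain, and classic CPU-friendly DSP code would turn those into a waveform. The function name and parameters here are all hypothetical.

```python
import numpy as np

def dsp_vocoder(f0_hz, gains, sr=16000, frame_len=160):
    """Toy source-filter synthesis: a pitch-driven oscillator shaped by
    per-frame gain. In a hybrid TTS, f0_hz and gains would come from a
    small neural acoustic model; waveform generation stays in cheap,
    CPU-only DSP code like this loop."""
    out = np.zeros(len(f0_hz) * frame_len)
    phase = 0.0
    for i, (f0, g) in enumerate(zip(f0_hz, gains)):
        t = np.arange(frame_len)
        if f0 > 0:   # voiced frame: sine oscillator at the target pitch
            frame = np.sin(phase + 2 * np.pi * f0 * t / sr)
            phase += 2 * np.pi * f0 * frame_len / sr  # keep phase continuous
        else:        # unvoiced frame: white-noise excitation
            frame = np.random.randn(frame_len) * 0.1
        out[i * frame_len:(i + 1) * frame_len] = g * frame
    return out

# Hypothetical neural-model output: a flat 120 Hz pitch contour, fading out.
f0 = np.full(50, 120.0)
gains = np.linspace(1.0, 0.0, 50)
wave = dsp_vocoder(f0, gains)
```

The point of the split is that the neural part only has to run once per 10 ms frame, while the sample-rate inner loop is plain DSP, which is why this style of pipeline can be real-time on a CPU.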
Yes, but can it be used to express emotions? Can it derive emotions from the text alone, without painstaking guidance? That seems to be the main weakness of existing TTS engines; a neutral tone can be generated relatively well.
Humans can't derive emotions from text alone without external contextual clues.
Such has been the source of much miscommunication online.
Amateur fiction writing, which tends to overemphasize how things are said ("I guess I can go rescue your cat", the exasperated detective said wearily) might be easier for AI!
To a limited extent, sure, but kids books are also written to be very emotive.
The linked page actually has examples of the same text being read with different emotions, demonstrating that for even a single sentence a lot of variance is possible.
Natural-sounding TTS models that require additional work (i.e., not entirely automatic) have existed for quite a while. Obsidian used Sonantic for The Outer Worlds (an AA game) in 2019, and the dialogue sounded as if it were voiced by real actors.
I think many heavy TTS users (myself included) gradually train themselves to listen at higher speeds, at which point nothing sounds particularly natural anyway. What I want is trained speech models that remain coherent at high speeds (over 3x). Even better if there are bi/multilingual models that can seamlessly switch between languages.