
The state of voice on Linux is awful.

I've got something close to Talon's voice control (without the eye tracking) that I work on now and then, using the Google Speech API for recognition, because CMUSphinx was awful (about 50% per-word accuracy, whereas Google was closer to 90%).
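To put those per-word numbers in perspective, here's some back-of-the-envelope arithmetic of my own (not from either project's docs), naively assuming each word is recognized independently:

```python
def command_accuracy(per_word_accuracy: float, words: int) -> float:
    """Chance an entire command is transcribed correctly, assuming
    (simplistically) that each word is recognized independently."""
    return per_word_accuracy ** words

# A five-word voice command:
sphinx = command_accuracy(0.50, 5)   # CMUSphinx at ~50% per word
google = command_accuracy(0.90, 5)   # Google at ~90% per word

print(f"CMUSphinx: {sphinx:.1%}")    # ~3.1% of commands come out right
print(f"Google:    {google:.1%}")    # ~59.0% of commands come out right
```

So even a 90% engine garbles two out of five multi-word commands, which is why people keep waiting for better models.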

I'm hoping that Mozilla's voice work, when it arrives, will finally solve this or at least make it easy to build a decent control system.



Mozilla DeepSpeech has had a release [1] that comes with a pre-trained model achieving 11% WER on clean audio in the LibriSpeech test corpus. That's close to the roughly 10% error rate you're getting with Google, but your audio quality probably isn't as good as LibriSpeech's, so DeepSpeech would likely perform worse in practice.
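For anyone unfamiliar with the metric: WER is just word-level edit distance divided by the number of reference words. A minimal sketch of my own (not DeepSpeech's actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ~ 0.167
```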

Mozilla Common Voice [2] is the project to collect more training data so that DeepSpeech (and other projects) can achieve the accuracy that is known to be possible with the same architecture trained on larger private datasets.

Then there's Facebook's newly released wav2letter++ [3], which claims to achieve better accuracy with the same training data. However, some people have been unable to exactly reproduce those results, getting "only" 5.15% WER [4]. Still better than what Mozilla DeepSpeech can deliver, though.

[1] https://github.com/mozilla/DeepSpeech/releases/latest

[2] https://voice.mozilla.org

[3] https://github.com/facebookresearch/wav2letter

[4] https://github.com/facebookresearch/wav2letter/issues/88


Does this mean that Talon is based on private APIs provided in OSX?


It’s not locked to macOS at all. I plan to support win/lin/mac equally; Talon has its own grammar compiler and engine-independent word parser. The main hold-up for porting is all of the platform interaction APIs: key simulation, drawing overlays on the screen, that sort of thing.
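To illustrate the split being described, here's a toy sketch of my own (hypothetical names, not Talon's real API): the engine-independent layer maps recognized word sequences to actions, and only what the actions *do* (key simulation, overlays) is platform-specific.

```python
# Toy command grammar: map recognized word sequences to actions.
# Hypothetical structure for illustration only -- not Talon's actual API.
commands = {
    ("open", "terminal"): lambda: print("would launch a terminal"),
    ("press", "enter"):   lambda: print("would simulate the Enter key"),
}

def dispatch(recognized: str) -> bool:
    """Look up a recognized phrase and run its action if it matches."""
    words = tuple(recognized.lower().split())
    action = commands.get(words)
    if action:
        action()      # only this body needs per-platform code
        return True
    return False

dispatch("press enter")
```

The grammar lookup works the same regardless of which speech engine produced the words, which is why porting is blocked on the interaction APIs rather than the recognition side.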



