
The state of voice on Linux is awful.

I've got something close to Talon's voice control (without the eye tracking) that I work on now and then, using the Google Speech API for recognition, because CMUSphinx was awful (about 50% per-word accuracy, whereas Google was closer to 90%).
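To put those per-word numbers in perspective, here's some back-of-the-envelope arithmetic of my own (not from either project's docs), naively assuming each word is recognized independently:

```python
def command_accuracy(per_word_accuracy: float, words: int) -> float:
    """Chance an entire command is transcribed correctly, assuming
    (simplistically) that each word is recognized independently."""
    return per_word_accuracy ** words

# A five-word voice command:
sphinx = command_accuracy(0.50, 5)   # CMUSphinx at ~50% per word
google = command_accuracy(0.90, 5)   # Google at ~90% per word

print(f"CMUSphinx: {sphinx:.1%}")    # ~3.1% of commands come out right
print(f"Google:    {google:.1%}")    # ~59.0% of commands come out right
```

So even a 90% engine garbles two out of five multi-word commands, which is why people keep waiting for better models.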

I'm hoping that Mozilla's voice work, when it arrives, will finally solve this or at least make it easy to build a decent control system.



Mozilla DeepSpeech has had a release [1] that comes with a pre-trained model achieving 11% WER on clean audio in the LibriSpeech test corpus. That's close to the roughly 10% error rate you're getting with Google, but your audio quality probably isn't as good as LibriSpeech's, so DeepSpeech would likely perform worse in practice.
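For anyone unfamiliar with the metric: WER is just word-level edit distance divided by the number of reference words. A minimal sketch of my own (not DeepSpeech's actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ~ 0.167
```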

Mozilla Common Voice [2] is the project to collect more training data so that DeepSpeech (and other projects) can achieve the accuracy that is known to be possible with the same architecture trained on larger private datasets.

Then there's Facebook's newly released wav2letter++ [3], which claims to achieve better accuracy with the same training data. However, some people have been unable to exactly reproduce those results, getting "only" 5.15% WER [4]. Still better than what Mozilla DeepSpeech can deliver, though.

[1] https://github.com/mozilla/DeepSpeech/releases/latest

[2] https://voice.mozilla.org

[3] https://github.com/facebookresearch/wav2letter

[4] https://github.com/facebookresearch/wav2letter/issues/88


Does this mean that Talon is based on private APIs provided in OSX?


It’s not locked to macOS at all. I plan to support win/lin/mac equally; Talon has its own grammar compiler and engine-independent word parser. The main hold-up for porting is all of the platform interaction APIs: key simulation, drawing overlays on the screen, that sort of thing.
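To illustrate the split being described, here's a toy sketch of my own (hypothetical names, not Talon's real API): the engine-independent layer maps recognized word sequences to actions, and only what the actions *do* (key simulation, overlays) is platform-specific.

```python
# Toy command grammar: map recognized word sequences to actions.
# Hypothetical structure for illustration only -- not Talon's actual API.
commands = {
    ("open", "terminal"): lambda: print("would launch a terminal"),
    ("press", "enter"):   lambda: print("would simulate the Enter key"),
}

def dispatch(recognized: str) -> bool:
    """Look up a recognized phrase and run its action if it matches."""
    words = tuple(recognized.lower().split())
    action = commands.get(words)
    if action:
        action()      # only this body needs per-platform code
        return True
    return False

dispatch("press enter")
```

The grammar lookup works the same regardless of which speech engine produced the words, which is why porting is blocked on the interaction APIs rather than the recognition side.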



