Is there a similar system available for Linux?

lunixbochs · on Dec 31, 2018

The best I know of for Linux is Aenea, which requires you to run Dragon in a virtual machine or on another computer. I plan to port Talon to Linux at some point, and that will include coming up with speech engine options for Linux. I think one of the better options is to make Windows Dragon work well in WINE. (Despite the impressive numbers for DeepSpeech and wav2letter++, they’re not yet optimized for continuous recognition, which means you need to feed them chunks of finished recordings. Their letter-based recognition also requires weird fuzzy matching if you want command grammars, which is a problem I have yet to solve. I’ve actually considered using the fastest continuous engine I can find, then feeding the sound into a better open source non-continuous engine after each recognition.)
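To illustrate the fuzzy matching problem mentioned above: a letter-based engine can emit near-miss spellings, so a command grammar has to map noisy text back to a known phrase. The following is only a minimal sketch using Python's standard `difflib`, not how Talon or any of these engines actually does it. The command list, the `match_command` name, and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical command phrases; a real grammar would be far richer.
COMMANDS = ["open file", "close window", "scroll down", "scroll up"]


def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the closest known command, or None if nothing is similar enough.

    The cutoff is an arbitrary assumption and would need tuning per engine.
    """
    # Normalize case and whitespace before comparing.
    normalized = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("scrol doun"))
print(match_command("xyzzy"))
```

Plain string similarity like this ignores grammar structure (for example, commands with variable slots), which is presumably part of why the problem is harder than it looks.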

shakna · on Dec 31, 2018

The state of voice on Linux is awful.

I've got something close to Talon's control, minus the eye tracking, that I work on now and then. It uses the Google API for speech recognition, because CMUSphinx was awful (about 50% accuracy per word, whereas Google was closer to 90%).

I'm hoping that Mozilla Voice, when it arrives, will finally solve this or at least make it easy to build a decent control system.

yorwba · on Dec 31, 2018

Mozilla DeepSpeech has had a release [1] that comes with a pre-trained model achieving an 11% word error rate (WER) on clean audio in the LibriSpeech test corpus. That's close to the WER you're getting with Google, but I guess your audio quality isn't as good, so DeepSpeech would perform worse.
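For anyone comparing "accuracy per word" with WER: WER is usually computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. Because insertions count as errors, it is not exactly 100% minus per-word accuracy, but the two are close when insertions are rare. Here is a minimal sketch of that calculation; the function name and the sample sentences are made up for illustration, and real evaluations typically normalize text (case, punctuation) first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("open the file menu", "open a file menu"))  # 0.25
```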

Mozilla Common Voice [2] is the project to collect more training data so that DeepSpeech (and other projects) can achieve the accuracy that is known to be possible with the same architecture trained on larger private datasets.

Then there's Facebook's newly released wav2letter++ [3], which claims to achieve better accuracy with the same training data. However, some people have been unable to exactly reproduce those results, getting "only" 5.15% WER [4]. Still better than what Mozilla DeepSpeech can deliver, though.

[1] https://github.com/mozilla/DeepSpeech/releases/latest

[2] https://voice.mozilla.org

[3] https://github.com/facebookresearch/wav2letter

[4] https://github.com/facebookresearch/wav2letter/issues/88

eggie · on Dec 31, 2018

Does this mean that Talon is based on private APIs provided in OSX?

lunixbochs · on Dec 31, 2018

It’s not locked to macOS at all. I plan to support win/lin/mac equally; Talon has its own grammar compiler and engine-independent word parser. The main holdup for porting is all of the interaction APIs, like key simulation, drawing overlays on the screen, that sort of thing.