Open-source speech recognition is doing pretty well with projects such as VOSK, Athena, ESPnet and SpeechBrain.
These days models are the easy part of ML, and data is the hard one. So for Mozilla to focus on Common Voice over DeepSpeech seems reasonable.
You can't really do it because of licensing reasons. One cool thing Common Voice brings to the table, besides all the fantastic data, is the licensing.
Which, it must be said, isn't always as bullet-proof as it could be. There are a not insignificant number of transcription (or pronunciation) errors in those datasets, and Mozilla might want to find ways to increase the quality of already-released data over time.
Fair use isn't a feature of copyright in every jurisdiction, which could make this a less than useful idea when trying to create a global corpus of speech data.
This is incorrect. Pretty much every state-of-the-art model uses copyrighted data. This is considered fair use and it has never been a problem outside of concern trolling.
As a lot of that cc text is automatically generated it seems like you’d just be creating a clone of other software, which might be an intellectual property issue.
Having an open corpus means that researchers building the next thing in voice research - which may or may not follow DeepSpeech - have something to work with. This is enormously important and their change of direction lets a thousand flowers bloom. Meanwhile, their partnership with Nvidia provides a fertile ground to prove the value of the open corpus in action. Nvidia get access to Mozilla's (presumably superior) ability to build said corpus, while Mozilla lay the foundations for others to contribute work in the open. It is a great example of comparative advantage, and a win win choice, IMO.
So in other words we provide data for free to Mozilla, and Mozilla turns around and sells it for millions to Nvidia to fund ... not open source, they killed that so umm ee, to fund ceo salary?
You seem to imply that Nvidia are paying for data that is freely available.
Anyone can use the Common Voice data within the terms of the license and NVIDIA contributing towards the continued gathering of data (that will continue to be made publicly available) won't change that.
It's a huge shame that Mozilla didn't continue the DeepSpeech project, but Coqui is taking on the mantle there and there are plenty of others working on open source solutions too. Meanwhile, the existence of CV will make a big difference to research in the academic, commercial and open source spheres.
If that was true that would be a profoundly bad purchase for NVidia since the data is already freely licensed and available for anyone to use at no cost.
This is like saying that Epic "bought" Blender when they gave it a development grant, or that Google contributing patches to upstream Linux means they own it now. Mozilla didn't give NVidia any kind of special license, when NVidia contributes data to Common Voice they're doing so under Common Voice's license, not their own.
We want to encourage more companies to treat software and training data as a public commons that is collectively maintained, this is a good thing.
This is silly. Common Voice is not adding NVidia-specific features; what would that even look like for a database? There is no comparison to be made between donating resources to an openly licensed database and encouraging developers to optimize their games for proprietary APIs.
And the assumption that shutting down Deep Speech was specifically for NVidia's benefit seems like a fairly large leap to me, given that Deep Speech is already mature, still being developed under Coqui.ai, and surrounded by a wide diversity of other deep learning projects that also aren't controlled by NVidia.
Decreasing barriers of entry for those models and providing raw data is probably the right thing for Mozilla to be focusing on right now. Any team can build a language model, only companies like Mozilla can coordinate mass data collection for those models.
About their TTS system: "These models provide speech synthesis with ~0.12 real-time factor on a GPU and ~1.02 on a CPU." The quality of the samples is really impressive but, wow, isn't this computationally too expensive for many applications?
> If, for example, it takes 8 hours of computation time to process a recording of duration 2 hours, the real time factor is 4. When the real time factor is 1, the processing is done in real time. It is a hardware-dependent value.
I think real-time factors smaller than 1 are faster than real-time (not slower) and use less than 100% of a resource's computational power to keep up.
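For what it's worth, here is a minimal sketch of that arithmetic, assuming the usual definition (processing time divided by audio duration). The function names are made up for illustration, and the 10-second utterance is an arbitrary example; only the 0.12 and 1.02 figures come from the quoted announcement.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by the duration of the audio handled."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_real_time(rtf: float) -> bool:
    """True when audio is produced at least as fast as it plays back."""
    return rtf <= 1.0


def synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimated compute time for a clip of the given length at a fixed RTF."""
    return rtf * audio_seconds


# The quoted example: 8 hours of compute for a 2-hour recording.
print(real_time_factor(8 * 3600, 2 * 3600))  # 4.0

# The quoted TTS figures, applied to a hypothetical 10-second utterance.
for label, rtf in [("GPU", 0.12), ("CPU", 1.02)]:
    print(label, keeps_up_with_real_time(rtf), round(synthesis_seconds(rtf, 10.0), 2))
```

So under that definition the GPU figure is well under real time, while the CPU figure needs slightly longer to generate the speech than the speech itself lasts. This ignores streaming, batching and startup latency, which would change the picture in a real UI.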
Not sure what you're quoting because I didn't write that, but
> I think real-time factors smaller than 1 are faster than real-time (not slower) and use less than 100% of a resource's computational power to keep up.
Sure, but who has the necessary GPUs installed? And on CPUs it will apparently take longer to generate speech than the duration of that speech. Unusable for many UIs and it will also drain the batteries of any portable device.
You're not wrong, but with so many chips incorporating some sort of dedicated "AI" or "tensor" functionality, perhaps the issue will resolve itself for most portable devices in a few years. Plus there's always the option of optimizing a little more and/or abusing other available hardware such as DSP chips to get the real time factor down. Anything over 1 isn't great, but it's not a bad start.
The source code is under a FLOSS license, but it only works on Nvidia GPUs and uses proprietary Nvidia-specific technologies like CUDA.
It's significantly closer to "nonfree" on the free-nonfree spectrum than it should be, and is another example of the difference between the guiding philosophies behind "free software" and "open source".
Can't you run it on CPU? And looking at the code, it seems like they're using Numba to JIT their CUDA kernels, so I guess someone could come along and provide a compatibility shim to make the kernels run on a non-CUDA accelerator?
I'm sure they signed on adopting "something", otherwise it would be receiving a $1.5 million grant for closing an open source initiative. A $3 million a year lawyer would never be this blatant.
https://venturebeat.com/2021/04/12/mozilla-winds-down-deepsp...
https://blog.mozilla.org/en/mozilla/mozilla-partners-with-nv...
$1.5 million for shutting down an open source initiative, almost half of the CEO's salary right there.