What's the point when they killed DeepSpeech in exchange for adopting closed Nvid...

jononor · on Aug 5, 2021

Open-source speech recognition is doing pretty well, with projects such as VOSK, Athena, ESPNet and SpeechBrain. These days models are the easy part of ML, and data is the hard one. So for Mozilla to focus on Common Voice over DeepSpeech seems reasonable.

tkinom · on Aug 5, 2021

Could one use YouTube as training data?

Especially the videos with closed captions...

As simple as extracting the Audio and CC text?

soapdog · on Aug 5, 2021

You can't really do it because of licensing reasons. One cool thing Common Voice brings to the table, besides all the fantastic data, is the licensing.

anonymfus · on Aug 5, 2021

YouTube still allows uploaders to mark their videos as CC BY 3.0 licensed, and it's still possible to check that via YouTube's API.

(See https://support.google.com/youtube/answer/2797468 and the part about status.license here: https://developers.google.com/youtube/v3/docs/videos)
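A minimal sketch of the license check mentioned above, assuming a response has already been fetched from the `videos` endpoint with `part=status`. The helper name `cc_licensed_ids`, the sample payload, and the video IDs are hypothetical; per the linked API docs, `status.license` is either `creativeCommon` or `youtube`.

```python
import json

# Hypothetical sample shaped like a YouTube Data API v3 `videos.list`
# response requested with part=status; the IDs are made up.
SAMPLE_RESPONSE = json.loads("""
{
  "items": [
    {"id": "vid_a", "status": {"license": "creativeCommon"}},
    {"id": "vid_b", "status": {"license": "youtube"}},
    {"id": "vid_c", "status": {}}
  ]
}
""")


def cc_licensed_ids(response):
    """Return IDs of videos whose status.license is 'creativeCommon'."""
    return [
        item["id"]
        for item in response.get("items", [])
        if item.get("status", {}).get("license") == "creativeCommon"
    ]


print(cc_licensed_ids(SAMPLE_RESPONSE))
```

This only filters on the license flag; whether the captions themselves are human-written or auto-generated is a separate question the flag doesn't answer.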

m-p-3 · on Aug 6, 2021

And the audio recordings are also curated by the volunteers, ensuring the audio snippets match the text, etc.

jpetso · on Aug 6, 2021

Which, it must be said, isn't always as bullet-proof as it could be. There's a not-insignificant number of transcription (or pronunciation) errors in those datasets, and Mozilla might want to find ways to increase the quality of already-released data over time.

ma2rten · on Aug 5, 2021

Are you sure it's not fair use? I believe most legal experts agree that language models such as GPT-3 are not violating copyright due to fair use.

M2Ys4U · on Aug 6, 2021

Fair use isn't a feature of copyright in every jurisdiction, which could make this a less-than-useful idea when trying to create a global corpus of speech data.

humanistbot · on Aug 5, 2021

Fair use is whatever a judge and/or a jury says it is.

amelius · on Aug 5, 2021

Source?

NavinF · on Aug 5, 2021

This is incorrect. Pretty much every state of the art model uses copyrighted data. This is considered fair use and it has never been a problem outside of concern trolling.

tinus_hn · on Aug 7, 2021

As a lot of that CC text is automatically generated, it seems like you’d just be creating a clone of other software, which might be an intellectual property issue.

hkt · on Aug 5, 2021

Having an open corpus means that researchers building the next thing in voice research - which may or may not follow DeepSpeech - have something to work with. This is enormously important and their change of direction lets a thousand flowers bloom. Meanwhile, their partnership with Nvidia provides a fertile ground to prove the value of the open corpus in action. Nvidia get access to Mozilla's (presumably superior) ability to build said corpus, while Mozilla lay the foundations for others to contribute work in the open. It is a great example of comparative advantage, and a win win choice, IMO.

rasz · on Aug 5, 2021

So in other words we provide data for free to Mozilla, and Mozilla turns around and sells it for millions to Nvidia to fund ... not open source, they killed that so umm ee, to fund ceo salary?

nmstoker · on Aug 5, 2021

You seem to imply that Nvidia are paying for data that is freely available.

Anyone can use the Common Voice data within the terms of the license and NVIDIA contributing towards the continued gathering of data (that will continue to be made publicly available) won't change that.

It's a huge shame that Mozilla didn't continue the DeepSpeech project but Coqui is taking on the mantle there and there are plenty of others working on open source solutions too, all whilst the existence of CV will make a big difference to research, in the academic, commercial and open source spheres.

robbedpeter · on Aug 5, 2021

Coqui is phenomenally good and well done, so this new data should lower the barrier to entry for the represented languages.

danShumway · on Aug 5, 2021

> and sells it

If that were true, it would be a profoundly bad purchase for NVidia, since the data is already freely licensed and available for anyone to use at no cost.

This is like saying that Epic "bought" Blender when they gave it a development grant, or that Google contributing patches to upstream Linux means they own it now. Mozilla didn't give NVidia any kind of special license; when NVidia contributes data to Common Voice, they're doing so under Common Voice's license, not their own.

We want to encourage more companies to treat software and training data as a public commons that is collectively maintained. This is a good thing.

rasz · on Aug 5, 2021

It's the kind of "bad" Nvidia purchase like when they pay game publishers for incorporation of physx/cuda/hairworks/gameworks, resulting in

https://techreport.com/news/14707/ubisoft-comments-on-assass...

https://techreport.com/review/21404/crysis-2-tessellation-to...

https://arstechnica.com/gaming/2015/05/amd-says-nvidias-game...

Here it appears they purchased this https://venturebeat.com/2021/04/12/mozilla-winds-down-deepsp...

danShumway · on Aug 5, 2021

This is silly. Common Voice is not adding NVidia-specific features; what would that even look like for a database? There is no comparison to be made between donating resources to an openly licensed database and encouraging developers to optimize their games for proprietary APIs.

And the assumption that shutting down Deep Speech was specifically for NVidia's benefit seems like a fairly large leap to me, given that Deep Speech is already mature, still being developed under Coqui.ai, and surrounded by a wide diversity of other deep learning projects that also aren't controlled by NVidia.

Decreasing barriers of entry for those models and providing raw data is probably the right thing for Mozilla to be focusing on right now. Any team can build a language model; only companies like Mozilla can coordinate mass data collection for those models.

mazoza · on Aug 5, 2021

I know the old speech team continues as Coqui https://github.com/coqui-ai/

tmalsburg2 · on Aug 5, 2021

About their TTS system: "These models provide speech synthesis with ~0.12 real-time factor on a GPU and ~1.02 on a CPU." The quality of the samples is really impressive but, wow, isn't this computationally too expensive for many applications?

nyanpasu64 · on Aug 5, 2021

>If, for example, it takes 8 hours of computation time to process a recording of duration 2 hours, the real time factor is 4. When the real time factor is 1, the processing is done in real time. It is a hardware-dependent value.

I think real-time factors smaller than 1 are faster than real-time (not slower) and use less than 100% of a resource's computational power to keep up.
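To make the arithmetic concrete, here is a small sketch of the definition quoted above (processing time divided by audio duration). The function names `real_time_factor` and `speedup` are made up for illustration; the 0.12 and 1.02 figures are the ones quoted from the TTS description earlier in the thread.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by the duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def speedup(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# The quoted example: 8 hours of compute for a 2-hour recording.
print(real_time_factor(8 * 3600, 2 * 3600))  # 4.0

# The figures quoted for the TTS models.
print(round(speedup(0.12), 1))  # 8.3  (GPU: faster than real time)
print(round(speedup(1.02), 2))  # 0.98 (CPU: slightly slower than real time)
```

So an RTF below 1 means faster than real time, and anything above 1 means the output takes longer to produce than it takes to play back.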

tmalsburg2 · on Aug 6, 2021

Not sure what you're quoting because I didn't write that, but

> I think real-time factors smaller than 1 are faster than real-time (not slower) and use less than 100% of a resource's computational power to keep up.

Sure, but who has the necessary GPUs installed? And on CPUs it will apparently take longer to generate speech than the duration of that speech. Unusable for many UIs and it will also drain the batteries of any portable device.

jpetso · on Aug 6, 2021

You're not wrong, but with so many chips incorporating some sort of dedicated "AI" or "tensor" functionality, perhaps the issue will resolve itself for most portable devices in a few years. Plus there's always the option of optimizing a little more and/or abusing other available hardware such as DSP chips to get the real time factor down. Anything over 1 isn't great, but it's not a bad start.

mazoza · on Aug 6, 2021

I mean, it is almost 10x faster than real time.

So it's the opposite.

tmalsburg2 · on Aug 6, 2021

It’s 8.3 times faster than real time if you have a beefy GPU, which most devices don’t have. On a desktop CPU it’s real time, and on smartphones worse.

stegrot · on Aug 5, 2021

DeepSpeech is still alive in a way: the team founded the company coqui.ai after the Mozilla layoffs, and they keep everything open source.

jononor · on Aug 5, 2021

What closed NVidia thing did they adopt? I don't see any evidence of that here.

option · on Aug 5, 2021

https://github.com/NVIDIA/NeMo which is open source, PyTorch-based, and regularly publishes new models and checkpoints.

Seirdy · on Aug 5, 2021

The source code is under a FLOSS license, but it only works on Nvidia GPUs and uses proprietary Nvidia-specific technologies like CUDA.

It's significantly closer to "nonfree" on the free-nonfree spectrum than it should be, and is another example of the difference between the guiding philosophies behind "free software" and "open source"

yorwba · on Aug 5, 2021

Can't you run it on CPU? And looking at the code, it seems like they're using Numba to JIT their CUDA kernels, so I guess someone could come along and provide a compatibility shim to make the kernels run on a non-CUDA accelerator?

rasz · on Aug 5, 2021

I'm sure they signed on to adopting "something", otherwise it would be receiving a $1.5 million grant for closing an open source initiative. A $3 million a year lawyer would never be this blatant.

moralestapia · on Aug 5, 2021

Lol, these guys sell themselves for peanuts.