
I'm guessing that of the 4,600 new hours of speech, maybe 4,100 of those hours are of men's voices and 500 hours are of women's voices, yeah?


Thanks so much for sharing your comment. Gender equality in participation in Common Voice is something we really want to improve and champion. As part of the Kiswahili language community engagement, our team is implementing a gender action plan that covers both participation and use cases for the dataset. We hope to consult on, adapt, and replicate the gender inclusion work already done by community members, and to use the gender action plan to improve representation and involvement of all genders in open source projects such as Common Voice.


To be fair, I'm not sure that's the best guess :) There seem to be more female voices than male ones to me. Anyhow, I'd wager there's at least a 50:50 mix.


I probably should've looked this up before I decided to comment, but at least according to this:

https://commonvoice.mozilla.org/en/datasets

The ratio of male to female tagged voices in the English dataset is 45 percent male to 15 percent female. (The remaining 40 percent is untagged.) Odds are good that the ratio is closer to 75:25 than 50:50, at least by hours of recorded audio.
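
For anyone who wants to check the arithmetic, here is a minimal sketch of where 75:25 comes from. It uses only the 45 and 15 percent figures quoted above; the function name `tagged_split` is just made up for illustration. It also assumes the untagged 40 percent splits the same way as the tagged portion, which nothing on that page confirms.

```python
def tagged_split(male_pct, female_pct):
    """Renormalize the tagged shares so they sum to 1, ignoring untagged clips."""
    tagged = male_pct + female_pct
    return male_pct / tagged, female_pct / tagged


# Figures quoted from the English dataset page: 45% male, 15% female, 40% untagged.
# Assumption: untagged clips follow the same split as the tagged ones.
male_share, female_share = tagged_split(45, 15)
print(f"{male_share:.0%} male, {female_share:.0%} female")  # 75% male, 25% female
```

If the untagged clips skew differently, the true ratio could land anywhere between 45:55 and 85:15, so treat 75:25 as a rough estimate.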



