Mozilla Common Voice 22nd dataset is now available (commonvoice.mozilla.org)

submitted 1 day ago by [email protected] to c/[email protected]


cross-posted from: https://lemmy.ca/post/48124586

From their newsletter:

We’re so excited to share that the 22nd dataset release for Common Voice is now available for download.

Common Voice 22.0 adds 281 hours of speech data, bringing the total to 33,815 hours. This release also adds 296 newly validated hours, for a total of 22,640 validated hours of clips. It welcomes three new languages: Aromanian (rup), Tajik (tg), and Venda/Tshivenda (ve).

Aromanian is spoken by around 210,000 people in the Balkans, while Tajik is a language closely related to Persian, spoken in Tajikistan and Uzbekistan by over 10 million people. Venda/Tshivenda is spoken by over 2 million people as a first or other language in South Africa and Zimbabwe.

This brings the total number of languages available in this Scripted Speech release to 137.

For those unfamiliar:

Common Voice is a crowdsourcing project started by Mozilla to create a free and open speech corpus. The project is supported by volunteers who record sample sentences with a microphone and review recordings of other users. The transcribed sentences are collected in a voice database available under the public domain license CC0.[1] This license ensures that developers can use the database for voice-to-text and text-to-voice applications without restrictions or costs.


this post was submitted on 17 Jul 2025

