Yorùbá Audio

This repo aggregates audio/speech corpora for Yorùbá tasks, much as yoruba-text does for text datasets. The corpora may include aligned text or be purely unlabeled.

The objective is to have a bird's-eye view of available Yorùbá audio, along with its metadata and entropy, to inform additional data collection and modeling. For example, if we see a large broadcast news corpus, we might want to train a self-supervised model on a pretext task to generate speech embeddings for use in ASR/TTS work.

Corpora

Name	Size (HH:MM:SS)	Transcribed	Segmented into utterances	Aligned	Source
Lagos-NWU	02:45:17	✔️	✔️	✔️	North-West University
OpenSLR86	04:01:31	✔️	✔️	✔️	OpenSLR, Google
Bíbélì Mímọ́ (NIV)	93:38:15	✔️			Biblica Open Bible
Bíbélì Mímọ́ (KJV)		✔️			Bible.is
Colloquial Yorùbá	02:32:29	✔️			Audio files, Textbook
OrisunTV Broadcast News	81:49:29				YouTube
VoxLingua107	94:02:45		✔️		Post-filtered from YouTube
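
To get a quick overall figure from the table, the listed durations can be summed. The snippet below is a minimal sketch, not part of this repo: the `CORPORA` dictionary and the helper names `to_seconds` and `to_hms` are illustrative, the values are copied by hand from the table above, and the Bíbélì Mímọ́ (KJV) entry is left out because no size is listed for it.

```python
# Durations (HH:MM:SS) and "transcribed" flags copied by hand from the Corpora table.
# Bíbélì Mímọ́ (KJV) is omitted because the table lists no size for it.
CORPORA = {
    "Lagos-NWU": ("02:45:17", True),
    "OpenSLR86": ("04:01:31", True),
    "Bíbélì Mímọ́ (NIV)": ("93:38:15", True),
    "Colloquial Yorùbá": ("02:32:29", True),
    "OrisunTV Broadcast News": ("81:49:29", False),
    "VoxLingua107": ("94:02:45", False),
}


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds; unpadded fields such as '1' are accepted."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to an HH:MM:SS string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_seconds = sum(to_seconds(size) for size, _ in CORPORA.values())
transcribed_seconds = sum(
    to_seconds(size) for size, transcribed in CORPORA.values() if transcribed
)

print("All listed audio:  ", to_hms(total_seconds))
print("Transcribed audio: ", to_hms(transcribed_seconds))
```

Splitting the total by the Transcribed column gives a rough sense of how much audio is usable for supervised ASR/TTS versus self-supervised pretraining only.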
