This repository demonstrates the procedure and utilities used to automatically process large amounts of speech data into a corpus for training speech processing models, for example for automatic speech recognition.
This repository is used for the new iteration of the project, starting in 2025. For the old data, switch to the old-2022 branch.
We are developing the documentation of the ParlaSpeech dataset collection here: https://clarinsi.github.io/parlaspeech.
Learn how to explore this corpus with the concordancer: https://clarinsi.github.io/parlaspeech/concordancer/
- Nikola Ljubešić
- Peter Rupnik
- Ivan Porupski
- Danijel Koržinek
- Taja Kuzman Pungeršek