ParlaSpeech data preparation procedure

This repository demonstrates the procedure and utilities used to automatically process large amounts of speech data into a corpus for training speech processing models, for example for automatic speech recognition.

Old repository

This repository hosts the new iteration of the project, starting in 2025. For the old data, switch to the old-2022 branch.

New documentation

We are developing the documentation of the ParlaSpeech dataset collection here: https://clarinsi.github.io/parlaspeech.

Concordancer user guide

Learn how to explore this corpus with the concordancer: https://clarinsi.github.io/parlaspeech/concordancer/

Authors

Nikola Ljubešić
Peter Rupnik
Ivan Porupski
Danijel Koržinek
Taja Kuzman Pungeršek
