clarinsi/parlaspeech

ParlaSpeech data preparation procedure

This repository demonstrates the procedure and utilities used to automatically process large amounts of speech data into a corpus for training speech processing models, for example for automatic speech recognition (ASR).

Old repository

This repository hosts the new iteration of the project, starting in 2025. For the old data, switch to the old-2022 branch.

New documentation

The documentation for the ParlaSpeech dataset collection is being developed here: https://clarinsi.github.io/parlaspeech.

Concordancer user guide

Learn how to explore this corpus with the concordancer: https://clarinsi.github.io/parlaspeech/concordancer/

Authors

About

Code for bootstrapping ASR datasets from parliamentary recordings and transcripts
