Utilities for preprocessing the Switchboard and WSJ corpora in Python3
- 2020.07.31 : wTIMIT support
Before using the utilities, make sure the following requirements are met:
- Install the Python packages tqdm and torchaudio.
- Install the sph2pipe executable.
- Convert .sph files in LDC2002S09-Hub5e_00 to .wav files.
python3 sph2wav.py .sph <path to sph2pipe> <path to LDC2002S09-Hub5e_00/english> SWB
- Split the eval2000 set by the rules in hub5e_00.pem.
python3 swb_eval_splitter.py <path to LDC2002S09-Hub5e_00/english>
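To make the split step more concrete, here is a minimal sketch of reading segment boundaries from a PEM file. It is not the actual swb_eval_splitter.py. It assumes hub5e_00.pem uses the common NIST PEM layout (file, channel, speaker, begin time, end time, with `;;` comment lines) and that the audio is sampled at 8 kHz. The names `Segment`, `parse_pem`, and `to_sample_range` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One utterance segment from a PEM file (hypothetical helper type)."""
    file_id: str
    channel: str
    speaker: str
    start: float  # seconds
    end: float    # seconds


def parse_pem(text: str) -> List[Segment]:
    """Parse PEM text, skipping blank lines and ';;' comment lines.

    Assumes five whitespace-separated fields per line. This layout is an
    assumption and should be checked against the real hub5e_00.pem.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        file_id, channel, speaker, start, end = line.split()[:5]
        segments.append(
            Segment(file_id, channel, speaker, float(start), float(end))
        )
    return segments


def to_sample_range(seg: Segment, sample_rate: int = 8000) -> Tuple[int, int]:
    """Convert segment times to (first_sample, end_sample_exclusive).

    The 8000 Hz default is an assumption about the converted .wav files.
    """
    return int(round(seg.start * sample_rate)), int(round(seg.end * sample_rate))
```

Each resulting sample range could then be used to slice the matching channel of the converted .wav file into one utterance-level clip.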
Convert .wv1 files in WSJ0 and WSJ1 to .wav files.
python3 sph2wav.py .wv1 <path to sph2pipe> <path to WSJ0/WSJ1> WSJ
Convert .WAV files in wTIMIT to .wav files.
python3 sph2wav.py .WAV <path to sph2pipe> <path to wTIMIT> wTIMIT
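The three conversion commands above follow the same pattern: find every file with a given extension and pass it to sph2pipe. The sketch below illustrates that pattern only; it is not the actual sph2wav.py. It assumes sph2pipe is called as `sph2pipe -f wav <input> <output>`, and it writes into a separate output directory (a hypothetical choice, so that `.WAV` inputs never collide with `.wav` outputs on case-insensitive filesystems). All function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List


def find_audio_files(root: str, ext: str) -> Iterator[Path]:
    """Yield every file under root whose suffix equals ext (e.g. '.sph')."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix == ext:
            yield path


def build_sph2pipe_cmd(sph2pipe: str, src: Path, dst: Path) -> List[str]:
    """Build the argument list that asks sph2pipe for RIFF/.wav output."""
    return [sph2pipe, "-f", "wav", str(src), str(dst)]


def convert_all(sph2pipe: str, root: str, ext: str, out_dir: str) -> int:
    """Convert every matching file under root; return how many were converted."""
    count = 0
    for src in find_audio_files(root, ext):
        # Mirror the input directory layout under out_dir.
        dst = Path(out_dir) / src.relative_to(root).with_suffix(".wav")
        dst.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if sph2pipe fails.
        subprocess.run(build_sph2pipe_cmd(sph2pipe, src, dst), check=True)
        count += 1
    return count
```

Building the command as a list rather than a shell string avoids quoting problems when corpus paths contain spaces.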