This version of the MULAN repository extracts attribute features from raw audio files of the SRE08 dataset. These articulatory attribute features (manner and place) are high-level descriptive features of speech. ...
If you publish any results obtained with this code, please cite:
@inproceedings{DBLP:conf/slt/KukanovHSL16,
author = {Ivan Kukanov and Ville Hautam{\"{a}}ki and
Sabato Marco Siniscalchi and Kehuang Li},
title = {Deep learning with maximal figure-of-merit cost to advance multi-label
speech attribute detection},
booktitle = {2016 {IEEE} Spoken Language Technology Workshop, {SLT} 2016, San Diego,
CA, USA, December 13-16, 2016},
pages = {489--495},
year = {2016},
doi = {10.1109/SLT.2016.7846308}
}
- Fix the path to your installed Kaldi toolkit in path.sh.
- Make sure that all your bash files are executable. To fix this, run chmod -R +x ./ from the project folder.
- Change the input folder data_dir in run.sh. This is the root folder with the audio files; the steps/data_prep.sh script searches all subfolders for files with the extension pcm.
- Change the output folder out_dir in run.sh. Audio lists, fbank features and the resulting attribute scores will be saved there.
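Before launching run.sh, it can help to check which audio files a recursive search of data_dir would pick up. The following is a minimal sketch of such a check, not the code used by steps/data_prep.sh; the function name find_pcm_files is hypothetical, and the match on the lowercase extension pcm is an assumption.

```python
from pathlib import Path


def find_pcm_files(data_dir):
    """Return the sorted paths of all *.pcm files below data_dir.

    Hypothetical helper: it only mirrors the idea of a recursive search
    and is not taken from steps/data_prep.sh.
    """
    root = Path(data_dir)
    # rglob walks all subfolders; the pattern is case-sensitive on Linux,
    # so files named *.PCM would not be matched (assumption: lowercase).
    return sorted(p for p in root.rglob("*.pcm") if p.is_file())


# Example: count the candidate files below the current folder.
pcm_files = find_pcm_files(".")
print(f"{len(pcm_files)} pcm file(s) found")
```

If the count is zero, the value of data_dir or the file extension is the first thing to check.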
Manner attribute scores will be saved in $out_dir/res/manner/scores.txt and place attribute scores in $out_dir/res/place/scores.txt, in the following format:
utterance_id [ columns with attribute scores for each frame ]
Columns in scores.txt correspond to the following attribute types (the lists can be found in data/dict/):
manner: [ fricative glides nasal other silence stop voiced vowel ]
place: [ coronal dental glottal high labial low mid other palatal silence velar ]
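The mapping from columns to attribute names can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the repository: it assumes each record is an utterance id followed by whitespace-separated scores between square brackets, with one score per attribute for every frame in the order listed above. The name parse_scores is hypothetical.

```python
# Column order as listed above (see data/dict/).
MANNER = ["fricative", "glides", "nasal", "other",
          "silence", "stop", "voiced", "vowel"]
PLACE = ["coronal", "dental", "glottal", "high", "labial", "low",
         "mid", "other", "palatal", "silence", "velar"]


def parse_scores(text, labels):
    """Parse 'utt_id [ v v v ... ]' records.

    Returns {utt_id: [ {label: score, ...} for each frame ]}.
    Assumed layout: the scores of one frame are consecutive, so the
    values are cut into chunks of len(labels).
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    n = len(labels)
    result = {}
    i = 0
    while i < len(tokens):
        utt_id = tokens[i]
        if i + 1 >= len(tokens) or tokens[i + 1] != "[":
            raise ValueError(f"expected '[' after utterance id {utt_id!r}")
        end = tokens.index("]", i + 2)  # raises ValueError if unclosed
        values = [float(t) for t in tokens[i + 2:end]]
        if len(values) % n != 0:
            raise ValueError(f"{utt_id}: {len(values)} values do not "
                             f"split into frames of {n} columns")
        result[utt_id] = [dict(zip(labels, values[k:k + n]))
                          for k in range(0, len(values), n)]
        i = end + 1
    return result


# Made-up example with two frames of manner scores.
sample = """utt1 [
  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
  0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 ]"""
frames = parse_scores(sample, MANNER)["utt1"]
print(len(frames), "frames; vowel score of first frame:", frames[0]["vowel"])
```

For a real file, the text would be read from $out_dir/res/manner/scores.txt with MANNER, or from $out_dir/res/place/scores.txt with PLACE. If the actual file layout differs from the assumption above, the tokenising step needs to be adapted.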