Commit 8aac141

hadware authored and mmmaat committed
Doc force align (#18)
phone level forced-alignment tutorial
1 parent 2ce83ba commit 8aac141

3 files changed

+93
-0
lines changed

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
================
2+
Forced Alignment
3+
================
4+
5+
This tutorial covers the usage of abkhazia to do phone-level forced alignment
6+
on your own corpus of annotated audio files.
7+
8+
Prerequisites
9+
=============
10+
Here's what you need to have before being able to follow this tutorial:
11+
12+
- A set of audio files encoded as 16 kHz, 16-bit PCM WAV on which to run the alignment
13+
- On these audio files, a set of segments corresponding to utterances. For each utterance, you'll
14+
need to have a phonemic transcription (an easy way to get these is by
15+
using `Phonemizer <https://github.com/bootphon/phonemizer>`_)
16+
17+
It's also recommended (yet optional) to have some kind of reference file where you can identify
18+
the speaker of each of your phonemized utterances.
19+
20+
Corpus format
21+
=============
22+
23+
The corpus format is the same as the one specified in :ref:`abkhazia_format`, except that
24+
two corpus files, ``text.txt`` and ``lexicon.txt``, have a more specific format.
25+
Here, ``text.txt`` is composed of your phonemic transcription of each utterance::
26+
27+
<utterance-id> <pho1> <pho2> ... <phoN>
28+
29+
30+
and ``lexicon.txt`` is just a "phony" file containing phonemes mapped to themselves::
31+
32+
<pho1> <pho1>
33+
<pho2> <pho2>
34+
<pho3> <pho3>
35+
...
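
The phony lexicon described above can be derived from ``text.txt`` itself. The following is a minimal sketch, not part of abkhazia: the function name ``build_phone_lexicon`` and the sample utterance IDs are hypothetical, and it assumes every ``text.txt`` line is an utterance ID followed by space-separated phonemes.

```python
def build_phone_lexicon(text_lines):
    """Return lexicon lines that map every phoneme to itself.

    Hypothetical helper: assumes each input line looks like
    '<utterance-id> <pho1> <pho2> ... <phoN>'.
    """
    phones = set()
    for line in text_lines:
        fields = line.split()
        # The first field is the utterance ID; the rest are phonemes.
        phones.update(fields[1:])
    # One '<pho> <pho>' entry per distinct phoneme, in sorted order.
    return [f"{pho} {pho}" for pho in sorted(phones)]


if __name__ == "__main__":
    # Illustrative data only; replace with the lines of your own text.txt.
    sample_text = ["utt001 h e l o", "utt002 o k e"]
    print("\n".join(build_phone_lexicon(sample_text)))
```

Writing the returned lines to ``lexicon.txt`` in the corpus folder gives a file of the shape shown above.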
36+
37+
38+
Doing the Forced Alignment
39+
==========================
40+
41+
Once you've gathered all the required files (listed above) in a ``corpus/`` folder (the name is
42+
obviously arbitrary), you'll want to validate the corpus to check that it conforms to Kaldi's
43+
input format. Abkhazia luckily does that for us::
44+
45+
abkhazia validate corpus/
46+
47+
48+
Then, we'll compute the language model (actually here a phonetic model) for your dataset.
49+
Note that even though we set the model level (option ``-l``) to "word", this
50+
still works fine here since all words are phonemes::
51+
52+
abkhazia language corpus/ -l word -n 3 -v
53+
54+
55+
We'll now extract the MFCC features from the wav files::
56+
57+
abkhazia features mfcc corpus/ --cmvn
58+
59+
60+
Then, using the language model and the extracted MFCCs, compute a triphone HMM-GMM acoustic model::
61+
62+
abkhazia acoustic monophone -v corpus/ --force --recipe
63+
abkhazia acoustic triphone -v corpus/
64+
65+
If you specified the speaker for each utterance, you can adapt your model per speaker::
66+
67+
abkhazia acoustic triphone-sa -v corpus/
68+
69+
And then, at last, we can compute the forced phonetic alignments::
70+
71+
abkhazia align corpus -a corpus/triphone-sa # if you computed the speaker-adapted triphones
72+
abkhazia align corpus -a corpus/triphone # if you didn't
73+
74+
75+
If everything went right, you should be able to find your alignment in
76+
``corpus/align/alignments.txt``. The file will have the following row structure::
77+
78+
<utt_id> <pho_start> <pho_end> <pho_name> <pho_symbol>
79+
...
80+
81+
**Note that the phoneme's start and end time markers (in seconds) are relative to the utterance
82+
in which they were contained, not to the entire audio file.**
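
Because the times are relative to each utterance, mapping them back to positions in the audio file means adding the utterance's start time. The sketch below illustrates one way to do that and is not part of abkhazia. It assumes a segments file whose rows are ``<utterance-id> <wav-file> <tbegin> <tend>`` (the time fields being optional when an utterance spans the whole file); the function name ``to_file_times`` is hypothetical.

```python
def to_file_times(alignment_lines, segment_lines):
    """Shift utterance-relative phone times to file-relative times.

    Hypothetical helper. Assumed input shapes:
      alignment row: '<utt_id> <pho_start> <pho_end> <pho_name> <pho_symbol>'
      segment row:   '<utt_id> <wav_file> [<tbegin> <tend>]'
    """
    offsets = {}
    for line in segment_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        utt_id, wav = fields[0], fields[1]
        # Without explicit times, the utterance is assumed to start at 0.
        start = float(fields[2]) if len(fields) >= 4 else 0.0
        offsets[utt_id] = (wav, start)

    shifted = []
    for line in alignment_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        utt_id, pho_start, pho_end, pho_name = fields[:4]
        wav, offset = offsets[utt_id]
        shifted.append(
            (wav, offset + float(pho_start), offset + float(pho_end), pho_name)
        )
    return shifted


if __name__ == "__main__":
    # Illustrative data only.
    segments = ["utt1 file1.wav 1.5 3.0"]
    alignments = ["utt1 0.25 0.5 a a"]
    for row in to_file_times(alignments, segments):
        print(row)
```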

docs/source/abkhazia_format.rst

Lines changed: 10 additions & 0 deletions
@@ -154,6 +154,16 @@ Here is an example file with three utterances::
154154
sp109-sentence003 sp109
155155

156156

157+
If you don't have this information, or wish to hide it from Kaldi but still
158+
conform to this dataset format, you should set each utterance to its own unique speaker ID
159+
(as explained in the `Kaldi data preparation documentation <http://kaldi-asr.org/doc/data_prep.html>`_), e.g.::
160+
161+
sentence001 sp001
162+
sentence002 sp002
163+
sentence003 sp003
164+
sentence004 sp004
165+
....
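
A mapping like the one above can be generated rather than written by hand. This is a minimal sketch under assumptions: the function name ``unique_speakers`` and the ``spNNN`` numbering scheme are illustrative choices that mirror the example, not an abkhazia API.

```python
def unique_speakers(utt_ids):
    """Pair each utterance ID with its own speaker ID.

    Hypothetical helper: speaker IDs are numbered 'sp001', 'sp002', ...
    in the order the utterances are given, as in the example above.
    """
    return [f"{utt} sp{i:03d}" for i, utt in enumerate(utt_ids, start=1)]


if __name__ == "__main__":
    # Illustrative utterance IDs only.
    for line in unique_speakers(["sentence001", "sentence002", "sentence003"]):
        print(line)
```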
166+
157167
4. Transcription
158168
----------------
159169

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ available. List the list of corpora supported in abkhazia with
4444

4545
install
4646
abkhazia_format
47+
abkhazia_force_align
4748
abkhazia_usage
4849
abkhazia_api
4950
license
