Align Open.Bible data
| Language | Passing | Failing | Unknown | Notes | Aligned Sample |
|---|---|---|---|---|---|
| Yoruba | 💚 | | | | Psalm 119 |
| Ewe | 💚 | | | | Psalm 119 |
| Lingala | 💚 | | | | Psalm 119 |
| Asante Twi | 💚 | | | | |
| Akuapem Twi | 💚 | | | | |
| Chichewa | ❤️‍🩹 | | | Passing with bad alignments | Psalm 119 |
| Hausa | | 💔 | | | |
| Luo | | 💔 | | | |
| Luganda | | 💔 | | | |
| Kikuyu | | 💔 | | | |
| Arabic | | | ❓ | | |
| Kurdi Sorani | | | ❓ | | |
| Polish | | | ❓ | | |
| Vietnamese | | | ❓ | | |
$ git clone https://github.com/coqui-ai/open-bible-scripts.git
The first alignment approach uses MFA (Montreal Forced Aligner) to align the data and train a new acoustic model from scratch.
You need to install a couple things on your own:
Use the language name as defined in open-bible-scripts/data/*.txt. Use the language code as expected by covo.
E.g., for Yoruba use yoruba and yo, for Ewe use ewe and ee, for Luganda luganda and lg, and so on.
$ cd open-bible-scripts
open-bible-scripts$ ./run-pre-alignment.sh yoruba yo
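To prepare several languages in one go, the name/code pairs can be kept in a small lookup. The sketch below is only an illustration: `lang_code` is a hypothetical helper (not part of the repository), it covers only the three pairs mentioned above, and it prints the commands rather than running them.

```shell
# Hypothetical helper (not part of open-bible-scripts):
# map a language name to the language code expected by covo.
# Only the pairs named in this README are listed.
lang_code() {
  case "$1" in
    yoruba)  echo yo ;;
    ewe)     echo ee ;;
    luganda) echo lg ;;
    *)       echo "unknown language: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the pre-alignment command for each language.
for name in yoruba ewe luganda; do
  code=$(lang_code "$name") || continue
  echo "./run-pre-alignment.sh $name $code"
done
```

Once the printed commands look right, drop the `echo` to actually run the script from inside `open-bible-scripts`.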
Generate alignments with mfa train
$ docker run -it --mount "type=bind,src=/home/ubuntu/open-bible-scripts,dst=/mnt" mmcauliffe/montreal-forced-aligner
(base) root@d8095c794d5f:/# conda activate aligner
(aligner) root@d8095c794d5f:/# mfa train --clean --num_jobs `nproc` --temp_directory /mnt/yoruba/data/mfa-tmp-dir --config_path /mnt/MFA_CONFIG /mnt/yoruba/data /mnt/yoruba/dict.txt /mnt/yoruba/data/mfa-output &> /mnt/yoruba/data/LOG &
# At this point, alignment will take a while,
# so you might want to detach from the docker container
# with `Ctrl-P followed by Ctrl-Q`
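The `mfa train` command above is written for Yoruba; for another language only the directory name changes. As a rough sketch, the command can be built from the language name. `mfa_train_cmd` is a hypothetical helper, and it assumes the same `/mnt/<language>` layout and `/mnt/MFA_CONFIG` path as the Yoruba example. It only prints the command, without the log redirection or backgrounding.

```shell
# Hypothetical helper: print the `mfa train` command for one language,
# assuming the repository is mounted at /mnt as in the docker command above.
mfa_train_cmd() {
  lang="$1"
  root="/mnt/$lang"
  printf '%s\n' "mfa train --clean --num_jobs $(nproc) --temp_directory $root/data/mfa-tmp-dir --config_path /mnt/MFA_CONFIG $root/data $root/dict.txt $root/data/mfa-output"
}

# Example: show the command for Ewe.
mfa_train_cmd ewe
```

Inside the container, the printed line can be pasted back with `&> /mnt/ewe/data/LOG &` appended, mirroring the Yoruba command.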
Use the language name as defined in open-bible-scripts/data/*.txt.
E.g., for Yoruba use yoruba, for Ewe use ewe, for Luganda luganda, and so on.
$ cd open-bible-scripts
open-bible-scripts$ ./run-post-alignment.sh yoruba yo
This works only for Lingala, Akuapem Twi, and Asante Twi.
Install sox on your OS. The Linux (Debian/Ubuntu) installation is shown below:
sudo apt-get install sox
sudo apt-get install libsox-fmt-mp3
sox --version
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install pandas
Execute the run-biblica-splits-*.sh script from the root dir, for example with Lingala:
./run-biblica-splits-lingala.sh
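If the split script fails early, a quick preflight check can confirm that the tools installed above are on the `PATH`. This is a minimal sketch: `check_tools` is a hypothetical helper, and it only verifies that the commands exist, not that sox was built with MP3 support.

```shell
# Hypothetical helper: report any command that is not on the PATH.
# Returns 0 if all tools are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# sox and python3 are the tools this section relies on.
check_tools sox python3 || echo "install the missing tools before running the split script"
```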