wiktionary_ipa_phoneme_lexicons

Helper script to generate free IPA phoneme lexicons from wiktionary.org, currently for German and English.

To run it, download a dump from Wiktionary: https://dumps.wikimedia.org/dewiktionary/ (German) or https://dumps.wikimedia.org/enwiktionary/ (English).
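As a rough illustration of how the dump locations are structured, the sketch below builds the URL of the latest articles dump for a given language code. The helper name `latest_dump_url` is hypothetical and not part of this repository. It assumes other language editions follow the same file naming pattern as the German dump used in the example further down, so check the dump index page if a URL does not resolve.

```python
# Hypothetical helper, not part of make_lex.py.
# Assumption: every Wiktionary edition publishes its latest articles dump
# under the same naming pattern as the German one used in this README.
DUMP_HOST = "https://dumps.wikimedia.org"


def latest_dump_url(lang: str) -> str:
    """Return the URL of the latest multistream articles dump for `lang` (e.g. "de", "en")."""
    wiki = f"{lang}wiktionary"
    filename = f"{wiki}-latest-pages-articles-multistream.xml.bz2"
    return f"{DUMP_HOST}/{wiki}/latest/{filename}"


if __name__ == "__main__":
    # Print the dump URLs for the two languages the script currently supports.
    for lang in ("de", "en"):
        print(latest_dump_url(lang))
```

The printed URLs can be passed to wget in the same way as in the German example.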

For example, to build a German IPA lexicon from Wiktionary for ASR training with stress markers removed, run:

git clone https://github.com/bmilde/wiktionary_ipa_phoneme_lexicons
cd wiktionary_ipa_phoneme_lexicons
wget https://dumps.wikimedia.org/dewiktionary/latest/dewiktionary-latest-pages-articles-multistream.xml.bz2
bunzip2 dewiktionary-latest-pages-articles-multistream.xml.bz2
python3 make_lex.py -f dewiktionary-latest-pages-articles-multistream.xml -o de_ipa_lexicon.txt --remove-stress

The generated German phoneme lexicon should now be in de_ipa_lexicon.txt.
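To use the lexicon in a downstream pipeline, it can be read into a dictionary. The following is a minimal sketch, not code from this repository. It assumes that each line of the output file holds a word, then whitespace, then its IPA transcription. That layout is an assumption, so inspect de_ipa_lexicon.txt and adjust the parsing if it differs. The function names `parse_lexicon`, `load_lexicon` and `strip_stress` are hypothetical. `strip_stress` only illustrates what removing stress markers means in IPA terms (dropping the primary and secondary stress marks) and is not necessarily how --remove-stress is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# IPA primary (U+02C8) and secondary (U+02CC) stress marks.
STRESS_MARKS = "\u02c8\u02cc"


def strip_stress(ipa: str) -> str:
    """Remove IPA primary and secondary stress marks from a transcription."""
    return ipa.translate(str.maketrans("", "", STRESS_MARKS))


def parse_lexicon(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each word to its list of distinct pronunciations.

    Assumed line layout (not confirmed by the README): "<word> <transcription>".
    Blank lines and lines without a transcription are skipped.
    """
    lexicon: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue
        word, pron = parts
        if pron not in lexicon[word]:
            lexicon[word].append(pron)
    return dict(lexicon)


def load_lexicon(path: str) -> Dict[str, List[str]]:
    """Read a lexicon file such as de_ipa_lexicon.txt (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_lexicon(f)
```

With the file from the example above in place, `load_lexicon("de_ipa_lexicon.txt")` would return the mapping. Keeping a list per word allows for entries with more than one pronunciation.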
