Language Identification via Phoneme Encoding

We propose to identify the language of a text by maximum likelihood. The raw text is first encoded as a sequence of phonemes, which are broken down into vowels and consonant types. A Markov model is then fit for each language of interest, and a text is assigned to the language whose model gives it the highest likelihood.
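A minimal sketch of this pipeline, under stated assumptions: the consonant types used here (stops, fricatives, nasals, liquids, glides) and the letter-to-class table are hypothetical and are not taken from predict.py or train.py. Letters are mapped directly to coarse classes, which only approximates a true phonemic transcription. The model is assumed to be a first-order (bigram) Markov chain over those classes with add-one smoothing; the names `encode`, `MarkovModel` and `identify` are illustrative.

```python
import math
from collections import Counter

# Hypothetical letter -> coarse phoneme-class table (an assumption, not the
# project's actual encoding). "V" = vowel; the other codes are consonant types.
CLASS_OF = {}
for letters, cls in [
    ("aeiouy", "V"),    # vowels
    ("pbtdkgqc", "S"),  # stops
    ("fvszhxj", "F"),   # fricatives
    ("mn", "N"),        # nasals
    ("lr", "L"),        # liquids
    ("w", "G"),         # glides
]:
    for ch in letters:
        CLASS_OF[ch] = cls

STATES = sorted(set(CLASS_OF.values())) + ["#"]  # "#" marks a word boundary


def encode(text):
    """Map text to a sequence of phoneme classes with '#' at word boundaries."""
    seq = ["#"]
    for ch in text.lower():
        cls = CLASS_OF.get(ch)
        if cls is not None:
            seq.append(cls)
        elif not ch.isalpha() and seq[-1] != "#":
            seq.append("#")  # whitespace, digits and punctuation end a word
        # letters missing from the table (e.g. accented ones) are skipped
    if seq[-1] != "#":
        seq.append("#")
    return seq


class MarkovModel:
    """First-order Markov chain over phoneme classes, add-one smoothed."""

    def __init__(self, texts):
        self.counts = Counter()  # (previous class, next class) -> count
        self.totals = Counter()  # previous class -> outgoing transitions
        for text in texts:
            seq = encode(text)
            for a, b in zip(seq, seq[1:]):
                self.counts[(a, b)] += 1
                self.totals[a] += 1

    def log_prob(self, a, b):
        # Laplace smoothing keeps unseen transitions from scoring -inf.
        return math.log((self.counts[(a, b)] + 1) / (self.totals[a] + len(STATES)))

    def log_likelihood(self, text):
        seq = encode(text)
        return sum(self.log_prob(a, b) for a, b in zip(seq, seq[1:]))


def identify(text, models):
    """Return the language whose model assigns `text` the highest likelihood."""
    return max(models, key=lambda lang: models[lang].log_likelihood(text))
```

For example, `identify(text, {"en": MarkovModel(en_texts), "it": MarkovModel(it_texts)})` returns the key of the best-scoring model, where `en_texts` and `it_texts` stand in for whatever training texts are available. Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long texts, and because every model scores the same encoded sequence, the log-likelihoods are directly comparable.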

To run:

Simply run predict.py. New files can be added to either the train or the test folder, but the model is currently limited to languages that use variants of the Latin alphabet.

Requirements:

pandas, numpy, unicodedata (the last is part of the Python standard library)
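Because the model is restricted to Latin-based scripts, accented letters have to be reduced to something a phoneme table can handle. One way to do this with the standard-library unicodedata module is sketched below; the function names `to_base_latin` and `is_latin_text` are hypothetical, and it is an assumption that predict.py normalizes text in a similar way.

```python
import unicodedata


def to_base_latin(text):
    """Strip diacritics, e.g. 'Crème brûlée' -> 'Creme brulee'.

    NFD splits an accented letter into a base letter plus combining marks;
    dropping the nonspacing marks (category 'Mn') leaves the base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def is_latin_text(text):
    """True if every alphabetic character is a Latin letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )
```

Letters such as ø, ł and ß have no canonical decomposition, so `to_base_latin` leaves them unchanged; they would need an explicit mapping if the phoneme table does not cover them. `is_latin_text` can serve as a guard that rejects input (Cyrillic, Greek, CJK, etc.) outside the model's current scope.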

Files:

Language_Identification
Language_Identification(clean).ipynb
README.md
predict.py
train.py

Repository: joecomerisnotavailable/phoneme_based_language_id