Language Specifications
The language codes used within ULCA datasets follow ISO 639, the recommended and standardised nomenclature for specifying languages. Each language is assigned a two-letter (ISO 639-1) and/or three-letter (ISO 639-2 and ISO 639-3) lowercase abbreviation as per the ISO standardisation.
Refer to the ISO 639-1, ISO 639-2 and ISO 639-3 standards for further details.
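As a rough illustration of the two code shapes described above, the sketch below classifies a string by its form only. The helper name `code_shape` is hypothetical and not part of ULCA; it checks length and case, and does not confirm that the code is actually assigned in ISO 639.

```python
import re

# Hypothetical helper, not part of ULCA: it inspects the shape of a code only
# and does not check whether the code is actually assigned in ISO 639.
_TWO_LETTER = re.compile(r"[a-z]{2}")
_THREE_LETTER = re.compile(r"[a-z]{3}")


def code_shape(code: str) -> str:
    """Return a short description of the shape of a language code."""
    if _TWO_LETTER.fullmatch(code):
        return "two-letter (ISO 639-1 shape)"
    if _THREE_LETTER.fullmatch(code):
        return "three-letter (ISO 639-2/639-3 shape)"
    return "not an ISO 639 shape"
```

For example, `code_shape("hi")` reports the two-letter shape and `code_shape("brx")` the three-letter shape, while uppercase input such as `"EN"` is rejected because the codes are lowercase.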
The table below lists the ISO 639 codes used within ULCA, along with the language label for reference.
The source of truth for this list is https://github.com/ULCA-IN/ulca/blob/master/specs/common-schemas.yml#SupportedLanguages
"mixed" and "unknown" are the only exceptions - they are not ISO 639 codes and are added to simplify contribution & search.
📔 Support for any new language (to be added to the spec) can be requested from the ULCA team.
| Code | Label | ISO Standard |
|---|---|---|
| en | English | ISO 639-1 |
| hi | Hindi | ISO 639-1 |
| mr | Marathi | ISO 639-1 |
| ta | Tamil | ISO 639-1 |
| te | Telugu | ISO 639-1 |
| kn | Kannada | ISO 639-1 |
| gu | Gujarati | ISO 639-1 |
| pa | Punjabi | ISO 639-1 |
| bn | Bengali | ISO 639-1 |
| ml | Malayalam | ISO 639-1 |
| as | Assamese | ISO 639-1 |
| ks | Kashmiri | ISO 639-1 |
| ne | Nepali | ISO 639-1 |
| or | Odia | ISO 639-1 |
| sd | Sindhi | ISO 639-1 |
| si | Sinhala | ISO 639-1 |
| ur | Urdu | ISO 639-1 |
| sa | Sanskrit | ISO 639-1 |
| brx | Bodo | ISO 639-3 |
| doi | Dogri | ISO 639-3 |
| kok | Konkani | ISO 639-3 |
| mai | Maithili | ISO 639-3 |
| mni | Manipuri | ISO 639-3 |
| sat | Santali | ISO 639-3 |
| lus | Lushai | ISO 639-3 |
| njz | Nyishi | ISO 639-3 |
| pnr | Panim | ISO 639-3 |
| kha | Khasi | ISO 639-3 |
| grt | Garo | ISO 639-3 |
| bho | Bhojpuri | ISO 639-3 |
| raj | Rajasthani | ISO 639-3 |
| gom | Goan Konkani | ISO 639-3 |
| awa | Awadhi | ISO 639-3 |
| hne | Chhattisgarhi | ISO 639-3 |
| mag | Magahi | ISO 639-3 |
| mwr | Marwari | ISO 639-3 |
| sjp | Surjapuri | ISO 639-3 |
| anp | Angika | ISO 639-3 |
| gbm | Garhwali | ISO 639-3 |
| tcy | Tulu | ISO 639-3 |
| hlb | Halbi | ISO 639-3 |
| bih | Bihari | ISO 639-2/5 |
| bns | Bundeli | ISO 639-3 |
| unknown | Unknown | NA |
| mixed | Mixed | NA |
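
A minimal sketch of how a contributor might check a language code against the table above before submitting a dataset. The set is copied from this page and may lag behind the spec, which remains the source of truth. The names `SUPPORTED_LANGUAGES`, `is_supported` and `normalize_language` are hypothetical and not part of any ULCA tooling.

```python
# Codes copied from the table on this page (an assumption: the authoritative
# list lives in specs/common-schemas.yml and may change).
SUPPORTED_LANGUAGES = frozenset({
    # Two-letter codes (ISO 639-1)
    "en", "hi", "mr", "ta", "te", "kn", "gu", "pa", "bn", "ml",
    "as", "ks", "ne", "or", "sd", "si", "ur", "sa",
    # Three-letter codes
    "brx", "doi", "kok", "mai", "mni", "sat", "lus", "njz", "pnr",
    "kha", "grt", "bho", "raj", "gom", "awa", "hne", "mag", "mwr",
    "sjp", "anp", "gbm", "tcy", "hlb", "bih", "bns",
    # Non-ISO exceptions
    "unknown", "mixed",
})


def is_supported(code: str) -> bool:
    """Return True if the code appears in the table exactly as written."""
    return code in SUPPORTED_LANGUAGES


def normalize_language(code: str) -> str:
    """Trim and lowercase a code, then check it against the table.

    Hypothetical helper: raises ValueError for codes not in the table.
    """
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return cleaned
```

Here `is_supported` is strict about case and whitespace, while `normalize_language` tolerates input such as `" HI "` and returns `"hi"`. Which behaviour is appropriate depends on how strictly the submission pipeline validates the field.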
Please reach out to us via the Discussions forum if you wish to improve the documentation or contribute to ULCA.