
Language Specific Limitations

jae-mess edited this page Jul 16, 2025 · 2 revisions

Since the project began with only Cherokee language data, our current design has some limitations that we hope to resolve in future iterations and as we expand to other languages.

  • Currently, our data ingestion process is specific to the Cherokee language data that we have access to.
  • When querying for the shape of a morpheme, you can specify an orthography, but there is a static set of choices specific to Cherokee (TAOC, CRG, Learner).
  • There is no tagging of orthography type in the source layer of a form. We tend to assume that the source is written in either the Cherokee syllabary or a writing system based on the Latin alphabet.
  • We currently exclude the phonemic representation from the front-end, since it is used for very specific types of linguistic analysis and is, therefore, outside of our scope.
  • Functional morpheme tags are identified from a global list, but different languages might use the same string for different meanings.
  • We assume that our audience is already familiar with, and likely fluent in, English, and development is built around this assumption.
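
The orthography and morpheme-tag limitations above could be relaxed by keying both lookups on a language identifier. The following is only a minimal sketch of that idea, not the project's actual data model: the `lang-b` identifier, the `ExampleSystem` orthography, the `TAG1` tag string, and both function names are hypothetical placeholders. Only the TAOC, CRG, and Learner names come from this page, and `chr` is the ISO 639-3 code for Cherokee.

```python
from typing import Dict, Optional, Set, Tuple

CHEROKEE = "chr"           # ISO 639-3 code for Cherokee
OTHER_LANGUAGE = "lang-b"  # hypothetical identifier for a second language

# Orthography systems registered per language instead of one static global set.
ORTHOGRAPHIES: Dict[str, Set[str]] = {
    CHEROKEE: {"TAOC", "CRG", "Learner"},
    OTHER_LANGUAGE: {"ExampleSystem"},  # hypothetical system name
}

# Tag meanings keyed by (language, tag), so the same tag string can carry
# a different meaning in each language. Meanings here are placeholders.
TAG_MEANINGS: Dict[Tuple[str, str], str] = {
    (CHEROKEE, "TAG1"): "placeholder meaning for Cherokee",
    (OTHER_LANGUAGE, "TAG1"): "placeholder meaning for another language",
}


def is_supported_orthography(language: str, system: str) -> bool:
    """Return True if the orthography system is registered for the language."""
    return system in ORTHOGRAPHIES.get(language, set())


def lookup_tag(language: str, tag: str) -> Optional[str]:
    """Return the meaning of a tag within one language, or None if unknown."""
    return TAG_MEANINGS.get((language, tag))
```

With this shape, `is_supported_orthography("chr", "CRG")` is true while the same system is rejected for `lang-b`, and `lookup_tag` returns a different meaning for `TAG1` depending on the language passed in.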

After 2024, the following limitation was resolved:

  • Different language varieties were not indicated. The project worked with the Oklahoma variety, with citizens from the United Keetoowah Band of Cherokee Indians providing audio recordings. The project received support and feedback from community members of all three federally recognized Cherokee tribes.

After 2025, the project hopes to:

  • Capture and support multiple languages in the metadata without assuming the primacy of English.
  • Capture and support detailed phonemic representations using systems preferred by tribal communities.
  • Enable detailed phonemic representations for linguistic analysis purposes, which is outside the current scope of the project.
