Manuscript Annotation and Analysis
This page describes DAILP's internal annotation process, with a focus on internal goals and workflows. To see how this fits with DAILP's goals for community involvement, see our Community Translation page.
Our internal annotation process bridges an image of a manuscript with a free English translation provided by speakers of both Cherokee and English. Manuscript images are generally photographs or scans of a handwritten Cherokee document shared with DAILP by libraries. Free translations are created by community translators from the original manuscript image and shared with the DAILP team. Metadata for the document is also recorded, including:
Document ID
Genre
Source text name
Document title
Page number within source text
Number of pages
Translations, image IDs
Estimated document creation date
Contributors (translators and annotators)
Source
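As a rough illustration, the metadata fields above could be gathered into a single record. The sketch below is hypothetical: the class name, field names, types, and sample values are assumptions made for this example and do not describe DAILP's actual spreadsheet or database schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record mirroring the metadata fields listed above."""
    document_id: str
    genre: str
    source_text_name: str
    title: str
    page_in_source: int
    page_count: int
    translations: List[str] = field(default_factory=list)
    image_ids: List[str] = field(default_factory=list)
    # Kept as free text because creation dates are estimates.
    estimated_creation_date: Optional[str] = None
    # Translators and annotators.
    contributors: List[str] = field(default_factory=list)
    source: str = ""


def missing_fields(meta: DocumentMetadata) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    empty = []
    for f in fields(meta):
        value = getattr(meta, f.name)
        if value is None or value == "" or value == []:
            empty.append(f.name)
    return empty


# Placeholder values only; these do not describe a real DAILP document.
example = DocumentMetadata(
    document_id="EXAMPLE-001",
    genre="letter",
    source_text_name="Example collection",
    title="Example document",
    page_in_source=1,
    page_count=2,
)
print(missing_fields(example))
```

A check like `missing_fields` is one way to spot records that are not yet ready for annotation, for instance a document with no image IDs attached.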
Once DAILP has access to a manuscript image and a free translation for a document, the annotation process can begin. Minimally, an annotation for a word should follow the template below:
Syllabary Layer
Simple Phonetics
Word Parts (Morpheme break)
Word Part Meanings (Morpheme gloss)
Word Translation
Maximally annotated forms can also have a romanized form, a detailed sound form (phonemic layer), and commentary:
Syllabary Layer
Romanization
Simple Phonetics
Detailed Sound (Phonemic Layer)
Word Parts (Morpheme break)
Word Part Meanings (Morpheme gloss)
Word Translation
Commentary
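The two templates above can be pictured as one record with five required layers and three optional ones. This is a minimal sketch under stated assumptions: the class and attribute names are invented for illustration, and the word translation value is a placeholder because this page does not give one for the example word.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class WordAnnotation:
    """Hypothetical per-word record following the templates above."""
    # Layers required by the minimal template.
    syllabary: str
    simple_phonetics: str
    word_parts: str
    word_part_meanings: str
    word_translation: str
    # Layers that appear only in maximally annotated forms.
    romanization: Optional[str] = None
    detailed_sound: Optional[str] = None
    commentary: Optional[str] = None

    def is_maximal(self) -> bool:
        """True when all three optional layers are filled in."""
        optional = (self.romanization, self.detailed_sound, self.commentary)
        return all(value is not None for value in optional)

    def filled_layers(self) -> List[str]:
        """Names of the layers that currently hold a non-empty value."""
        return [name for name, value in asdict(self).items() if value]


word = WordAnnotation(
    syllabary="ᎠᏂᏣᎳᎩ",
    simple_phonetics="anijalagi",
    word_parts="anii-jalagi",
    word_part_meanings="they (pl.)-Cherokee",
    word_translation="(placeholder translation)",
)
print(word.is_maximal())
print(word.filled_layers())
```

Making the optional layers default to `None` keeps a minimally annotated word valid while still letting later passes add romanization, detailed sound, or commentary.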
Currently, all annotation data is input and stored on Google Drive, then migrated to the DAILP database.
The syllabary layer is typically the first layer to be entered in the annotation process. To build it, annotators enter typed Cherokee syllabary characters into the analysis spreadsheet, matching the handwritten text of the document as closely as possible. At present, all DAILP documents were originally written in the syllabary, but documents whose source is a romanization of Cherokee could also be transcribed in this layer.
The romanization layer holds spellings of Cherokee words written in the Latin alphabet as they appear in the source material. Right now, this layer is used on only a few documents and is generally left blank otherwise.
The simple phonetics layer is typically the second layer added in the annotation process. It takes the transcribed syllabary layer and converts it into simple phonetics, a pedagogical orthography for Cherokee that is largely based on the sound values of the syllabary.
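Because simple phonetics is largely based on the sound values of the syllabary, a first pass can be pictured as a character-by-character lookup. The sketch below is illustrative only: the table covers just the five characters of this page's example word ᎠᏂᏣᎳᎩ (anijalagi), the function name is hypothetical, and a lookup like this is not a substitute for an annotator's judgment.

```python
from typing import Dict

# The example word used on this page and its simple phonetics, one value
# per syllabary character. This is a deliberately partial table.
EXAMPLE_WORD = "ᎠᏂᏣᎳᎩ"
EXAMPLE_VALUES = ["a", "ni", "ja", "la", "gi"]

SYLLABARY_TO_SIMPLE_PHONETICS: Dict[str, str] = dict(
    zip(EXAMPLE_WORD, EXAMPLE_VALUES)
)


def to_simple_phonetics(syllabary_word: str) -> str:
    """Convert a syllabary string by looking up each character in the table.

    Raises ValueError for any character missing from the partial table,
    so gaps are reported instead of being silently skipped.
    """
    pieces = []
    for char in syllabary_word:
        if char not in SYLLABARY_TO_SIMPLE_PHONETICS:
            raise ValueError(f"No sound value recorded for {char!r}")
        pieces.append(SYLLABARY_TO_SIMPLE_PHONETICS[char])
    return "".join(pieces)


print(to_simple_phonetics(EXAMPLE_WORD))  # prints "anijalagi"
```

Raising an error on unknown characters is a deliberate choice here: an incomplete table should flag a word for manual review, not produce a partial transcription.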
The detailed sound (phonemic) layer contains more detailed pronunciation information about a word, including tone and vowel length. Currently, this layer is not used in any analyzed documents.
The word parts layer is one of the deepest levels of annotation and analysis we currently provide. Creating this layer involves trying to break words into meaningful parts (also called morphemes) based on entries in grammars, dictionaries, and other resource materials.
For example, the word ᎠᏂᏣᎳᎩ (anijalagi) can be broken into two meaningful parts: anii and jalagi. The meanings of these parts are placed in the word part meanings layer, also called the morpheme gloss layer. Joining the parts with hyphens gives the final word parts layer: anii-jalagi
The word part meanings layer, also called the morpheme gloss layer, is built in parallel with the word parts (morpheme break) layer. In this layer, a standard set of labels is applied to the segments from the word parts layer. Using our example from the word parts section:
Syllabary: ᎠᏂᏣᎳᎩ
Simple Phonetics: anijalagi
Word parts: anii-jalagi
Since we have word parts, we can try to assign meaning to the parts.
Word parts: anii-jalagi
Meanings: they (pl.)-Cherokee
These parts and their meanings are useful to Cherokee language learners. They also help annotators and readers better understand the meaning of a single word in a document, which is shown in the word-by-word translation.
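Since the word parts and word part meanings layers are built in parallel, each hyphen-separated part should line up with exactly one meaning. The following is a small sketch of that pairing, assuming hyphens appear only as separators; the function name is hypothetical, and a meaning label that itself contains a hyphen would need a different separator or an escaping rule.

```python
from typing import List, Tuple


def align_word_parts(word_parts: str, meanings: str) -> List[Tuple[str, str]]:
    """Pair each hyphen-separated word part with its meaning.

    Raises ValueError when the two layers have different numbers of
    segments, which usually signals an annotation that needs review.
    """
    parts = word_parts.split("-")
    glosses = meanings.split("-")
    if len(parts) != len(glosses):
        raise ValueError(
            f"{len(parts)} word parts but {len(glosses)} meanings: "
            f"{word_parts!r} vs {meanings!r}"
        )
    return list(zip(parts, glosses))


pairs = align_word_parts("anii-jalagi", "they (pl.)-Cherokee")
print(pairs)  # [('anii', 'they (pl.)'), ('jalagi', 'Cherokee')]
```

The length check catches the most common mismatch between the two layers: a part that was segmented but never given a meaning, or the reverse.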
⚠️ This section is currently under construction!⚠️ The word-by-word translation bridges the gap between the free translation and the word-for-word analysis.
⚠️ This section is currently under construction!⚠️
⚠️ This section is currently under construction!⚠️