Manuscript Annotation and Analysis

Overview

This page describes DAILP's internal annotation process, with a focus on internal goals and workflows. For how this process fits with DAILP's goals for community involvement, see our Community Translation page.

Manuscript Images and English Translations

Our internal annotation process bridges an image of a manuscript with a free English translation provided by speakers of both Cherokee and English. Manuscript images are generally photographs or scans of handwritten Cherokee documents shared with DAILP by libraries. Free translations are created by community translators from the original manuscript image and shared with the DAILP team. Metadata for each document is also recorded, including:

Document ID
Genre
Source text name
Document title
Page number within source text
Number of pages
Translations, image IDs
Estimated document creation date
Contributors (translators and annotators)
Source
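The metadata fields above could be modeled as a simple record. The following is a minimal sketch only: the class name, field names, types, and the `missing_fields` helper are illustrative assumptions, not DAILP's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record mirroring the metadata fields listed above."""
    document_id: str
    genre: str
    source_text_name: str
    title: str
    page_in_source: Optional[int] = None   # page number within source text
    page_count: Optional[int] = None       # number of pages
    translation_ids: List[str] = field(default_factory=list)
    image_ids: List[str] = field(default_factory=list)
    estimated_date: Optional[str] = None   # estimated creation date, kept as free text
    contributors: List[str] = field(default_factory=list)  # translators and annotators
    source: Optional[str] = None


def missing_fields(meta: DocumentMetadata) -> List[str]:
    """Return the names of fields that are still empty (None, "" or [])."""
    return [name for name, value in vars(meta).items() if value in (None, "", [])]
```

A helper like `missing_fields` could be used to spot documents whose metadata is incomplete before annotation begins.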

Creating an Interlinear Gloss

Once DAILP has access to a manuscript image and a free translation for a document, the annotation process can begin. Minimally, an annotation for a word should follow the template below:

Syllabary Layer
Simple Phonetics
Word Parts (Morpheme break)
Word Part Meanings (Morpheme gloss)
Word Translation

Maximally annotated forms can also have a romanized form, a detailed sound form (phonemic layer), and commentary:

Syllabary Layer
Romanization
Simple Phonetics
Detailed Sound (Phonemic Layer)
Word Parts (Morpheme break)
Word Part Meanings (Morpheme gloss)
Word Translation
Commentary
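The two templates above differ only in which layers are required. As a rough sketch, a single word's annotation could be represented with required fields for the minimal template and optional fields for the extra layers. The class and attribute names here are assumptions made for illustration and do not reflect the layout of the DAILP spreadsheets or database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedWord:
    """Hypothetical container for one word's annotation layers."""
    # Layers required by the minimal template
    syllabary: str
    simple_phonetics: str
    word_parts: str           # morpheme break
    word_part_meanings: str   # morpheme gloss
    translation: str
    # Additional layers found only in maximally annotated forms
    romanization: Optional[str] = None
    phonemic: Optional[str] = None    # detailed sound layer
    commentary: Optional[str] = None

    def has_all_optional_layers(self) -> bool:
        """True when every optional layer has been filled in."""
        return None not in (self.romanization, self.phonemic, self.commentary)
```

Keeping the optional layers as `None` by default lets a minimally annotated word be created without placeholder text for layers that are rarely used.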

Currently, all annotation data is entered and stored in Google Drive, then migrated to the DAILP database.

Syllabary Transcription

This is typically the first layer to be entered in the annotation process. For this layer, annotators enter typed Cherokee syllabary characters into the analysis spreadsheet, matching the handwritten text of the document as closely as possible. At present, all DAILP documents were originally written in the syllabary, but documents whose source is a romanization of Cherokee could also be transcribed in this layer.

Romanization

This layer holds spellings of Cherokee words written in the Latin alphabet as they appear in the source material. Right now, this layer is used on only a few documents and is otherwise generally left blank.

Simple Phonetics Transliteration

This is typically the second layer added in the annotation process. It converts the transcribed syllabary layer into simple phonetics, a pedagogical orthography for Cherokee that is largely based on the sound values of the syllabary.
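A character-by-character lookup gives a rough idea of how a syllabary transcription might be turned into simple phonetics. This is a sketch under stated assumptions: the function name is hypothetical, the table is deliberately partial (only the five characters of the sample word ᎠᏂᏣᎳᎩ, with values taken from its simple phonetics form anijalagi as given on this page), and a one-to-one lookup may not capture every convention annotators actually apply.

```python
# Partial, illustrative table: only the characters of the sample word on this page.
# A real table would need to cover the full syllabary.
SYLLABARY_TO_SIMPLE_PHONETICS = {
    "Ꭰ": "a",
    "Ꮒ": "ni",
    "Ꮳ": "ja",
    "Ꮃ": "la",
    "Ꭹ": "gi",
}


def to_simple_phonetics(syllabary_text: str) -> str:
    """Replace each known syllabary character with its simple phonetics value.

    Characters missing from the table (spaces, punctuation, unmapped
    syllabary characters) are passed through unchanged.
    """
    return "".join(
        SYLLABARY_TO_SIMPLE_PHONETICS.get(ch, ch) for ch in syllabary_text
    )


print(to_simple_phonetics("ᎠᏂᏣᎳᎩ"))
```

Passing unknown characters through unchanged makes gaps in the table easy to see in the output, where leftover syllabary characters stand out against the Latin letters.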

Phonemic Layer

This layer contains more detailed pronunciation information about a word, including tone and vowel length. Currently, this layer is not used in any analyzed documents.

Word Parts (Morpheme Break) Layer

The word parts layer is one of the deepest levels of annotation and analysis we currently provide. Creating this layer involves trying to break words into meaningful parts (also called morphemes) based on entries in grammars, dictionaries, and other resource materials.

For example, the word ᎠᏂᏣᎳᎩ (anijalagi) can be broken into two parts: anii and jalagi, each with its own meaning. The meanings of these parts are placed in the word parts meaning layer, also called the morpheme gloss layer. Putting the parts together, separated by hyphens, gives us the final word parts layer: anii-jalagi.

Word Parts Meaning (Morpheme Gloss) Layer

This layer is built in parallel with the word parts layer, also called the morpheme break layer. In this layer, a standard set of labels is applied to the segments from the word parts layer. Using our example from the word parts section:

Syllabary: ᎠᏂᏣᎳᎩ
Simple Phonetics: anijalagi
Word parts: anii-jalagi

Since we have word parts, we can try to assign meaning to the parts.

Word parts: anii-jalagi
Meanings: they (pl.)-Cherokee

These parts and their meanings are useful to Cherokee language learners. They also help annotators and readers better understand the meaning of each word in a document, which is shown in the word-by-word translation.
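Because the word parts layer and the meanings layer are built in parallel and both use hyphens as separators, they can be paired segment by segment. The sketch below illustrates that pairing with the example from this section. The function name is hypothetical, and it assumes hyphens appear only as separators, never inside a part or a meaning.

```python
from typing import List, Tuple


def align_word_parts(word_parts: str, meanings: str) -> List[Tuple[str, str]]:
    """Pair each hyphen-separated word part with its meaning.

    Raises ValueError when the two layers have different numbers of
    segments, which usually signals an annotation error.
    """
    parts = word_parts.split("-")
    glosses = meanings.split("-")
    if len(parts) != len(glosses):
        raise ValueError(
            f"{len(parts)} word parts but {len(glosses)} meanings: "
            f"{word_parts!r} vs {meanings!r}"
        )
    return list(zip(parts, glosses))


for part, meaning in align_word_parts("anii-jalagi", "they (pl.)-Cherokee"):
    print(f"{part}: {meaning}")
```

A check like this could catch a common slip, where a segment is added to one layer but not the other, before the data is migrated.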

Word-by-Word Translation

⚠️ This section is currently under construction! ⚠️ The word-by-word translation bridges the gap between the free translation and the word-for-word analysis.

Commentary

⚠️ This section is currently under construction! ⚠️

Storage and Archiving

⚠️ This section is currently under construction! ⚠️
