Manuscript Annotation and Analysis

Overview

This page describes DAILP's internal annotation process, focusing on the team's goals and workflows. To see how this process fits with DAILP's goals for community involvement, see the Community Translation page on our website.

Manuscript Images and English Translations

Our internal annotation process bridges an image of a manuscript with a free English translation provided by speakers of both Cherokee and English. Manuscript images are generally photographs or scans of handwritten Cherokee documents shared with DAILP by libraries. Free translations are created by community translators from the original manuscript image and shared with the DAILP team. Metadata for the document is also recorded, including:

Document ID
Genre
Source text name
Document title
Page number within source text
Number of pages
Translations, image IDs
Estimated document creation date
Contributors (translators and annotators)
Source
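The metadata fields listed above could be modeled as a simple record. The following is a minimal sketch only: the class name, field names, types, and sample values are illustrative assumptions and do not reflect DAILP's actual database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record for the metadata fields listed above."""
    document_id: str
    genre: str
    source_text_name: str
    title: str
    page_in_source: int          # page number within the source text
    page_count: int              # number of pages in this document
    translation_ids: List[str] = field(default_factory=list)
    image_ids: List[str] = field(default_factory=list)
    estimated_date: Optional[str] = None   # left as free text in this sketch
    contributors: List[str] = field(default_factory=list)
    source: Optional[str] = None


# Placeholder values only; these do not describe a real DAILP document.
example = DocumentMetadata(
    document_id="EXAMPLE01",
    genre="letter",
    source_text_name="Example Collection",
    title="Example Document",
    page_in_source=1,
    page_count=2,
)
```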

Creating an Interlinear Gloss

Once DAILP has access to a manuscript image and a free translation for a document, the annotation process can begin. An annotation for a word should follow the template below:

Syllabary Layer
Simple Phonetics
Word Parts (Morpheme break)
Word Part Meanings (Morpheme gloss)
Word Translation

Additional optional information about a romanized form, a detailed sound form (phonemic layer), and commentary can be added to a word in a document. With these additions, a word can include all of the following information (optional items are marked with a star *):

Syllabary Layer
* Romanization
Simple Phonetics
* Detailed Sound (Phonemic Layer)
Word Parts (Morpheme break)
Word Part Meanings (Morpheme gloss)
Word Translation
* Commentary
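One way to picture the template is as a record with five required layers and three optional ones. This is a sketch under assumptions: the class and field names are hypothetical, and the sample values come from the ᎠᏂᏣᎳᎩ example used elsewhere on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedWord:
    """Hypothetical record for one annotated word."""
    # Required layers
    syllabary: str
    simple_phonetics: str
    word_parts: str
    word_part_meanings: str
    word_translation: str
    # Optional layers (starred in the list above)
    romanization: Optional[str] = None
    detailed_sound: Optional[str] = None
    commentary: Optional[str] = None

    def filled_layers(self) -> List[str]:
        """Return the names of the layers that currently hold a value."""
        return [name for name, value in vars(self).items() if value]


word = AnnotatedWord(
    syllabary="ᎠᏂᏣᎳᎩ",
    simple_phonetics="anijalagi",
    word_parts="anii-jalagi",
    word_part_meanings="they (pl.)-Cherokee",
    word_translation="Cherokees",
)
```

Keeping the optional layers as `None` by default mirrors the fact that most words will only carry the five required layers.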

Currently, all annotation data is input and stored on Google Drive, then migrated to the DAILP database.

Syllabary Transcription

This is typically the first layer to be entered in the annotation process. For this layer, annotators enter typed Cherokee syllabary characters into the analysis spreadsheet, matching the handwritten text of the document as closely as possible. At present, all DAILP documents were originally written in the syllabary, but documents whose source is a romanization of Cherokee could also be transcribed in this layer.

Romanization

This layer holds spellings of Cherokee words written in the Latin alphabet as they appear in the source material. Right now, this layer is used on only a few documents and is generally left blank otherwise.

Simple Phonetics Transliteration

This is typically the second layer added in the annotation process. It transforms the transcribed syllabary layer into simple phonetics, a pedagogical orthography for Cherokee that is largely based on the sound values of the syllabary.
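Because simple phonetics is largely based on syllabary sound values, a first pass at this layer can be thought of as a character-by-character lookup. The sketch below is an illustration, not DAILP's tooling: the table is deliberately partial, containing only the five characters of the example word ᎠᏂᏣᎳᎩ (anijalagi) used on this page, and the function name is hypothetical. A real transliteration would need the full syllabary and human review.

```python
# Partial lookup table: only the characters from the example word on this page.
SYLLABARY_TO_PHONETICS = {
    "Ꭰ": "a",
    "Ꮒ": "ni",
    "Ꮳ": "ja",
    "Ꮃ": "la",
    "Ꭹ": "gi",
}


def transliterate(syllabary_word: str) -> str:
    """Map each syllabary character to its simple phonetics value."""
    pieces = []
    for ch in syllabary_word:
        if ch not in SYLLABARY_TO_PHONETICS:
            # Fail loudly rather than guess at an unknown character.
            raise KeyError(f"No sound value recorded for {ch!r}")
        pieces.append(SYLLABARY_TO_PHONETICS[ch])
    return "".join(pieces)


print(transliterate("ᎠᏂᏣᎳᎩ"))  # anijalagi
```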

Phonemic Layer

This layer contains more detailed pronunciation information about a word, including tone and vowel length. Currently this layer is not used in any analyzed documents.

Word Parts (Morpheme Break) Layer

The word parts layer is one of the deepest levels of annotation and analysis we currently provide. Creating this layer involves trying to break words into meaningful parts (also called morphemes) based on entries in grammars, dictionaries, and other resource materials.

For example, the word ᎠᏂᏣᎳᎩ (anijalagi) can be broken into two parts, anii and jalagi, each with its own meaning. The meanings of these parts are placed in the word parts meaning layer, also called the morpheme gloss layer. Joining the parts with hyphens gives the final word parts layer: anii-jalagi

Word Parts Meaning (Morpheme Gloss) Layer

This layer is built in parallel with the word parts layer, also called the morpheme break layer. In this layer, a standard set of labels is applied to the segments from the word parts layer. Using our example from the word parts section:

Syllabary: ᎠᏂᏣᎳᎩ
Simple Phonetics: anijalagi
Word parts: anii-jalagi

Since we have word parts, we can try to assign meaning to the parts.

Word parts: anii-jalagi
Meanings: they (pl.)-Cherokee
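Since the two layers are built in parallel, each hyphen-separated part should line up with exactly one meaning. A minimal sketch of that check follows; the function name is hypothetical, and it assumes that hyphens appear only as separators and never inside a part or a meaning.

```python
from typing import List, Tuple


def align_parts(word_parts: str, meanings: str) -> List[Tuple[str, str]]:
    """Pair each word part with its meaning, checking that the counts match."""
    parts = word_parts.split("-")
    glosses = meanings.split("-")
    if len(parts) != len(glosses):
        raise ValueError(
            f"{len(parts)} word parts but {len(glosses)} meanings: "
            f"{word_parts!r} vs {meanings!r}"
        )
    return list(zip(parts, glosses))


pairs = align_parts("anii-jalagi", "they (pl.)-Cherokee")
print(pairs)  # [('anii', 'they (pl.)'), ('jalagi', 'Cherokee')]
```

A mismatch in counts is a useful signal that one of the two layers was edited without updating the other.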

These parts and their meanings are useful to Cherokee language learners. They also help annotators and readers better understand the meaning of a single word in a document, as shown in the word-by-word translation.

Word-by-Word Translation

The word-by-word translation bridges the gap between the free translation and the word-for-word analysis. This layer builds on information in the free translation and words available in Cherokee dictionaries, word lists, and grammars. Additionally, these meanings can be adjusted based on the analysis done in the word parts sections. In the example used in the previous sections, the word meaning may look like this:

Word parts: anii-jalagi
Parts Meaning: they (pl.)-Cherokee
Word Translation: Cherokees

In this case, the word-by-word translation is very close to the word parts meaning, since both convey the meaning of the full word. The main difference is focus: the word-by-word translation communicates the main idea of the Cherokee word, while the word parts show how that meaning is built from different pieces.
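The relationship between the three layers is easiest to see when they are printed as an interlinear gloss, with each part sitting directly above its meaning. The following sketch shows one possible plain-text layout; the function name and the formatting choices (two-space column gap, quoted translation) are assumptions, not DAILP's rendering.

```python
from typing import List


def format_interlinear(parts: List[str], glosses: List[str], translation: str) -> str:
    """Render word parts, their meanings, and the word translation as aligned lines."""
    # Each column is as wide as the longer of the part and its meaning.
    widths = [max(len(p), len(g)) for p, g in zip(parts, glosses)]
    part_line = "  ".join(p.ljust(w) for p, w in zip(parts, widths)).rstrip()
    gloss_line = "  ".join(g.ljust(w) for g, w in zip(glosses, widths)).rstrip()
    return f"{part_line}\n{gloss_line}\n'{translation}'"


gloss = format_interlinear(
    ["anii", "jalagi"],
    ["they (pl.)", "Cherokee"],
    "Cherokees",
)
print(gloss)
```

In the printed result, jalagi starts in the same column as Cherokee, which makes it clear which meaning belongs to which part.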

Word meanings can be found in a number of standard reference sources, although DAILP gives a degree of preference to Durbin Feeling's Cherokee–English Dictionary. A list of all grammar and vocabulary resources used in annotation and analysis can be found on the DAILP website.

Commentary

The commentary layer currently records which language resource(s) aided in creating the word-by-word translation and word parts meaning for a word. The commentary layer can also hold a note about a word, which appears as an information bubble on the DAILP site.

In the near future, this commentary will be expanded, allowing Cherokees to add their own stories, questions, and comments about different parts of a document.

Storage and Archiving

As of now, annotations are made in a Google Sheets template and stored in Google Drive. From here, the data can be migrated to the DAILP database and exposed via a GraphQL endpoint, which provides data for the DAILP website.
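As a rough illustration of the step between the spreadsheet and the database, the sketch below reads a CSV export of annotation rows and checks that the required layers are present before anything is migrated. The column names, the one-row-per-word layout, and the function name are all assumptions made for this example; the actual DAILP template and migration code may be organized quite differently.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV export: one row per word, one column per required layer.
SAMPLE_EXPORT = (
    "syllabary,simple_phonetics,word_parts,word_part_meanings,word_translation\n"
    "ᎠᏂᏣᎳᎩ,anijalagi,anii-jalagi,they (pl.)-Cherokee,Cherokees\n"
)

REQUIRED_COLUMNS = [
    "syllabary",
    "simple_phonetics",
    "word_parts",
    "word_part_meanings",
    "word_translation",
]


def load_words(csv_text: str) -> List[Dict[str, str]]:
    """Parse exported rows and reject any row missing a required layer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    words = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        missing = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
        if missing:
            raise ValueError(f"Line {line_no} is missing: {', '.join(missing)}")
        words.append(dict(row))
    return words


words = load_words(SAMPLE_EXPORT)
print(words[0]["word_translation"])  # Cherokees
```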
