Document Level Analysis

jae-mess edited this page Jun 19, 2025 · 2 revisions

Overview

This page outlines how our translation team organizes, annotates, and analyzes document-level information post-2024.

Document Level Annotation

A document is any primary source of archival material that the translation teams have identified for inclusion in a collection. Documents may come in any modality, from handwritten letters and stories to audio files and more. Each document belongs to an edited collection, which is a higher-level, community-designed grouping of sources. Community groups select which documents are translated.

Free Translation

Document-level translation provides the overall meaning of a passage (free translation in English), as opposed to word-for-word translation (direct translation). Free translation allows for the description of cultural information, idiomatic or poetic expressions, and contextual information in the target language (English).

Document Audio

Each document is annotated with an audio recording that covers the entirety of the source material. Prior to 2024, this audio was recorded in Zoom. After 2024, the process of adding audio is supported by the DAILP translation interface. The audio is later segmented to provide word-level data.

Source Images

DAILP provides digital copies of the original source in the Original Text section of a document’s page. Archival images are drawn into the database using the archive's DOI or URL. With permission from the archive, a copy of the image can also be ingested into the database. Project leaders establish collaborative relationships with archives to source images.

Metadata

Each document carries metadata that records basic information about the document, contributor names and roles, and source attributions. See this page for more detail on what is included in a document’s metadata.
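
As a rough illustration of how the document-level pieces described on this page fit together, the sketch below models one document as a small Python record. It is not DAILP's actual schema: the class names, field names, and the example role are all hypothetical, chosen only to mirror the sections above (free translation, document audio, source images, metadata). The one external fact it relies on is that a DOI can be resolved through https://doi.org/.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contributor:
    """A person credited in a document's metadata (hypothetical shape)."""
    name: str
    role: str  # e.g. "Translator"; role names here are illustrative only


@dataclass
class SourceImage:
    """Reference to an archival image, located by DOI or by URL."""
    doi: Optional[str] = None
    url: Optional[str] = None
    # True only when the archive has given permission to store a copy.
    ingested_copy: bool = False

    def location(self) -> str:
        """Return a URL for the image, preferring the DOI when present."""
        if self.doi:
            return f"https://doi.org/{self.doi}"
        if self.url:
            return self.url
        raise ValueError("a source image needs a DOI or a URL")


@dataclass
class DocumentRecord:
    """One document with its document-level annotations (assumed fields)."""
    title: str
    collection: str                   # the edited collection it belongs to
    free_translation: str             # overall meaning, in English
    audio_file: Optional[str] = None  # recording of the whole document
    source_images: List[SourceImage] = field(default_factory=list)
    contributors: List[Contributor] = field(default_factory=list)
    source_attribution: str = ""
```

A record built this way keeps the image reference separate from any ingested copy, so a document can point at the archive's DOI or URL even when no copy is stored locally.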