
Data Migration

jae-mess edited this page Jul 22, 2025 · 5 revisions

The code for data migration lives in the migration/ directory.

At the moment, no data originates in our MongoDB database. All of it is imported from a number of Google Sheets using the Sheets REST API. The migration process for annotated texts consists of the following steps.

  1. Retrieve the contents of the Annotated Texts Index to eventually process each row into an AnnotatedDoc.
  2. For each row of the index, retrieve the contents of the "Annotation Sheet" column cell. The annotation sheet must contain one or more pages named exactly Page 1, Page 2, and so on, plus two other pages named Metadata and References.
  3. Ingest the "Metadata" page into a DocumentMetadata structure. Correlate each "Source Document Image" cell with the Page X sheet of the same index.
  4. Convert each set of annotation rows on the Page X sheets into an AnnotatedForm, then fill in the segments field of the AnnotatedDoc with the results from all Page X sheets concatenated together.
  5. Push the assembled AnnotatedDoc to the database by running the updateDocument mutation on the GraphQL server.
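
The sheet-layout rules in steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration and is not taken from the migration/ directory: the function names (`ordered_page_tabs`, `pair_images_with_pages`, `concatenate_segments`) and the `to_form` converter are hypothetical, and the sketch assumes the tab names and cell values have already been fetched from the Sheets API. It also assumes that pages are ordered numerically and that there is exactly one source image per page, neither of which the steps above state outright.

```python
import re

# Tab names come from step 2; every other name here is hypothetical.
PAGE_TAB = re.compile(r"Page ([1-9][0-9]*)")
REQUIRED_TABS = ("Metadata", "References")


def ordered_page_tabs(tab_names):
    """Check the annotation sheet layout and return its 'Page N' tabs in numeric order."""
    missing = [tab for tab in REQUIRED_TABS if tab not in tab_names]
    if missing:
        raise ValueError(f"annotation sheet is missing tabs: {missing}")
    numbered = []
    for name in tab_names:
        match = PAGE_TAB.fullmatch(name)  # exact match, so "Page 1 " or "page 1" is rejected
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        raise ValueError("annotation sheet needs at least one 'Page N' tab")
    # Sort by page number so that "Page 10" follows "Page 2" (assumed ordering).
    return [name for _, name in sorted(numbered)]


def pair_images_with_pages(image_cells, page_tabs):
    """Pair the Nth 'Source Document Image' cell with the Nth page tab (step 3)."""
    if len(image_cells) != len(page_tabs):
        # Assumption: one image per page; the real migration may be more lenient.
        raise ValueError("expected one source image per page tab")
    return dict(zip(page_tabs, image_cells))


def concatenate_segments(rows_by_tab, page_tabs, to_form=lambda rows: rows):
    """Convert each set of annotation rows and concatenate across pages (step 4).

    `to_form` stands in for the real AnnotatedForm conversion, which is not shown here.
    """
    segments = []
    for tab in page_tabs:
        segments.extend(to_form(row_set) for row_set in rows_by_tab[tab])
    return segments
```

Given the tab names of one annotation sheet, `ordered_page_tabs` would be called first, and its result passed to the other two helpers along with the image cells from the Metadata page and the row sets read from each Page X sheet. Pushing the result through the updateDocument mutation is left out, since its arguments are not described on this page.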
