Leiden IDP Test Data for leiden-js

This repository contains edition and translation data from the Integrating Digital Papyrology project (IDP) used for testing the leiden-js parsers and transformers. The data has been processed through a recent version of XSugar with the Leiden grammars to convert XML to Leiden+/Leiden translation format and back to XML.

This ensures the test data matches what the current XSugar grammar produces: some files in the IDP dataset differ from the output that the current IDP XSugar processor would generate from the same input. The data in this repository is used alongside the IDP test suite in leiden-js to verify compatibility.

The roundtrip data is generated using create-idp-roundtrips.ts, which:

  1. Reads XML files from source directories (idp.data/DDB_EpiDoc_XML or idp.data/HGV_trans_EpiDoc)
  2. Extracts content matching the configured selector (div[type="edition"] or body)
  3. Uses XSugar (in Docker) to convert XML to Leiden+ or Leiden Translation
  4. Converts the Leiden back to XML and saves it in the corresponding roundtrips directory
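
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of create-idp-roundtrips.ts: the `Converter` interface, the `MODES` table, and the `roundtrip` function are hypothetical names, and the pairing of each source directory with a selector and mode is an assumption based on the order in which they are listed above. The XSugar calls are left abstract because their real interface is not described here.

```typescript
// Hypothetical sketch of the roundtrip flow; names are illustrative only.
type Mode = "edition" | "translation";

interface ModeConfig {
  sourceDir: string; // where the source XML files live
  selector: string;  // which part of each XML file is extracted
  leidenFormat: "Leiden+" | "Leiden Translation";
}

// Assumed pairing of directories, selectors, and formats (see lead-in).
const MODES: Record<Mode, ModeConfig> = {
  edition: {
    sourceDir: "idp.data/DDB_EpiDoc_XML",
    selector: 'div[type="edition"]',
    leidenFormat: "Leiden+",
  },
  translation: {
    sourceDir: "idp.data/HGV_trans_EpiDoc",
    selector: "body",
    leidenFormat: "Leiden Translation",
  },
};

// Stand-in for the XSugar service; the real calls are not shown in this README.
interface Converter {
  xmlToLeiden(xml: string): Promise<string>;
  leidenToXml(leiden: string): Promise<string>;
}

// Steps 3 and 4: convert the extracted fragment to Leiden, then back to XML.
async function roundtrip(
  fragment: string,
  converter: Converter,
): Promise<{ leiden: string; xml: string }> {
  const leiden = await converter.xmlToLeiden(fragment);
  const xml = await converter.leidenToXml(leiden);
  return { leiden, xml };
}
```

Passing the converter in as a parameter keeps the sketch independent of how XSugar is actually reached, and makes the flow easy to exercise with a stub.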

Regenerate the data

Regenerate the test data when:

  • The IDP data has been updated
  • There's a new version of the IDP XSugar processor

Initial Setup

First, clone the repository with the --recursive flag to include the IDP data submodule:

git clone --recursive https://github.com/cceh/leiden-js-idp-test-data.git
cd leiden-js-idp-test-data
npm install

To fetch or update the IDP data submodule separately (if needed, for example after cloning without --recursive):

git submodule update --init --depth 1 ./idp.data

Generate the roundtrips

Generate all data at once:

npm run generate

This starts XSugar, processes the edition and translation files, then shuts the service down.

Or step-by-step

Run each step manually:

  1. Start XSugar:

     docker-compose up -d

  2. Generate edition roundtrips:

     tsx create-idp-roundtrips.ts edition

  3. Generate translation roundtrips:

     tsx create-idp-roundtrips.ts translation

  4. Stop XSugar:

     docker-compose down
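
The manual steps above could also be chained in a small script. The following is only a sketch under assumptions: `generateAll` and `runInShell` are hypothetical names, and this is not necessarily how `npm run generate` is implemented. It runs the same four commands and uses try/finally so that XSugar is stopped even if one of the generation steps fails.

```typescript
import { execSync } from "node:child_process";

// A command runner; injectable so the sequence can be checked without Docker.
type Run = (command: string) => void;

// Default runner: executes the command in a shell and streams its output.
const runInShell: Run = (command) => {
  execSync(command, { stdio: "inherit" });
};

// Hypothetical helper that mirrors the four manual steps.
function generateAll(run: Run = runInShell): void {
  run("docker-compose up -d");
  try {
    run("tsx create-idp-roundtrips.ts edition");
    run("tsx create-idp-roundtrips.ts translation");
  } finally {
    // Always stop the service, even when a generation step throws.
    run("docker-compose down");
  }
}
```

Calling `generateAll()` with no arguments would run the real commands. The sketch does not wait for XSugar to become ready after startup, which an actual script may need to handle.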

Licensing
