Leiden IDP Test Data for leiden-js
This repository contains edition and translation data from the Integrating Digital Papyrology project (IDP) used for testing the leiden-js parsers and transformers. The data has been processed through a recent version of XSugar with the Leiden grammars to convert XML to Leiden+/Leiden translation format and back to XML.
This ensures we're working with up-to-date data that matches what the current XSugar grammar would produce, since some files in the IDP dataset don't match the output that the current IDP XSugar processor would generate from the same input. This repository's data is used alongside the IDP test suite in leiden-js to verify compatibility.
The roundtrip data is generated using create-idp-roundtrips.ts, which:
- Reads XML files from source directories (idp.data/DDB_EpiDoc_XML or idp.data/HGV_trans_EpiDoc)
- Extracts content matching the configured selector (div[type="edition"] or body)
- Uses XSugar (in Docker) to convert XML to Leiden+ or Leiden Translation
- Converts the Leiden back to XML and saves it in the corresponding roundtrips directory
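The steps above can be sketched roughly as follows. This is an illustration only, not the actual contents of create-idp-roundtrips.ts. The names Mode, ModeConfig, Converter, roundtrip and outputPath are hypothetical. The pairing of source directory and selector per mode is assumed from the order in which they are listed above, and the layout under roundtrips/ is likewise an assumption. The converter is injected and synchronous to keep the sketch testable; the real conversion is done by XSugar running in Docker.

```typescript
// Hypothetical sketch of the roundtrip steps; not the real script.

type Mode = "edition" | "translation";

interface ModeConfig {
  sourceDir: string; // directory the source XML files are read from
  selector: string;  // part of each file that is extracted
}

// Assumption: directory and selector are paired per mode in the order
// the README lists them.
const MODES: Record<Mode, ModeConfig> = {
  edition: { sourceDir: "idp.data/DDB_EpiDoc_XML", selector: 'div[type="edition"]' },
  translation: { sourceDir: "idp.data/HGV_trans_EpiDoc", selector: "body" },
};

// Stand-in for the XSugar service. This interface is invented for the sketch.
interface Converter {
  xmlToLeiden(xml: string): string;
  leidenToXml(leiden: string): string;
}

interface Roundtrip {
  leiden: string; // intermediate Leiden+ or Leiden Translation text
  xml: string;    // XML produced by converting the Leiden back
}

// XML -> Leiden -> XML. The returned xml is what would be saved.
function roundtrip(fragment: string, converter: Converter): Roundtrip {
  const leiden = converter.xmlToLeiden(fragment);
  const xml = converter.leidenToXml(leiden);
  return { leiden, xml };
}

// Assumption: output mirrors the source file's relative path
// under roundtrips/<mode>/.
function outputPath(mode: Mode, sourceFile: string): string {
  const prefix = MODES[mode].sourceDir + "/";
  if (!sourceFile.startsWith(prefix)) {
    throw new Error(`${sourceFile} is not inside ${MODES[mode].sourceDir}`);
  }
  return `roundtrips/${mode}/${sourceFile.slice(prefix.length)}`;
}
```

Keeping the converter behind an interface means the file handling can be exercised with a stub, without starting Docker.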
Regenerate the test data when:
- The IDP data has been updated
- There's a new version of the IDP XSugar processor
First clone with the --recursive flag to include the IDP data submodule:
git clone --recursive https://github.com/cceh/leiden-js-idp-test-data.git
cd leiden-js-idp-test-data
npm install
To update the IDP data (if needed):
git submodule update --init --depth 1 ./idp.data
Generate all data at once:
npm run generate
This will start XSugar, process edition and translation files, then shut down the service.
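As a rough illustration of that sequence, the sketch below runs the same commands that are listed in this README, in order. It is not the actual npm script. The names Run, STEPS and generateAll are hypothetical, and stopping XSugar even when a step fails is a design choice of the sketch, not something the repository is known to do.

```typescript
// Hypothetical sketch of a "generate" run; not the real npm script.

// A function that executes one shell command and throws on failure.
type Run = (command: string) => void;

// Generation steps, using the commands given in this README.
const STEPS: string[] = [
  "tsx create-idp-roundtrips.ts edition",
  "tsx create-idp-roundtrips.ts translation",
];

function generateAll(run: Run): void {
  // Start the XSugar service before any conversion is attempted.
  run("docker-compose up -d");
  try {
    for (const step of STEPS) {
      run(step);
    }
  } finally {
    // Assumption: shut the service down even if a step failed.
    run("docker-compose down");
  }
}
```

With Node, the runner could be execSync from node:child_process, for example generateAll((cmd) => { execSync(cmd, { stdio: "inherit" }); }). Injecting the runner keeps the ordering logic testable without Docker.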
Run each step manually:
- Start XSugar:
  docker-compose up -d
- Generate edition roundtrips:
  tsx create-idp-roundtrips.ts edition
- Generate translation roundtrips:
  tsx create-idp-roundtrips.ts translation
- Stop XSugar:
  docker-compose down
- IDP data and derived files in roundtrips: CC-BY 3.0 (see roundtrips/LICENSE)
- Everything else: MIT License (see LICENSE)