A static publishing platform for Syriac text translations, part of the Syriaca.org digital humanities project. Translations are encoded in TEI XML, transformed to HTML and JSON, and served as a static site via AWS CloudFront.
graph TB
subgraph GitHub["GitHub (srophe)"]
CodeRepo["translations-app"]
DataRepo["translations-data"]
GHA["GitHub Actions"]
end
subgraph Pipeline["Build Pipeline"]
TEI2JSON["tei2json.py"]
TEI2HTML["tei2html.xsl (Saxon)"]
end
subgraph AWS["AWS Cloud"]
S3["S3 Bucket\ngaddel-translations-site"]
CF["CloudFront\nDistribution"]
OAC["Origin Access Control"]
end
Users["End Users"]
DataRepo -->|TEI XML| GHA
CodeRepo -->|XSLT + Python| GHA
GHA --> TEI2JSON
GHA --> TEI2HTML
TEI2JSON -->|combined.json + bulk_data.json| S3
TEI2HTML -->|ms/*.html| S3
CodeRepo -->|Static assets| S3
OAC --> S3
CF --> OAC
Users -->|HTTPS| CF
translations-app/
├── .github/workflows/
│ ├── codeToAWS.yml # Deploy static assets to S3
│ └── dataToAWS.yml # Process TEI → JSON + HTML, upload to S3
├── data/ # Pre-built HTML pages for translations
├── documentation/schema/ # TEI schema (RelaxNG)
├── infrastructure/
│ └── cloudformation.yml # AWS stack (S3, CloudFront, OAC, IAM)
├── json/
│ └── combined.json # Aggregated search data for client-side search
├── resources/
│ ├── bootstrap/ # Bootstrap CSS/JS
│ ├── components/ # Shared HTML components
│ ├── css/ # Stylesheets
│ ├── jquery-ui/ # jQuery UI
│ ├── js/ # Client-side JS (navbar, search)
│ └── keyboard/ # Syriac virtual keyboard
├── siteGenerator/
│ ├── components/ # Page templates, repo-config.xml
│ └── xsl/ # XSLT stylesheets for TEI transformation
├── index.html # Homepage
└── tei2json.py # Python TEI → JSON extraction script
The dataToAWS.yml workflow runs on push to the code repo's main branch and by workflow dispatch:
- Checks out both translations-app (code) and translations-data (TEI XML)
- Runs tei2json.py to extract searchable JSON from TEI files
- Runs Saxon XSLT (tei2html.xsl) to convert TEI XML → HTML pages in parallel
- Uploads JSON to S3 for search indexing
- Syncs HTML pages (ms/) and TEI XML (tei/) to S3
- Invalidates CloudFront cache
- Commits updated combined.json back to the repo
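
The parallel XSLT step can be approximated locally. The following is a minimal sketch, not the workflow's actual implementation: the function names `saxon_command` and `convert_all`, the worker count, and the assumption that `saxon.jar` sits in the working directory are all illustrative. Only the `java -jar saxon.jar -s: -xsl: -o:` invocation is taken from this README.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def saxon_command(tei_file, out_dir,
                  xsl="siteGenerator/xsl/tei2html.xsl", jar="saxon.jar"):
    """Build the Saxon command line for one TEI file (hypothetical helper)."""
    tei_file = Path(tei_file)
    out_file = Path(out_dir) / f"{tei_file.stem}.html"
    return [
        "java", "-jar", jar,
        f"-s:{tei_file.as_posix()}",
        f"-xsl:{xsl}",
        f"-o:{out_file.as_posix()}",
    ]


def convert_all(tei_dir, out_dir, workers=4):
    """Run Saxon on every *.xml file in tei_dir, several at a time.

    Returns the list of input files whose conversion exited non-zero.
    """
    files = sorted(Path(tei_dir).glob("*.xml"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda f: subprocess.run(saxon_command(f, out_dir), check=False),
            files,
        )
        return [f for f, r in zip(files, results) if r.returncode != 0]
```

Threads are enough here because each conversion runs in its own Java process; calling `convert_all("translations-data/tei", "data")` would return any files that failed to convert.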
The tei2json.py script extracts searchable fields from TEI XML for client-side search:
# Single file
python3 tei2json.py path/to/file.xml --pretty
# Directory → individual JSON files
python3 tei2json.py path/to/tei/ -o json_output/
# Directory → single combined.json for client-side search
python3 tei2json.py path/to/tei/ --combined json/combined.json

Output fields:
- title: array of titles from titleStmt
- displayTitleEnglish: formatted display title
- author: author names
- translator: translator names
- idno: URI identifier
- type: document type
- series: series titles
- persName: person names referenced in the text
- placeName: place names referenced in the text
- fullText: full body text for keyword search
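
To illustrate how fields like these can be pulled from a TEI document, here is a minimal sketch using only the standard library. It is not the code in tei2json.py: the function name `extract_fields` and the element paths for every field other than title (which the list above places in titleStmt) are assumptions, and only a subset of the fields is shown.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def _clean(element):
    """Concatenate all text under an element and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_fields(tei_xml):
    """Return a dict with a few search fields from a TEI string (sketch)."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:text/tei:body", NS)

    def in_body(tag):
        # Assumed location: names are collected from the body only.
        if body is None:
            return []
        return [_clean(el) for el in body.iter(f"{{{TEI}}}{tag}")]

    return {
        "title": [_clean(el) for el in root.findall(".//tei:titleStmt/tei:title", NS)],
        "author": [_clean(el) for el in root.findall(".//tei:titleStmt/tei:author", NS)],
        "persName": in_body("persName"),
        "placeName": in_body("placeName"),
        "fullText": _clean(body) if body is not None else "",
    }
```

The namespace-qualified lookups matter: TEI elements live in the `http://www.tei-c.org/ns/1.0` namespace, so an unqualified search for `title` would find nothing.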
git clone https://github.com/srophe/translations-app.git
cd translations-app
python3 -m http.server 8000

Open http://localhost:8000. Search works client-side using json/combined.json.
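
The site's search runs in the browser in JavaScript, but the idea can be shown in a few lines of Python. This is a sketch under stated assumptions, not the site's search code: it assumes combined.json holds a top-level array of records with the fields produced by tei2json.py, and the names `load_records` and `search` are hypothetical.

```python
import json


def load_records(path="json/combined.json"):
    """Load the search records (assumed to be a top-level JSON array)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def search(records, query, fields=("title", "author", "fullText")):
    """Return records where any listed field contains the query, ignoring case."""
    needle = query.casefold()
    hits = []
    for rec in records:
        for field in fields:
            value = rec.get(field, "")
            # Fields may be a single string or a list of strings.
            text = " ".join(value) if isinstance(value, list) else str(value)
            if needle in text.casefold():
                hits.append(rec)
                break  # one matching field is enough for this record
    return hits
```

With the site checked out, `search(load_records(), "edessa")` would list every record mentioning that word in its title, author, or full text.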
# Clone the data repo alongside the app
git clone https://github.com/srophe/translations-data.git
# Generate combined search JSON
python3 tei2json.py translations-data/tei/ --combined json/combined.json --pretty

Converting TEI to HTML requires Saxon-HE:
java -jar saxon.jar -s:translations-data/tei/10510.xml -xsl:siteGenerator/xsl/tei2html.xsl -o:data/10510.html

Or batch convert all files:
for f in translations-data/tei/*.xml; do
id=$(basename "$f" .xml)
java -jar saxon.jar -s:"$f" -xsl:siteGenerator/xsl/tei2html.xsl -o:"data/${id}.html"
done

Infrastructure is defined in infrastructure/cloudformation.yml:
- S3 bucket with full public access block
- CloudFront distribution with HTTPS redirect
- Origin Access Control (OAC) for secure S3 access
- GitHub OIDC deploy role (no stored credentials)
Two workflows handle deployment:
- codeToAWS.yml: syncs static assets (HTML, CSS, JS) on code changes
- dataToAWS.yml: processes TEI data and uploads HTML/JSON on data changes
For a free alternative to AWS hosting, the site can be served from GitHub Pages:
- Go to repo Settings → Pages → Source → GitHub Actions
- Fix root-relative paths:
find . -name "*.html" -exec sed -i '' 's|href="/resources|href="./resources|g; s|src="/resources|src="./resources|g' {} +
- Site will be at https://srophe.github.io/translations-app/
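
The `sed -i ''` form in the path-fixing step is the BSD/macOS spelling; GNU sed expects `-i` without the empty argument. As a portable alternative, the same substitution can be sketched in Python. The function names `relativize` and `rewrite_tree` are hypothetical, and the script is an illustration rather than part of the repository.

```python
import re
from pathlib import Path

# Matches href="/resources and src="/resources, as in the sed command.
_ROOT_RELATIVE = re.compile(r'(href|src)="/resources')


def relativize(html):
    """Turn root-relative /resources links into ./resources links."""
    return _ROOT_RELATIVE.sub(r'\1="./resources', html)


def rewrite_tree(root="."):
    """Rewrite every *.html file under root in place; return files changed."""
    changed = 0
    for page in Path(root).rglob("*.html"):
        original = page.read_text(encoding="utf-8")
        updated = relativize(original)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Like the sed command, this rewrites every page to `./resources`, which resolves correctly only for pages at the site root; pages in subdirectories may need a different relative prefix.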
- Bootstrap 3
- jQuery + jQuery UI
- Syriac virtual keyboard (phonetic, standard, Arabic, Greek, Russian)
Content licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).