The AREMA Ontology is first and foremost the controlled vocabulary for the Atlas of Regenerative Materials (AREMA). This repository contains the ontology files, validation scripts, and tools required to maintain, check, and publish the ontology.
The ontology is designed to provide a shared conceptual framework and common vocabulary for describing regenerative materials, buildings, professionals, and related concepts within the AREMA platform. It enables both humans and machines to interpret and exchange information consistently.
An ontology defines terms, relationships, and structures to describe a particular domain — in this case, regenerative materials. For AREMA, this ontology serves multiple purposes:
It powers the AREMA front-end by providing a standardized vocabulary for objects and properties that AREMA uses. This ensures that users interact with consistent terms throughout the platform, and allows us to link terms used on the front-end to definitions and relations in the ontology.
By formalizing the domain knowledge as RDF and SHACL, the ontology supports structured data exchange and integration with external systems, while enabling automated quality checks and re-using existing metadata standards.
The AREMA ontology system consists of three main components:
- Ontology Manager API (FastAPI) - Converts Google Sheets to RDF and manages updates
- Apache Fuseki - SPARQL endpoint and triplestore
- SKOHub Vocabs - Static site generator for human-readable vocabulary browser
Google Sheets → Ontology Manager → Fuseki (SPARQL) → GitHub → SKOHub → Static Site
The ontology manager fetches data from Google Sheets, converts it to SKOS/RDF, and uploads it to Fuseki. The changes are then published via SKOHub to https://ontology.atlas-regenmat.ch/
arema-ontology/
├── docs/                        # SKOHub-generated static site
├── src/
│   ├── arema/                   # Core utility modules
│   │   └── sheets_utils.py      # Google Sheets API utilities
│   ├── server/                  # FastAPI service
│   │   └── main.py              # API endpoints and scheduler
│   ├── ontology/                # Generated ontology files (not in git, available as releases)
│   └── quality-checks/          # SHACL validation shapes
│       ├── shacl-shacl.ttl
│       └── skohub.shacl.ttl
├── tools/
│   ├── fuseki/                  # Fuseki configuration
│   │   ├── config.ttl           # Fuseki assembler config
│   │   ├── data/                # TDB2 database storage
│   │   └── docker-compose.yml   # Standalone Fuseki setup
│   ├── python/
│   │   ├── converter/           # Google Sheets → RDF converter
│   │   │   └── csv2ont.py
│   │   └── checks/              # Validation scripts
│   │       └── shacl.py
│   └── skohub-vocabs/           # Custom SKOHub configuration
├── .github/
│   └── workflows/
│       └── docs.yaml            # CI/CD for SKOHub builds
├── Dockerfile                   # Container definition
├── docker-compose.yml           # Multi-service orchestration
├── pyproject.toml               # Python dependencies
└── uv.lock                      # Locked dependencies
src/server/main.py
FastAPI service that manages ontology conversions and updates. Provides REST API endpoints for triggering updates and checking status.
src/arema/
Core utility modules for git operations and file management.
tools/python/converter/csv2ont.py
Converts Google Sheets data to SKOS/RDF format with support for:
- Multilingual labels (en/de/fr/it)
- Hierarchical concept schemes
- Automatic upload to Fuseki
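To make the conversion step more concrete, here is a minimal sketch of how sheet rows could be turned into SKOS Turtle with multilingual labels and a broader/narrower hierarchy. It is not the code in csv2ont.py: the column names (id, parent, label_en, ...), the example.org namespace, and the function names are all assumptions made for illustration, and the upload to Fuseki is left out.

```python
from typing import Dict, List

LANGS = ("en", "de", "fr", "it")          # languages listed in this README
BASE = "https://example.org/arema/"       # hypothetical namespace, not the real one
SCHEME = f"<{BASE}scheme>"                # hypothetical concept scheme IRI


def escape(text: str) -> str:
    """Escape characters that are not allowed raw inside a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_turtle(row: Dict[str, str]) -> str:
    """Render one sheet row (assumed columns: id, parent, label_<lang>) as a skos:Concept."""
    lines = [f"<{BASE}{row['id']}> a skos:Concept", f"    skos:inScheme {SCHEME}"]
    for lang in LANGS:
        label = row.get(f"label_{lang}", "").strip()
        if label:  # emit only the languages that are filled in
            lines.append(f'    skos:prefLabel "{escape(label)}"@{lang}')
    parent = row.get("parent", "").strip()
    if parent:
        lines.append(f"    skos:broader <{BASE}{parent}>")
    else:
        lines.append(f"    skos:topConceptOf {SCHEME}")  # rows without a parent are top concepts
    return " ;\n".join(lines) + " ."


def rows_to_turtle(rows: List[Dict[str, str]]) -> str:
    """Render all rows as one Turtle document with a single concept scheme."""
    parts = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"{SCHEME} a skos:ConceptScheme .",
    ]
    parts.extend(row_to_turtle(row) for row in rows)
    return "\n\n".join(parts) + "\n"
```

Calling rows_to_turtle with a list of dictionaries (for example, rows read with csv.DictReader from a sheet exported as CSV) returns a Turtle string that can be written to a file or posted to a triplestore.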
docker-compose.yml
Orchestrates the ontology manager and Fuseki services with proper networking and volume management.
tools/fuseki/
Apache Fuseki SPARQL endpoint configuration with TDB2 storage and union default graph support.
src/quality-checks/
SHACL shapes for validating ontology consistency and SKOHub compatibility.
The ontology undergoes multiple validation checks:
- SHACL Validation: Ensures structural consistency and conformance to SKOS patterns
- SKOHub Compatibility: Validates compatibility with the static site generator
- Automated Testing: the test.sh script validates the full workflow
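As an illustration of the kind of constraint these checks enforce, the following plain-Python sketch tests one SKOS rule: a concept may have at most one preferred label per language. It is not SHACL and not the repository's shacl.py; the input structure, the function name, and the extra requirement of an English label are assumptions for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Label = Tuple[str, str]  # (language tag, label text)


def check_pref_labels(concepts: Dict[str, List[Label]], required_lang: str = "en") -> List[str]:
    """Return human-readable problems found in the preferred labels.

    Two rules are checked:
    - every concept has a label in `required_lang` (an assumption of this sketch);
    - no concept has more than one preferred label per language (SKOS integrity rule).
    """
    problems: List[str] = []
    for concept_id, labels in sorted(concepts.items()):
        per_lang = Counter(lang for lang, _ in labels)
        if per_lang[required_lang] == 0:
            problems.append(f"{concept_id}: missing prefLabel@{required_lang}")
        for lang, count in sorted(per_lang.items()):
            if count > 1:
                problems.append(f"{concept_id}: {count} prefLabels for @{lang}")
    return problems
```

An empty result means both rules hold. In the real pipeline the equivalent rules would be expressed as SHACL shapes and evaluated by a SHACL engine rather than hand-written like this.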
Run validation:
python tools/python/checks/shacl.py
Prerequisites:
- Docker and Docker Compose
- Python 3.11+ (for local development)
- uv package manager (recommended)
- Google Service Account JSON key file (see below)
The service uses the Google Drive API to check for Sheet modifications efficiently. To set up access:
- Create a Google Cloud Project and enable the Drive API
- Create a Service Account and download the JSON key file
- Place the JSON key as service_account.json in the repository root
- Share your Google Sheet with the service account email (found in the JSON file under client_email)
- Grant the service account "Viewer" permission
The repository includes a FastAPI service that automatically checks for Google Sheet updates every 5 minutes using the Drive API (no downloads unless the sheet was modified).
# 1. Copy and configure environment variables
cp .env.dist .env
# Edit .env and set FUSEKI_USERNAME and FUSEKI_PASSWORD
# 2. Place your service_account.json in the repository root
# 3. Start services (Fuseki + Ontology Manager)
docker compose up -d
# 4. Check service health
curl http://localhost:8000/
curl http://localhost:3030/$/ping
The ontology manager API runs on http://localhost:8000 and Fuseki on http://localhost:3030.
Automatic Updates: The service checks for Google Sheet modifications every 5 minutes using the Drive API. If changes are detected, it automatically converts the data and uploads it to the Fuseki triplestore.
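The change-detection idea can be sketched as follows: fetch only the sheet's modification timestamp, compare it with the last one seen, and run the conversion only when it is newer. This is a sketch under assumptions, not the scheduler in src/server/main.py. SheetWatcher, fetch_modified_time and run_update are hypothetical names, and the timestamp is assumed to be an RFC 3339 string such as the Drive API's modifiedTime field.

```python
from datetime import datetime
from typing import Callable, Optional


def parse_modified_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-05-01T12:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class SheetWatcher:
    """Runs an update only when the sheet's modification time has advanced."""

    def __init__(self, fetch_modified_time: Callable[[], str], run_update: Callable[[], None]):
        self._fetch = fetch_modified_time   # hypothetical: returns the sheet's modifiedTime
        self._run_update = run_update       # hypothetical: converts and uploads the ontology
        self.last_seen: Optional[datetime] = None

    def check_once(self) -> bool:
        """Perform one scheduled check; return True if an update was run."""
        modified = parse_modified_time(self._fetch())
        if self.last_seen is not None and modified <= self.last_seen:
            return False                    # unchanged: nothing is downloaded or converted
        self._run_update()
        self.last_seen = modified           # recorded only after a successful update
        return True
```

A scheduler would call check_once() every 5 minutes. In this sketch the first check always triggers an update because no timestamp has been recorded yet; whether the real service behaves that way is not stated here.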
- GET / - Service status, health check, and scheduler information (includes last check time and last update time)
- PUT /update - Trigger immediate ontology update from Google Sheets
curl -X PUT http://localhost:8000/update
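The same two endpoints can also be called from Python using only the standard library. The sketch below assumes the service runs on http://localhost:8000 as described in this README. The helper names are made up for illustration, and the response is returned as parsed JSON when possible because the exact response format is not documented here.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # local default from this README


def build_status_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Request for GET /, the status and health-check endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/", method="GET")


def build_update_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Request for PUT /update, which triggers an immediate ontology update."""
    return urllib.request.Request(base_url.rstrip("/") + "/update", method="PUT")


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and return parsed JSON, or the raw text if it is not JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

With the services running, send(build_status_request()) returns the status information and send(build_update_request()) triggers an update. An update may take longer than the default timeout, so a larger value may be needed.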
# Install dependencies
uv sync
# Run conversion script directly
uv run tools/python/converter/csv2ont.py
# Run SHACL validation
python tools/python/checks/shacl.py
curl http://localhost:8000/
curl http://localhost:3030/$/ping
Kubernetes prerequisites:
- A running Kubernetes cluster with kubectl configured
- The arema-ontology image available in your cluster's registry (update image: in k8s/arema-ontology-deployment.yaml if needed)
- A service_account.json Google service account key file
Do not fill k8s/secrets.yaml with real values and commit it to git. Instead, create the secret imperatively:
kubectl create secret generic arema-secrets \
  --from-literal=fuseki-password=<PASSWORD> \
  --from-literal=github-token=<TOKEN> \
  --from-file=service-account-json=service_account.json
kubectl apply -f k8s/fuseki-configmap.yaml
kubectl apply -f k8s/fuseki-pvc.yaml
kubectl apply -f k8s/fuseki-service.yaml
kubectl apply -f k8s/fuseki-deployment.yaml
kubectl apply -f k8s/arema-ontology-deployment.yaml
kubectl get pods
kubectl get services
kubectl get pvc
kubectl port-forward service/fuseki 3030:3030
Fuseki is then accessible at http://localhost:3030.
If you change k8s/fuseki-configmap.yaml, apply and restart:
kubectl apply -f k8s/fuseki-configmap.yaml
kubectl rollout restart deployment/fuseki
- API_README.md - Detailed API documentation and usage examples
- AUTOMATION.md - Automation features (see feature branch)
- Online Vocabulary: https://ontology.atlas-regenmat.ch/
- SPARQL Endpoint: http://localhost:3030/arema/sparql (when running locally)
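For reference, the local SPARQL endpoint can be queried over the standard SPARQL protocol with nothing but the Python standard library. The sketch below lists up to ten concepts with their English labels. The query assumes the data uses skos:Concept and skos:prefLabel, as the SKOS/RDF format suggests, and that read queries need no authentication. Both are assumptions, as are the helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Tuple

ENDPOINT = "http://localhost:3030/arema/sparql"  # local endpoint from this README

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
ORDER BY ?label
LIMIT 10
"""


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def extract_rows(results: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a SPARQL JSON result document into (concept IRI, label) pairs."""
    return [
        (binding["concept"]["value"], binding["label"]["value"])
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint: str = ENDPOINT, query: str = QUERY) -> List[Tuple[str, str]]:
    """Execute the query against the endpoint and return the extracted rows."""
    with urllib.request.urlopen(build_query_request(endpoint, query), timeout=30) as response:
        return extract_rows(json.load(response))
```

With Fuseki running locally, run_query() returns a list of (IRI, label) pairs. If the list is empty, the dataset may not have been loaded yet or the labels may use a different language tag.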
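Required in .env (copy from .env.dist): FUSEKI_USERNAME and FUSEKI_PASSWORD, as set in the setup steps; see .env.dist for the full list.
Ports:
- 8000 - Ontology Manager API
- 3030 - Apache Fuseki SPARQL endpoint
Volumes:
- ./src - Ontology output files
- ./tools/fuseki/data - Fuseki database persistence
- ./tools/fuseki/config.ttl - Fuseki configuration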
- Edit Google Sheet - Domain experts update the taxonomy
- Trigger Update - API endpoint or wait for scheduled update
- Conversion - CSV → SKOS/RDF format
- Upload - Data uploaded to Fuseki triplestore
- Validation - SHACL checks ensure quality
- Publication - GitHub Actions builds SKOHub static site
- Deployment - Published to https://ontology.atlas-regenmat.ch/
The Atlas of Regenerative Materials originated as a project of the Chair of Sustainable Construction at ETH Zurich, led by Professor Guillaume Habert.
The project was made possible by the initial support of the Ricola Foundation and the ETH Domain Open Research Data Program.
For comments, ideas or remarks, please contact shhuber@ethz.ch