AREMA Ontology

The AREMA Ontology is first and foremost the controlled vocabulary for the Atlas of Regenerative Materials (AREMA). This repository contains the ontology files, validation scripts, and tools required to maintain, check, and publish the ontology.

The ontology is designed to provide a shared conceptual framework and common vocabulary for describing regenerative materials, buildings, professionals, and related concepts within the AREMA platform. It enables both humans and machines to interpret and exchange information consistently.

📚 Purpose of the Ontology

An ontology defines the terms, relationships, and structures used to describe a particular domain, in this case regenerative materials. For AREMA, the ontology serves two main purposes:

Controlled Vocabulary for the Website

It powers the AREMA front-end by providing a standardized vocabulary for objects and properties that AREMA uses. This ensures that users interact with consistent terms throughout the platform, and allows us to link terms used on the front-end to definitions and relations in the ontology.

Interoperability and Data Quality

By formalizing the domain knowledge as RDF and SHACL, the ontology supports structured data exchange and integration with external systems, while enabling automated quality checks and re-using existing metadata standards.

🏗️ Architecture

The AREMA ontology system consists of three main components:

Ontology Manager API (FastAPI) - Converts Google Sheets to RDF and manages updates
Apache Fuseki - SPARQL endpoint and triplestore
SKOHub Vocabs - Static site generator for human-readable vocabulary browser

Workflow

Google Sheets → Ontology Manager → Fuseki (SPARQL) → GitHub → SKOHub → Static Site

The ontology manager fetches data from Google Sheets, converts it to SKOS/RDF, and uploads the result to Fuseki. The changes are then published via SKOHub to https://ontology.atlas-regenmat.ch/.

📂 Repository Structure

arema-ontology/
├── docs/                        # SKOHub-generated static site
├── src/
│   ├── arema/                   # Core utility modules
│   │   └── sheets_utils.py      # Google Sheets API utilities
│   ├── server/                  # FastAPI service
│   │   └── main.py              # API endpoints and scheduler
│   ├── ontology/                # Generated ontology files (not in git, available as releases)
│   └── quality-checks/          # SHACL validation shapes
│       ├── shacl-shacl.ttl
│       └── skohub.shacl.ttl
├── tools/
│   ├── fuseki/                  # Fuseki configuration
│   │   ├── config.ttl           # Fuseki assembler config
│   │   ├── data/                # TDB2 database storage
│   │   └── docker-compose.yml   # Standalone Fuseki setup
│   ├── python/
│   │   ├── converter/           # Google Sheets → RDF converter
│   │   │   └── csv2ont.py
│   │   └── checks/              # Validation scripts
│   │       └── shacl.py
│   └── skohub-vocabs/           # Custom SKOHub configuration
├── .github/
│   └── workflows/
│       └── docs.yaml            # CI/CD for SKOHub builds
├── Dockerfile                   # Container definition
├── docker-compose.yml           # Multi-service orchestration
├── pyproject.toml               # Python dependencies
└── uv.lock                      # Locked dependencies

Key Components

src/server/main.py
FastAPI service that manages ontology conversions and updates. Provides REST API endpoints for triggering updates and checking status.

src/arema/
Core utility modules for git operations, file management, and Google Sheets API access (sheets_utils.py).

tools/python/converter/csv2ont.py
Converts Google Sheets data to SKOS/RDF format with support for:

Multilingual labels (en/de/fr/it)
Hierarchical concept schemes
Automatic upload to Fuseki
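As a rough illustration of the conversion step, the sketch below turns CSV rows into SKOS concepts serialized as Turtle, using only the Python standard library. It is not the code in csv2ont.py: the base IRI, the column names (id, parent, label_en, ...), and the function names are assumptions made for this example, and the real sheet layout may differ.

```python
import csv
import io

# Hypothetical base IRI and column names; the real sheet layout may differ.
BASE = "https://example.org/arema/"
LANGS = ("en", "de", "fr", "it")


def escape(text: str) -> str:
    """Escape characters that are not allowed raw in a short Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def row_to_turtle(row: dict) -> str:
    """Render one sheet row as a skos:Concept with labels and a parent link."""
    lines = [f"<{BASE}{row['id']}> a skos:Concept"]
    for lang in LANGS:
        label = (row.get(f"label_{lang}") or "").strip()
        if label:  # skip languages that have no translation yet
            lines.append(f'    skos:prefLabel "{escape(label)}"@{lang}')
    parent = (row.get("parent") or "").strip()
    if parent:  # the hierarchy is expressed with skos:broader
        lines.append(f"    skos:broader <{BASE}{parent}>")
    return " ;\n".join(lines) + " .\n"


def csv_to_turtle(csv_text: str) -> str:
    """Convert a whole CSV export into a Turtle document."""
    header = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
    rows = csv.DictReader(io.StringIO(csv_text))
    return header + "\n".join(row_to_turtle(r) for r in rows)


SAMPLE = """id,parent,label_en,label_de,label_fr,label_it
materials,,Materials,Materialien,Matériaux,Materiali
straw,materials,Straw,Stroh,Paille,
"""

print(csv_to_turtle(SAMPLE))
```

Building Turtle by hand keeps the example dependency-free; a production converter would more likely build the graph with an RDF library so that serialization and escaping are handled for it.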

docker-compose.yml
Orchestrates the ontology manager and Fuseki services with proper networking and volume management.

tools/fuseki/
Apache Fuseki SPARQL endpoint configuration with TDB2 storage and union default graph support.

src/quality-checks/
SHACL shapes for validating ontology consistency and SKOHub compatibility.

🔍 Quality Assurance

The ontology undergoes multiple validation checks:

SHACL Validation: Ensures structural consistency and conformance to SKOS patterns
SKOHub Compatibility: Validates compatibility with the static site generator
Automated Testing: test.sh script validates the full workflow

Run validation:

python tools/python/checks/shacl.py

🚀 Quick Start

Prerequisites

Docker and Docker Compose
Python 3.11+ (for local development)
uv package manager (recommended)
Google Service Account JSON key file (see below)

Google Sheets Authentication Setup

The service uses the Google Drive API to check for Sheet modifications without downloading the Sheet each time:

Create a Google Cloud Project and enable the Drive API
Create a Service Account and download the JSON key file
Place the JSON key as service_account.json in the repository root
Share your Google Sheet with the service account email (found in the JSON file under client_email)
Grant the service account "Viewer" permission

Running the Services

The repository includes a FastAPI service that automatically checks for Google Sheet updates every 5 minutes using the Drive API (no downloads unless the sheet was modified).

# 1. Copy and configure environment variables
cp .env.dist .env
# Edit .env and set FUSEKI_USERNAME and FUSEKI_PASSWORD

# 2. Place your service_account.json in the repository root

# 3. Start services (Fuseki + Ontology Manager)
docker compose up -d

# 4. Check service health
curl http://localhost:8000/
curl http://localhost:3030/$/ping

The ontology manager API runs on http://localhost:8000 and Fuseki on http://localhost:3030.

Automatic Updates: The service checks for Google Sheet modifications every 5 minutes using the Drive API. If changes are detected, it automatically converts the data and uploads it to the Fuseki triplestore.
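The change check can be pictured as a timestamp comparison. The Drive API v3 file resource exposes a modifiedTime field as an RFC 3339 string; the sketch below assumes the service compares that value with the time of the last processed version. The function names are hypothetical and the actual logic in src/server/main.py may differ.

```python
from datetime import datetime
from typing import Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as '2024-05-01T12:00:00.000Z' into an aware datetime."""
    # Older Python versions do not accept a trailing 'Z', so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sheet_changed(modified_time: str, last_processed: Optional[datetime]) -> bool:
    """Return True when the sheet is newer than the last processed version.

    `last_processed` must be timezone-aware; None means nothing was processed yet.
    """
    if last_processed is None:
        return True
    return parse_rfc3339(modified_time) > last_processed


# Example: the sheet was edited five minutes after the last conversion.
last_run = parse_rfc3339("2024-05-01T12:00:00.000Z")
print(sheet_changed("2024-05-01T12:05:00.000Z", last_run))
```

Only when this check returns True would the sheet need to be downloaded and converted, which is what keeps the 5-minute polling cheap.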

API Endpoints

GET / - Service status, health check, and scheduler information (includes last check time and last update time)
PUT /update - Trigger immediate ontology update from Google Sheets
```
curl -X PUT http://localhost:8000/update
```
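The same two endpoints can be called from Python. The following is a minimal standard-library sketch; the helper names are made up for this example, and because the shape of the response bodies is not documented here, the body is returned as raw text.

```python
import urllib.request

# Default address of the ontology manager when started with docker compose.
BASE_URL = "http://localhost:8000"


def status_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET / request that reports service and scheduler status."""
    return urllib.request.Request(f"{base_url}/", method="GET")


def update_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the PUT /update request that triggers an immediate update."""
    return urllib.request.Request(f"{base_url}/update", method="PUT")


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with the services running locally:
#   print(send(status_request()))
#   print(send(update_request()))
```

Separating request construction from sending makes it possible to inspect or log a request before it goes out, and to point the helpers at another host by passing a different base_url.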

Local Development

# Install dependencies
uv sync

# Run conversion script directly
uv run tools/python/converter/csv2ont.py

# Run SHACL validation
python tools/python/checks/shacl.py

Testing

curl http://localhost:8000/
curl http://localhost:3030/$/ping

☸️ Kubernetes Deployment

Prerequisites

A running Kubernetes cluster with kubectl configured
The arema-ontology image available in your cluster's registry (update image: in k8s/arema-ontology-deployment.yaml if needed)
A service_account.json Google service account key file

1. Create the Secret

Do not fill k8s/secrets.yaml with real values and commit it to git. Instead, create the secret imperatively:

kubectl create secret generic arema-secrets \
  --from-literal=fuseki-password=<PASSWORD> \
  --from-literal=github-token=<TOKEN> \
  --from-file=service-account-json=service_account.json

2. Apply All Resources

kubectl apply -f k8s/fuseki-configmap.yaml
kubectl apply -f k8s/fuseki-pvc.yaml
kubectl apply -f k8s/fuseki-service.yaml
kubectl apply -f k8s/fuseki-deployment.yaml
kubectl apply -f k8s/arema-ontology-deployment.yaml

3. Verify Everything is Running

kubectl get pods
kubectl get services
kubectl get pvc

4. Port Forwarding (local access)

kubectl port-forward service/fuseki 3030:3030

Fuseki is then accessible at http://localhost:3030.

Updating Configuration

If you change k8s/fuseki-configmap.yaml, apply and restart:

kubectl apply -f k8s/fuseki-configmap.yaml
kubectl rollout restart deployment/fuseki

📖 Documentation

API_README.md - Detailed API documentation and usage examples
AUTOMATION.md - Automation features (see feature branch)
Online Vocabulary: https://ontology.atlas-regenmat.ch/
SPARQL Endpoint: http://localhost:3030/arema/sparql (when running locally)
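To query the local SPARQL endpoint from Python, a request can be built according to the SPARQL 1.1 Protocol (a GET with a query parameter) and the standard SPARQL JSON results format parsed afterwards. This is a sketch under assumptions: it presumes the query endpoint is readable without credentials and that concepts are visible in the default graph; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:3030/arema/sparql"

# List up to ten concepts together with their English preferred label.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def extract_rows(results: dict) -> list:
    """Turn SPARQL JSON results into (concept IRI, label) tuples."""
    return [
        (binding["concept"]["value"], binding["label"]["value"])
        for binding in results["results"]["bindings"]
    ]


# Usage, with Fuseki running locally:
#   with urllib.request.urlopen(sparql_request(QUERY), timeout=30) as resp:
#       print(extract_rows(json.load(resp)))
```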

🐳 Docker Configuration

Environment Variables

Required in .env (copy from .env.dist): at minimum FUSEKI_USERNAME and FUSEKI_PASSWORD. See .env.dist for the full list of variables.

Ports

8000 - Ontology Manager API
3030 - Apache Fuseki SPARQL endpoint

Volumes

./src - Ontology output files
./tools/fuseki/data - Fuseki database persistence
./tools/fuseki/config.ttl - Fuseki configuration

🔄 Development Workflow

Edit Google Sheet - Domain experts update the taxonomy
Trigger Update - API endpoint or wait for scheduled update
Conversion - CSV → SKOS/RDF format
Upload - Data uploaded to Fuseki triplestore
Validation - SHACL checks ensure quality
Publication - GitHub Actions builds SKOHub static site
Deployment - Published to https://ontology.atlas-regenmat.ch/

Contact

The Atlas of Regenerative Materials originated as a project of the Chair of Sustainable Construction at ETH Zurich, led by Professor Guillaume Habert.

The project was made possible by the initial support of the Ricola Foundation and the ETH Domain Open Research Data Program.

For comments, ideas or remarks, please contact shhuber@ethz.ch
