Commit d8dcd25

Merge pull request #4 from sdsc-ordes/refactor/ontology-manager-compose
refactor: ontology manager compose
2 parents f05bd74 + 330b9c2 commit d8dcd25

317 files changed

Lines changed: 1506 additions & 644 deletions

.dockerignore

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Git
+.git
+.gitignore
+
+# Python
+__pycache__
+*.pyc
+*.pyo
+*.pyd
+.Python
+*.so
+*.egg
+*.egg-info
+dist
+build
+.venv
+venv/
+
+# IDE
+.vscode
+.idea
+*.swp
+*.swo
+*~
+
+# Documentation
+docs/
+public/
+README.md
+
+# External dependencies
+external/
+
+# Environment
+.env
+.env.*
+!.env.dist
+
+# Fuseki data
+tools/fuseki/data/
+
+# GitHub
+.github/

.env.dist

Lines changed: 10 additions & 0 deletions
@@ -3,3 +3,13 @@ ONTOLOGY_SPARQL_ENDPOINT=
 GRAPHDB_USERNAME=
 GRAPHDB_PASSWORD=
 GRAPH_URI=
+
+# Fuseki Configuration (required for docker-compose)
+FUSEKI_URL=http://localhost:3030/arema/data
+FUSEKI_USERNAME=admin
+FUSEKI_PASSWORD=changeme
+
+# Google Sheets Configuration
+GOOGLE_SHEET_ID=1RL6Y120_H9-yD8x52eZO44S2iLQpLoZHitcExHsPfPs
+GOOGLE_SHEET_OBJECTS_GID=1120751986
+GOOGLE_SHEET_PROPERTIES_GID=373147482
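
The new `.env.dist` entries suggest that the service reads its Fuseki and Google Sheets settings from the environment. The following is a minimal sketch of how such settings could be loaded and validated at startup. Only the variable names come from `.env.dist`; the `Settings` dataclass and the `load_settings` helper are hypothetical and do not appear in this commit.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for some of the variables listed in .env.dist."""

    fuseki_url: str
    fuseki_username: str
    fuseki_password: str
    google_sheet_id: str


# Variable names are taken from .env.dist; treating all four as required
# is an assumption made for this illustration.
REQUIRED = ("FUSEKI_URL", "FUSEKI_USERNAME", "FUSEKI_PASSWORD", "GOOGLE_SHEET_ID")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, failing loudly on missing or empty values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        fuseki_url=env["FUSEKI_URL"],
        fuseki_username=env["FUSEKI_USERNAME"],
        fuseki_password=env["FUSEKI_PASSWORD"],
        google_sheet_id=env["GOOGLE_SHEET_ID"],
    )
```

Failing at startup on an empty `FUSEKI_PASSWORD` is one way to avoid silently running with the `changeme` placeholder removed but nothing put in its place.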

.github/workflows/docs.yaml

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ jobs:
 CONTAINER_ID=$(docker create \
 -v $(pwd)/src/ontology:/app/data \
 -v $(pwd)/.env:/app/.env \
-ghcr.io/sdsc-ordes/skohub-vocabs:v0.3.1 \
+ghcr.io/sdsc-ordes/skohub-vocabs:v0.3.2 \
 sh -c "npm run build"
 )

.gitignore

Lines changed: 17 additions & 1 deletion
@@ -1,6 +1,21 @@
 external/skohub-vocabs
 .env
 /tools/fuseki/data/
+tools/python/converter/__pycache__/csv2ont.cpython-311.pyc
+tools/python/converter/__pycache__/csv2ont.cpython-312.pyc
+service_account.json
+src/arema/__pycache__/file_utils.cpython-311.pyc
+src/arema/__pycache__/git_utils.cpython-311.pyc
+src/arema/__pycache__/scheduler.cpython-311.pyc
+src/arema/__pycache__/sheets_utils.cpython-311.pyc
+src/arema/__pycache__/update_service.cpython-311.pyc
+src/server/__pycache__/__init__.cpython-311.pyc
+src/server/__pycache__/main.cpython-311.pyc
+src/__pycache__/__init__.cpython-311.pyc
+tools/__pycache__/__init__.cpython-311.pyc
+tools/python/__pycache__/__init__.cpython-311.pyc
+src/arema/__pycache__/__init__.cpython-311.pyc
+tools/python/converter/__pycache__/__init__.cpython-311.pyc
 
 # Python
 __pycache__/
@@ -12,4 +27,5 @@ __pycache__/
 dist/
 build/
 .venv/
-venv/
+venv/
+

Dockerfile

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+FROM python:3.11-slim
+
+# No system dependencies needed anymore
+
+# Install uv for fast dependency management
+RUN pip install --no-cache-dir uv
+# COPY --from uv:latest /bin/uv /bin/uv
+WORKDIR /app
+
+# Copy dependency files
+COPY pyproject.toml uv.lock ./
+
+# Export and install dependencies
+RUN uv sync
+
+# Copy application code
+COPY . .
+
+# Set Python path so imports work correctly
+ENV PYTHONPATH="/app/src:/app/tools/python/converter:${PYTHONPATH}"
+
+# Expose port
+EXPOSE 8000
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+CMD python -c "import requests; requests.get('http://localhost:8000/')" || exit 1
+
+# Start Service
+CMD ["uv", "run", "uvicorn", "src.server.main:app", "--host", "0.0.0.0", "--port", "8000"]
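
A note on the `HEALTHCHECK` above: `requests.get` returns normally for 4xx and 5xx responses, so the probe as written only fails when the connection itself fails. In addition, `uv sync` installs into a project virtual environment by default, so `requests` may not be importable from the image's plain `python`. A possible alternative is a standard-library probe that treats only 2xx answers as healthy. The sketch below is an illustration, not part of this commit; `is_healthy` and `main` are hypothetical names.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx answers;
        # connection failures and timeouts surface as URLError or OSError.
        return False


def main(argv: list) -> int:
    """Map the probe result to a process exit code (0 healthy, 1 unhealthy)."""
    target = argv[1] if len(argv) > 1 else "http://localhost:8000/"
    return 0 if is_healthy(target) else 1
```

Saved as a small script that ends with `sys.exit(main(sys.argv))`, this could be invoked from `HEALTHCHECK CMD` without depending on any third-party package.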

README.md

Lines changed: 171 additions & 35 deletions
@@ -12,66 +12,202 @@ It powers the AREMA front-end by providing a standardized vocabulary for objects
 ### Interoperability and Data Quality
 By formalizing the domain knowledge as RDF and SHACL, the ontology supports structured data exchange and integration with external systems, while enabling automated quality checks and re-using existing metadata standards.
 
+## 🏗️ Architecture
+
+The AREMA ontology system consists of three main components:
+
+1. **Ontology Manager API** (FastAPI) - Converts Google Sheets to RDF and manages updates
+2. **Apache Fuseki** - SPARQL endpoint and triplestore
+3. **SKOHub Vocabs** - Static site generator for human-readable vocabulary browser
+
+### Workflow
+```
+Google Sheets → Ontology Manager → Fuseki (SPARQL) → GitHub → SKOHub → Static Site
+```
+
+The ontology manager fetches data from Google Sheets, converts it to SKOS/RDF format, uploads to Fuseki, and the changes are published via SKOHub to https://ontology.atlas-regenmat.ch/
+
 ## 📂 Repository Structure
 ```plaintext
-AREMA Ontology Repository
-├── docs/ # Documentation outputs
-├── external/ # External resources (placeholder)
+arema-ontology/
+├── docs/ # SKOHub-generated static site
 ├── src/
-│ ├── ontology/ # The ontology files
-│ │ ├── AREMA-ontology.ttl
-│ │ └── README.md
-│ └── quality-checks/ # SHACL shapes for checking ontology validity and skohub compatibility
+│ ├── arema/ # Core utility modules
+│ │ └── sheets_utils.py # Google Sheets API utilities
+│ ├── server/ # FastAPI service
+│ │ └── main.py # API endpoints and scheduler
+│ ├── ontology/ # Generated ontology files
+│ │ └── AREMA-ontology.ttl # Main ontology (auto-generated)
+│ └── quality-checks/ # SHACL validation shapes
 │ ├── shacl-shacl.ttl
 │ └── skohub.shacl.ttl
 ├── tools/
-│ ├── python/ # Python tooling for validation and documentation
-│ │ ├── checks/
-│ │ │ └── shacl.py
-│ │ ├── docs/
-│ │ │ └── sparql.py
-│ │ └── requirements.txt
-│ └── skohub-vocabs/ # Custom skohub-vocabs files required to match AREMA style
-├── LICENSE
-├── README.md
-├── .env
-├── .gitignore
-├── pyproject.toml
-├── uv.lock
-
+│ ├── fuseki/ # Fuseki configuration
+│ │ ├── config.ttl # Fuseki assembler config
+│ │ ├── data/ # TDB2 database storage
+│ │ └── docker-compose.yml # Standalone Fuseki setup
+│ ├── python/
+│ │ ├── converter/ # Google Sheets → RDF converter
+│ │ │ └── csv2ont.py
+│ │ └── checks/ # Validation scripts
+│ │ └── shacl.py
+│ └── skohub-vocabs/ # Custom SKOHub configuration
+├── .github/
+│ └── workflows/
+│ └── docs.yaml # CI/CD for SKOHub builds
+├── Dockerfile # Container definition
+├── docker-compose.yml # Multi-service orchestration
+├── pyproject.toml # Python dependencies
+└── uv.lock # Locked dependencies
 ```
 
 ### Key Components
-**src/ontology/**:
-The core ontology in Turtle (.ttl) format.
 
-**src/quality-checks/**:
-SHACL shapes for validating ontology consistency and compatibility with tools like SKOHUB.
+**src/server/main.py**
+FastAPI service that manages ontology conversions and updates. Provides REST API endpoints for triggering updates and checking status.
+
+**src/arema/**
+Core utility modules for git operations and file management.
+
+**tools/python/converter/csv2ont.py**
+Converts Google Sheets data to SKOS/RDF format with support for:
+- Multilingual labels (en/de/fr/it)
+- QUDT units and symbols
+- Hierarchical concept schemes
+- Automatic upload to Fuseki
 
-**tools/python/**:
-Python scripts for running SHACL checks.
+**docker-compose.yml**
+Orchestrates the ontology manager and Fuseki services with proper networking and volume management.
+
+**tools/fuseki/**
+Apache Fuseki SPARQL endpoint configuration with TDB2 storage and union default graph support.
+
+**src/quality-checks/**
+SHACL shapes for validating ontology consistency and SKOHub compatibility.
 
 ## 🔍 Quality Assurance
-We employ SHACL shapes to validate the ontology structure and ensure ongoing data integrity. Scripts in `tools/python/` assist with:
 
-- Running SHACL validation against the ontology.
-- Generating documentation from SPARQL queries.
+The ontology undergoes multiple validation checks:
 
-## 🚀 Usage (For Developers / Maintainers)
+- **SHACL Validation**: Ensures structural consistency and conformance to SKOS/QUDT patterns
+- **SKOHub Compatibility**: Validates compatibility with the static site generator
+- **Automated Testing**: `test.sh` script validates the full workflow
 
-### Requirements
+Run validation:
 ```bash
-pip install -r tools/python/requirements.txt
+python tools/python/checks/shacl.py
 ```
 
-### Run SHACL Checks
+## 🚀 Quick Start
+
+### Prerequisites
+- Docker and Docker Compose
+- Python 3.11+ (for local development)
+- uv package manager (recommended)
+- Google Service Account JSON key file (see below)
+
+### Google Sheets Authentication Setup
+
+The service uses Google Drive API to efficiently check for Sheet modifications:
+
+1. Create a Google Cloud Project and enable the Drive API
+2. Create a Service Account and download the JSON key file
+3. Place the JSON key as `service_account.json` in the repository root
+4. Share your Google Sheet with the service account email (found in the JSON file under `client_email`)
+5. Grant the service account "Viewer" permission
+
+### Running the Services
+
+The repository includes a FastAPI service that automatically checks for Google Sheet updates every 5 minutes using the Drive API (no downloads unless the sheet was modified).
+
 ```bash
+# 1. Copy and configure environment variables
+cp .env.dist .env
+# Edit .env and set FUSEKI_USERNAME and FUSEKI_PASSWORD
+
+# 2. Place your service_account.json in the repository root
+
+# 3. Start services (Fuseki + Ontology Manager)
+docker compose up -d
+
+# 4. Check service health
+curl http://localhost:8000/
+curl http://localhost:3030/$/ping
+```
+
+The ontology manager API runs on `http://localhost:8000` and Fuseki on `http://localhost:3030`.
+
+**Automatic Updates:** The service checks for Google Sheet modifications every 5 minutes using the Drive API. If changes are detected, it automatically converts the data and uploads to Fuseki triplestore.
+
+### API Endpoints
+
+- `GET /` - Service status, health check, and scheduler information (includes last check time and last update time)
+- `PUT /update` - Trigger immediate ontology update from Google Sheets
+```bash
+curl -X PUT http://localhost:8000/update
+```
+
+### Local Development
+
+```bash
+# Install dependencies
+uv sync
+
+# Run conversion script directly
+uv run tools/python/converter/csv2ont.py
+
+# Run SHACL validation
 python tools/python/checks/shacl.py
 ```
 
+### Testing
+
+```bash
+curl http://localhost:8000/
+curl http://localhost:3030/$/ping
+```
+
+## 📖 Documentation
+
+- **[API_README.md](API_README.md)** - Detailed API documentation and usage examples
+- **[AUTOMATION.md](AUTOMATION.md)** - Automation features (see feature branch)
+- **Online Vocabulary**: https://ontology.atlas-regenmat.ch/
+- **SPARQL Endpoint**: http://localhost:3030/arema/sparql (when running locally)
+
+## 🐳 Docker Configuration
+
+### Environment Variables
+
+Required in `.env`:
+```bash
+FUSEKI_URL=http://localhost:3030/arema/data
+FUSEKI_USERNAME=admin
+FUSEKI_PASSWORD=your_secure_password
+```
+
+### Ports
+- `8000` - Ontology Manager API
+- `3030` - Apache Fuseki SPARQL endpoint
+
+### Volumes
+- `./src` - Ontology output files
+- `./tools/fuseki/data` - Fuseki database persistence
+- `./tools/fuseki/config.ttl` - Fuseki configuration
+
+## 🔄 Development Workflow
+
+1. **Edit Google Sheet** - Domain experts update the taxonomy
+2. **Trigger Update** - API endpoint or wait for scheduled update
+3. **Conversion** - CSV → SKOS/RDF with QUDT units
+4. **Upload** - Data uploaded to Fuseki triplestore
+5. **Validation** - SHACL checks ensure quality
+6. **Publication** - GitHub Actions builds SKOHub static site
+7. **Deployment** - Published to https://ontology.atlas-regenmat.ch/
+
 ## Contact
+
 The Atlas of Regenerative Materials originates as a project from the Chair of Sustainable Construction at ETH Zurich, led by Professor Guillaume Habert.
 
-The project was made possible thanks to the initial support of the Ricola Foundation, and the ETH Domain Open Research Data Program .
+The project was made possible thanks to the initial support of the Ricola Foundation, and the ETH Domain Open Research Data Program.
 
 For comments, ideas or remarks, please contact shhuber@ethz.ch
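
The README in this diff describes a scheduler that checks the sheet's modification time through the Drive API and only converts and uploads when the sheet has changed. The service code itself is not shown here, so the following is only a sketch of that decision logic under stated assumptions: `UpdateState`, `check_and_update`, and the two injected callables are hypothetical names, and the Drive lookup and the conversion are passed in as plain functions rather than modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class UpdateState:
    """Hypothetical bookkeeping, mirroring the fields `GET /` is said to report."""

    last_check: Optional[datetime] = None
    last_update: Optional[datetime] = None
    last_seen_modified: Optional[datetime] = None


def check_and_update(
    state: UpdateState,
    get_modified_time: Callable[[], datetime],
    run_update: Callable[[], None],
    now: Optional[datetime] = None,
) -> bool:
    """Run the update only if the sheet changed since the last successful run."""
    now = now or datetime.now(timezone.utc)
    state.last_check = now
    modified = get_modified_time()  # assumed to wrap a cheap metadata lookup
    if state.last_seen_modified is not None and modified <= state.last_seen_modified:
        return False  # nothing new: skip the download and conversion
    run_update()  # assumed to wrap conversion and upload; may raise
    # Recorded only after success, so a failed update is retried next time.
    state.last_seen_modified = modified
    state.last_update = now
    return True
```

A scheduler could call `check_and_update` every five minutes, while a `PUT /update` handler could call `run_update` directly to bypass the modification check.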
