@@ -12,66 +12,202 @@ It powers the AREMA front-end by providing a standardized vocabulary for objects
12
12
### Interoperability and Data Quality
13
13
By formalizing the domain knowledge as RDF and SHACL, the ontology supports structured data exchange and integration with external systems, while enabling automated quality checks and re-using existing metadata standards.
14
14
15
+
## 🏗️ Architecture
16
+
17
+
The AREMA ontology system consists of three main components:
18
+
19
+
1. **Ontology Manager API** (FastAPI) - Converts Google Sheets to RDF and manages updates
20
+
2. **Apache Fuseki** - SPARQL endpoint and triplestore
21
+
3. **SKOHub Vocabs** - Static site generator for a human-readable vocabulary browser
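
Because Fuseki exposes a SPARQL endpoint, the triplestore can be queried over the standard SPARQL protocol. The following is a minimal sketch that only builds the query URL; the dataset name `arema` and the `sparql` service name are assumptions, not values taken from this repository, so check the Fuseki configuration for the real ones.

```python
from urllib.parse import urlencode

# Assumptions: "arema" (dataset) and "sparql" (service name) are placeholders.
FUSEKI_BASE = "http://localhost:3030"
DATASET = "arema"

# Example query: the first ten concepts that have an English preferred label.
LIST_CONCEPTS = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_query_url(query: str, base: str = FUSEKI_BASE, dataset: str = DATASET) -> str:
    """Return a SPARQL-protocol GET URL carrying the query as a URL parameter."""
    return f"{base}/{dataset}/sparql?{urlencode({'query': query.strip()})}"
```

Calling `build_query_url(LIST_CONCEPTS)` gives a URL that can be fetched with `curl` or any HTTP client once the services are running.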
22
+
23
+
### Workflow
24
+
```
25
+
Google Sheets → Ontology Manager → Fuseki (SPARQL) → GitHub → SKOHub → Static Site
26
+
```
27
+
28
+
The ontology manager fetches data from Google Sheets, converts it to SKOS/RDF, and uploads the result to Fuseki. The changes are then published via SKOHub to https://ontology.atlas-regenmat.ch/
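
To illustrate the conversion step, here is a minimal sketch of how one sheet row could become a SKOS concept in Turtle. It is not the repository's converter: the column names (`id`, `label_en`, `broader`, ...) and the use of the published site URL as the concept namespace are assumptions made for the example.

```python
# Assumption: concepts live directly under the published site URL.
BASE = "https://ontology.atlas-regenmat.ch/"
PREFIXES = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
LANGUAGES = ("en", "de", "fr", "it")


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_turtle(row: dict, base: str = BASE) -> str:
    """Render one row (hypothetical column names) as a skos:Concept."""
    lines = [f"<{base}{row['id']}> a skos:Concept"]
    for lang in LANGUAGES:
        label = row.get(f"label_{lang}", "").strip()
        if label:  # skip languages without a label
            lines.append(f'    skos:prefLabel "{escape_literal(label)}"@{lang}')
    if row.get("broader"):
        lines.append(f"    skos:broader <{base}{row['broader']}>")
    # Predicates of one subject are separated by ";" and closed with ".".
    return " ;\n".join(lines) + " ."
```

Prepending `PREFIXES` to the joined output of `row_to_turtle` for every row yields a Turtle document; the real converter additionally handles QUDT units and concept schemes.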
29
+
15
30
## 📂 Repository Structure
16
31
```plaintext
17
-
AREMA Ontology Repository
18
-
├── docs/ # Documentation outputs
19
-
├── external/ # External resources (placeholder)
32
+
arema-ontology/
33
+
├── docs/ # SKOHub-generated static site
20
34
├── src/
21
-
│ ├── ontology/ # The ontology files
22
-
│ │ ├── AREMA-ontology.ttl
23
-
│ │ └── README.md
24
-
│ └── quality-checks/ # SHACL shapes for checking ontology validity and skohub compatibility
35
+
│ ├── arema/ # Core utility modules
36
+
│ │ └── sheets_utils.py # Google Sheets API utilities
37
+
│ ├── server/ # FastAPI service
38
+
│ │ └── main.py # API endpoints and scheduler
39
+
│ ├── ontology/ # Generated ontology files
40
+
│ │ └── AREMA-ontology.ttl # Main ontology (auto-generated)
41
+
│ └── quality-checks/ # SHACL validation shapes
25
42
│ ├── shacl-shacl.ttl
26
43
│ └── skohub.shacl.ttl
27
44
├── tools/
28
-
│ ├── python/ # Python tooling for validation and documentation
29
-
│ │ ├── checks/
30
-
│ │ │ └── shacl.py
31
-
│ │ ├── docs/
32
-
│ │ │ └── sparql.py
33
-
│ │ └── requirements.txt
34
-
│ └── skohub-vocabs/ # Custom skohub-vocabs files required to match AREMA style
SHACL shapes for validating ontology consistency and compatibility with tools like SKOHub.
66
+
**src/server/main.py**
67
+
FastAPI service that manages ontology conversions and updates. Provides REST API endpoints for triggering updates and checking status.
68
+
69
+
**src/arema/**
70
+
Core utility modules for git operations and file management.
71
+
72
+
**tools/python/converter/csv2ont.py**
73
+
Converts Google Sheets data to SKOS/RDF format with support for:
74
+
- Multilingual labels (en/de/fr/it)
75
+
- QUDT units and symbols
76
+
- Hierarchical concept schemes
77
+
- Automatic upload to Fuseki
50
78
51
-
**tools/python/**:
52
-
Python scripts for running SHACL checks.
79
+
**docker-compose.yml**
80
+
Orchestrates the ontology manager and Fuseki services with proper networking and volume management.
81
+
82
+
**tools/fuseki/**
83
+
Apache Fuseki SPARQL endpoint configuration with TDB2 storage and union default graph support.
84
+
85
+
**src/quality-checks/**
86
+
SHACL shapes for validating ontology consistency and SKOHub compatibility.
53
87
54
88
## 🔍 Quality Assurance
55
-
We employ SHACL shapes to validate the ontology structure and ensure ongoing data integrity. Scripts in `tools/python/` assist with:
56
89
57
-
- Running SHACL validation against the ontology.
58
-
- Generating documentation from SPARQL queries.
90
+
The ontology undergoes multiple validation checks:
59
91
60
-
## 🚀 Usage (For Developers / Maintainers)
92
+
- **SHACL Validation**: Ensures structural consistency and conformance to SKOS/QUDT patterns
93
+
- **SKOHub Compatibility**: Validates compatibility with the static site generator
94
+
- **Automated Testing**: The `test.sh` script validates the full workflow
61
95
62
-
### Requirements
96
+
Run validation:
63
97
```bash
64
-
pip install -r tools/python/requirements.txt
98
+
python tools/python/checks/shacl.py
65
99
```
66
100
67
-
### Run SHACL Checks
101
+
## 🚀 Quick Start
102
+
103
+
### Prerequisites
104
+
- Docker and Docker Compose
105
+
- Python 3.11+ (for local development)
106
+
- uv package manager (recommended)
107
+
- Google Service Account JSON key file (see below)
108
+
109
+
### Google Sheets Authentication Setup
110
+
111
+
The service uses the Google Drive API to check for Sheet modifications efficiently:
112
+
113
+
1. Create a Google Cloud Project and enable the Drive API
114
+
2. Create a Service Account and download the JSON key file
115
+
3. Place the JSON key as `service_account.json` in the repository root
116
+
4. Share your Google Sheet with the service account email (found in the JSON file under `client_email`)
117
+
5. Grant the service account "Viewer" permission
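
For step 4, the address to share the Sheet with can be read straight from the key file. A minimal sketch, assuming the key sits at `service_account.json` as described above; the helper name is made up for this example:

```python
import json
from pathlib import Path


def service_account_email(key_path: str = "service_account.json") -> str:
    """Return the client_email field from a service account key file."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    email = data.get("client_email")
    if not email:
        raise ValueError(f"{key_path} has no 'client_email' field")
    return email
```

Running `print(service_account_email())` from the repository root prints the address that needs "Viewer" access to the Sheet.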
118
+
119
+
### Running the Services
120
+
121
+
The repository includes a FastAPI service that automatically checks for Google Sheet updates every 5 minutes using the Drive API (no downloads unless the sheet was modified).
122
+
68
123
```bash
124
+
# 1. Copy and configure environment variables
125
+
cp .env.dist .env
126
+
# Edit .env and set FUSEKI_USERNAME and FUSEKI_PASSWORD
127
+
128
+
# 2. Place your service_account.json in the repository root
129
+
130
+
# 3. Start services (Fuseki + Ontology Manager)
131
+
docker compose up -d
132
+
133
+
# 4. Check service health
134
+
curl http://localhost:8000/
135
+
curl http://localhost:3030/$/ping
136
+
```
137
+
138
+
The ontology manager API runs on `http://localhost:8000` and Fuseki on `http://localhost:3030`.
139
+
140
+
**Automatic Updates:** The service checks for Google Sheet modifications every 5 minutes using the Drive API. If changes are detected, it automatically converts the data and uploads the result to the Fuseki triplestore.
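
The change check can be pictured as a timestamp comparison. The Drive API reports a `modifiedTime` for each file as an RFC 3339 string; the sketch below compares it with the timestamp of the last processed version. The function names and the way the last timestamp is stored are assumptions; the service's actual logic may differ.

```python
from datetime import datetime
from typing import Optional


def parse_rfc3339(ts: str) -> datetime:
    """Parse a timestamp such as '2024-05-01T12:00:00.000Z'."""
    # Replace the trailing 'Z' so older Python versions can parse it too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def needs_update(modified_time: str, last_processed: Optional[str]) -> bool:
    """Return True if the sheet changed since the last processed version."""
    if last_processed is None:
        return True  # nothing processed yet, so convert once
    return parse_rfc3339(modified_time) > parse_rfc3339(last_processed)
```

Only when `needs_update` returns `True` would the sheet content be downloaded and converted, which keeps the 5-minute polling cheap.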
141
+
142
+
### API Endpoints
143
+
144
+
- `GET /` - Service status, health check, and scheduler information (includes last check time and last update time)
145
+
- `PUT /update` - Trigger immediate ontology update from Google Sheets
146
+
```bash
147
+
curl -X PUT http://localhost:8000/update
148
+
```
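
The same call can be made from Python with the standard library only. This is a sketch, assuming the service runs on `http://localhost:8000`; the response format is not specified here, so the body is returned as raw text. The helper names are illustrative.

```python
import urllib.request

BASE_URL = "http://localhost:8000"


def build_update_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the PUT /update request without sending it."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/update", method="PUT")


def trigger_update(base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send PUT /update and return the raw response body as text."""
    request = build_update_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

`trigger_update()` raises `urllib.error.URLError` if the service is not reachable, so it can double as a simple availability check in scripts.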
149
+
150
+
### Local Development
151
+
152
+
```bash
153
+
# Install dependencies
154
+
uv sync
155
+
156
+
# Run conversion script directly
157
+
uv run tools/python/converter/csv2ont.py
158
+
159
+
# Run SHACL validation
69
160
python tools/python/checks/shacl.py
70
161
```
71
162
163
+
### Testing
164
+
165
+
```bash
166
+
curl http://localhost:8000/
167
+
curl http://localhost:3030/$/ping
168
+
```
169
+
170
+
## 📖 Documentation
171
+
172
+
- **[API_README.md](API_README.md)** - Detailed API documentation and usage examples
173
+
- **[AUTOMATION.md](AUTOMATION.md)** - Automation features (see feature branch)
1. **Edit Google Sheet** - Domain experts update the taxonomy
200
+
2. **Trigger Update** - Call the API endpoint or wait for the scheduled update
201
+
3. **Conversion** - CSV → SKOS/RDF with QUDT units
202
+
4. **Upload** - Data uploaded to Fuseki triplestore
203
+
5. **Validation** - SHACL checks ensure quality
204
+
6. **Publication** - GitHub Actions builds SKOHub static site
205
+
7. **Deployment** - Published to https://ontology.atlas-regenmat.ch/
206
+
72
207
## Contact
208
+
73
209
The Atlas of Regenerative Materials originated as a project of the Chair of Sustainable Construction at ETH Zurich, led by Professor Guillaume Habert.
74
210
75
-
The project was made possible thanks to the initial support of the Ricola Foundation, and the ETH Domain Open Research Data Program.
211
+
The project was made possible thanks to the initial support of the Ricola Foundation, and the ETH Domain Open Research Data Program.
76
212
77
213
For comments, ideas or remarks, please contact shhuber@ethz.ch