Commit c056461

Merge pull request #4 from sdsc-ordes/dev-container
feat: basic devcontainer and copilot instructions
2 parents e8e0adf + 59509b7 commit c056461

4 files changed

Lines changed: 134 additions & 0 deletions

.devcontainer/Dockerfile

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
FROM ghcr.io/astral-sh/uv:python3.12-bookworm

# Create non-root user with UID/GID commonly used by VS Code (1000:1000)
RUN useradd -ms /bin/bash -u 1000 vscode \
    && apt-get update && apt-get install -y sudo \
    && echo "vscode ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

USER vscode
WORKDIR /workspaces

.devcontainer/devcontainer.json

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
{
  "name": "string-to-things-dev",
  "build": {
    "dockerfile": "Dockerfile"
  },

  // This is where your repo will be mounted inside the container
  "remoteUser": "vscode",
  "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",

  "customizations": {
    "vscode": {
      "settings": {
        "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python"
      },
      "extensions": [
        "ms-python.python",
        "ms-python.vscode-pylance",
        "tamasfe.even-better-toml"
      ]
    }
  },

  // FastAPI default port
  "forwardPorts": [7514],

  // Install project in editable mode after the container is built
  "postCreateCommand": "rm -rf .venv && uv venv && uv pip install -e . && echo '. $PWD/.venv/bin/activate' >> /home/vscode/.bashrc"
}

.env.dist

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
ONTOLOGIES_URL=
KNOWLEDGE_GRAPH_URI=
ONTOLOGY_URI=
GRAPHDB_URL=
GRAPHDB_USER=
GRAPHDB_PASSWORD=

.github/copilot-instructions.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# AI Coding Agent Instructions - strings-to-things

## Project Overview
This is an RDF transformation microservice that converts string literals in knowledge graphs to structured IRIs using ontology mappings. The service loads ontologies from SPARQL endpoints and performs exact/fuzzy matching to replace text values with semantic identifiers.

## Key Architecture Components

### Core RDF Transformation Flow
- `OntologyManager` loads ontologies from GraphDB via SPARQL CONSTRUCT queries
- Builds label maps: `{lowercase_label -> IRI}` from `rdfs:label` and `skos:prefLabel`
- `RDFTransformer` processes input RDF graphs, replacing matching literals with IRIs
- FastAPI endpoints accept RDF uploads and return transformed graphs

### Critical Patterns

**Ontology Loading**: Always load from named graphs using SPARQL CONSTRUCT:
```python
CONSTRUCT { ?s ?p ?o }
WHERE { GRAPH <{graph_iri}> { ?s ?p ?o } }
```
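
The `{graph_iri}` placeholder suggests the query is filled in from a Python template. A minimal sketch of that step, assuming a hypothetical helper `build_construct_query` (not taken from the repository): in an f-string, the literal SPARQL braces have to be doubled so that only `{graph_iri}` is substituted, and the IRI is checked against the characters SPARQL forbids inside `<...>`.

```python
def build_construct_query(graph_iri: str) -> str:
    """Return a CONSTRUCT query that copies every triple of one named graph.

    Hypothetical helper for illustration; the real OntologyManager may
    build its queries differently.
    """
    # Characters that SPARQL does not allow inside an IRI reference.
    # Rejecting them keeps a malformed IRI from escaping the <...> delimiters.
    forbidden = '<>"{}|^`\\'
    if not graph_iri or any(ch in forbidden or ch <= " " for ch in graph_iri):
        raise ValueError(f"invalid graph IRI: {graph_iri!r}")
    # Doubled braces ({{ and }}) are emitted as literal braces by the f-string.
    return (
        f"CONSTRUCT {{ ?s ?p ?o }}\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    )
```

Calling `build_construct_query("http://example.org/g")` yields the two-line query shown above with the graph IRI filled in; one query per configured named graph would then be sent to the endpoint.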

**Label Normalization**: All labels are normalized with `.strip().lower()` for consistent matching
**Ambiguity Handling**: Duplicate labels across IRIs are detected and excluded from label maps
**Fuzzy Matching**: Uses RapidFuzz with configurable thresholds (default: 90)
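
The normalization and ambiguity rules can be sketched in plain Python. This is an illustration under assumptions, not the repository's `OntologyManager`: the function name `build_label_map`, its `(label, iri)` input, and the second return value are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_label_map(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Build {normalized_label -> IRI}, leaving out ambiguous labels.

    `pairs` yields (label, iri) tuples, e.g. collected from rdfs:label and
    skos:prefLabel triples. Returns the label map plus the ambiguous labels
    with their candidate IRIs, so the caller can log them or fail.
    """
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for label, iri in pairs:
        normalized = label.strip().lower()
        if normalized:  # skip labels that are empty after stripping
            candidates[normalized].add(iri)

    label_map: Dict[str, str] = {}
    ambiguous: Dict[str, Set[str]] = {}
    for label, iris in candidates.items():
        if len(iris) == 1:
            label_map[label] = next(iter(iris))
        else:
            # The same label points at several IRIs: exclude it from the map.
            ambiguous[label] = iris
    return label_map, ambiguous


label_map, ambiguous = build_label_map([
    ("  MRI ", "http://example.org/MRI"),
    ("mri", "http://example.org/MRI"),
    ("Scan", "http://example.org/CTScan"),
    ("scan", "http://example.org/PETScan"),
])
```

Here `label_map` contains only `"mri"`, because both spellings normalize to the same label and the same IRI, while `"scan"` ends up in `ambiguous` with its two candidate IRIs. Returning the ambiguous labels separately leaves it to the caller to decide between logging and raising.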

## Development Workflows

### Build & Run Commands
```bash
just setup                        # Initial project setup
just develop                      # Enter Nix dev shell
just run [args]                   # Execute CLI with uv
just format                       # Format code with treefmt
just notebook                     # Start Jupyter notebook
uv build --out-dir .output/build  # Build package
```

### Testing Patterns
- Test with `pytest` using RDFLib Graph fixtures
- Mock SPARQL endpoints for ontology loading tests
- Test both exact and fuzzy matching scenarios in `test_rdf_transformer.py`

### Configuration Management
Settings via Pydantic with `.env` file:
- `ONTOLOGY_SPARQL_ENDPOINT`: GraphDB endpoint URL
- `ONTOLOGY_GRAPH_IRIS`: Comma-separated named graph URIs
- `GRAPHDB_USERNAME/PASSWORD`: Authentication
- `FAIL_ON_AMBIGUOUS_LABELS`: Boolean for strict validation
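
To make the expected value formats concrete, here is a sketch that parses these variables with the standard library only, as a stand-in for the Pydantic settings class. The `Settings` dataclass, `load_settings`, the defaults, and the reading of `GRAPHDB_USERNAME/PASSWORD` as two separate variables are assumptions.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    ontology_sparql_endpoint: str
    ontology_graph_iris: List[str]
    graphdb_username: str
    graphdb_password: str
    fail_on_ambiguous_labels: bool


def load_settings(env: Mapping[str, str]) -> Settings:
    """Parse the variables listed above from a mapping such as os.environ."""
    raw_iris = env.get("ONTOLOGY_GRAPH_IRIS", "")
    raw_flag = env.get("FAIL_ON_AMBIGUOUS_LABELS", "false")  # assumed default
    return Settings(
        # Required: a missing endpoint raises KeyError.
        ontology_sparql_endpoint=env["ONTOLOGY_SPARQL_ENDPOINT"],
        # "a, b," -> ["a", "b"]: split on commas, trim, drop empty parts.
        ontology_graph_iris=[p.strip() for p in raw_iris.split(",") if p.strip()],
        graphdb_username=env.get("GRAPHDB_USERNAME", ""),
        graphdb_password=env.get("GRAPHDB_PASSWORD", ""),
        fail_on_ambiguous_labels=raw_flag.strip().lower() in {"1", "true", "yes", "on"},
    )
```

Passing `os.environ` (after the `.env` file has been loaded) gives a typed view of the configuration; a Pydantic settings class would add validation and `.env` loading on top of the same field layout.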

## Project-Specific Conventions

### RDF Handling
- Use RDFLib for all RDF operations
- Preserve original triples when adding transformed versions
- Add provenance with `thingOf` predicate linking IRI to original literal
- Support multiple serialization formats via FastAPI form parameters
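
The "preserve and add provenance" rules can be illustrated without RDFLib by modelling triples as plain tuples. This is a sketch under assumptions: the `Literal` stand-in class, the `transform` function, and the `THING_OF` IRI are hypothetical, and only exact matching is shown. With RDFLib the same steps would operate on a `Graph` instead of a set.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """Stand-in for an RDF literal; plain `str` terms are treated as IRIs."""
    value: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]

# Hypothetical predicate IRI; the real namespace of `thingOf` is not given here.
THING_OF = "http://example.org/thingOf"


def transform(triples: Set[Triple], label_map: Dict[str, str]) -> Set[Triple]:
    """Return the input triples plus IRI-valued copies of matching literals."""
    result = set(triples)  # original triples are always kept
    for s, p, o in triples:
        if not isinstance(o, Literal):
            continue  # only literals are candidates for replacement
        iri = label_map.get(o.value.strip().lower())
        if iri is not None:
            result.add((s, p, iri))         # transformed triple
            result.add((iri, THING_OF, o))  # provenance: IRI -> original literal
    return result
```

For one input triple whose object is the literal `"MRI "` and a label map containing `"mri"`, the result holds three triples: the untouched original, the IRI-valued copy, and the `thingOf` link back to the literal.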

### Error Handling
- Validate ontology loading with comprehensive logging
- Handle SPARQL connection failures gracefully
- Log all transformation decisions with `TransformationLog`

### File Structure
```
src/strings2things/app/
├── main.py                      # FastAPI app entry point
├── config.py                    # Pydantic settings
├── api/endpoints.py             # REST API routes
├── core/
│   ├── ontology_manager.py      # SPARQL loading & label mapping
│   ├── rdf_transformer.py       # Core transformation logic
│   └── transformation_log.py    # Audit trail
└── utils/rdf_utils.py           # RDF parsing/serialization
```

### Dependencies
- Uses `uv` for Python package management
- RDF: `rdflib`, `SPARQLWrapper`
- Web: `fastapi`, `python-multipart`
- Fuzzy matching: `rapidfuzz`
- Validation: `pydantic`, `pydantic-settings`

### Nix Integration
Project uses Nix flakes for reproducible environments. All commands should run through `just` which wraps Nix development shells. The `justfile` contains the canonical build/test/format commands.

## Key Implementation Notes
- Ontologies define enumeration classes with `rdfs:label` values for matching
- The imaging ontology (`examples/ontologies/ontology.ttl`) shows the expected structure
- Fuzzy matching is optional and configurable per-request
- All transformations preserve backward compatibility by keeping original triples
