|
| 1 | +# AI Coding Agent Instructions - strings-to-things |
| 2 | + |
| 3 | +## Project Overview |
| 4 | +This is an RDF transformation microservice that converts string literals in knowledge graphs to structured IRIs using ontology mappings. The service loads ontologies from SPARQL endpoints and performs exact/fuzzy matching to replace text values with semantic identifiers. |
| 5 | + |
| 6 | +## Key Architecture Components |
| 7 | + |
| 8 | +### Core RDF Transformation Flow |
| 9 | +- `OntologyManager` loads ontologies from GraphDB via SPARQL CONSTRUCT queries |
| 10 | +- Builds label maps: `{lowercase_label -> IRI}` from `rdfs:label` and `skos:prefLabel` |
| 11 | +- `RDFTransformer` processes input RDF graphs, replacing matching literals with IRIs |
| 12 | +- FastAPI endpoints accept RDF uploads and return transformed graphs |
| 13 | + |
| 14 | +### Critical Patterns |
| 15 | + |
| 16 | +**Ontology Loading**: Always load from named graphs using SPARQL CONSTRUCT: |
| 17 | +```sparql |
| 18 | +CONSTRUCT { ?s ?p ?o } |
| 19 | +WHERE { GRAPH <{graph_iri}> { ?s ?p ?o } } |
| 20 | +``` |
| 21 | + |
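The `{graph_iri}` placeholder in the query above has to be filled in once per configured named graph. A minimal sketch of that step, assuming a hypothetical helper named `build_construct_query` (not taken from the project) and rejecting characters that the SPARQL grammar does not allow inside `<...>`:

```python
def build_construct_query(graph_iri: str) -> str:
    """Return a CONSTRUCT query that copies every triple from one named graph.

    Hypothetical helper for illustration; the project's own code may differ.
    """
    # Characters the SPARQL IRIREF production forbids inside <...>.
    forbidden = set('<>"{}|^`\\')
    if not graph_iri or any(ch in forbidden or ord(ch) <= 0x20 for ch in graph_iri):
        raise ValueError(f"not a valid graph IRI: {graph_iri!r}")
    # Doubled braces are literal braces in an f-string.
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    )
```

Validating the IRI before interpolation keeps a malformed `ONTOLOGY_GRAPH_IRIS` entry from changing the shape of the query.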
| 22 | +- **Label Normalization**: All labels are normalized with `.strip().lower()` for consistent matching |
| 23 | +- **Ambiguity Handling**: Labels that map to more than one IRI are detected and excluded from the label maps |
| 24 | +- **Fuzzy Matching**: Uses RapidFuzz with a configurable score threshold (default: 90) |
| 25 | + |
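The three patterns above (normalization, ambiguity exclusion, exact-then-fuzzy lookup) can be sketched together. This is an illustration under assumptions, not the project's implementation: the function names are hypothetical, the input is a plain list of `(iri, label)` pairs rather than an RDFLib graph, and the standard library's `difflib.SequenceMatcher` stands in for RapidFuzz so the sketch has no third-party dependency. Its scores are scaled to 0-100 but are not identical to RapidFuzz scores.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional, Set, Tuple


def normalize(label: str) -> str:
    # Same normalization the document describes: strip, then lowercase.
    return label.strip().lower()


def build_label_map(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Set[str]]:
    """Return ({normalized_label: iri}, ambiguous_labels) from (iri, label) pairs."""
    candidates: Dict[str, Set[str]] = {}
    for iri, label in pairs:
        candidates.setdefault(normalize(label), set()).add(iri)
    # A label is usable only if exactly one IRI carries it.
    label_map = {
        lbl: next(iter(iris)) for lbl, iris in candidates.items() if len(iris) == 1
    }
    ambiguous = {lbl for lbl, iris in candidates.items() if len(iris) > 1}
    return label_map, ambiguous


def match(text: str, label_map: Dict[str, str], threshold: float = 90.0) -> Optional[str]:
    """Try an exact lookup first, then fall back to the best fuzzy score."""
    key = normalize(text)
    if key in label_map:
        return label_map[key]
    best_iri, best_score = None, 0.0
    for label, iri in label_map.items():
        # difflib stand-in for a RapidFuzz scorer, scaled to 0-100.
        score = 100.0 * SequenceMatcher(None, key, label).ratio()
        if score > best_score:
            best_iri, best_score = iri, score
    return best_iri if best_score >= threshold else None
```

Returning the ambiguous labels alongside the map lets the caller decide whether to log them or to fail, which is the kind of choice a setting such as `FAIL_ON_AMBIGUOUS_LABELS` would control.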
| 26 | +## Development Workflows |
| 27 | + |
| 28 | +### Build & Run Commands |
| 29 | +```bash |
| 30 | +just setup # Initial project setup |
| 31 | +just develop # Enter Nix dev shell |
| 32 | +just run [args] # Execute CLI with uv |
| 33 | +just format # Format code with treefmt |
| 34 | +just notebook # Start Jupyter notebook |
| 35 | +uv build --out-dir .output/build # Build package |
| 36 | +``` |
| 37 | + |
| 38 | +### Testing Patterns |
| 39 | +- Test with `pytest` using RDFLib Graph fixtures |
| 40 | +- Mock SPARQL endpoints for ontology loading tests |
| 41 | +- Test both exact and fuzzy matching scenarios in `test_rdf_transformer.py` |
| 42 | + |
| 43 | +### Configuration Management |
| 44 | +Settings via Pydantic with `.env` file: |
| 45 | +- `ONTOLOGY_SPARQL_ENDPOINT`: GraphDB endpoint URL |
| 46 | +- `ONTOLOGY_GRAPH_IRIS`: Comma-separated named graph URIs |
| 47 | +- `GRAPHDB_USERNAME` / `GRAPHDB_PASSWORD`: GraphDB authentication credentials |
| 48 | +- `FAIL_ON_AMBIGUOUS_LABELS`: Boolean; when true, ambiguous labels are treated as an error (strict validation) |
| 49 | + |
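To make the shape of these settings concrete, here is a dependency-free sketch that reads them from an environment mapping. It uses a standard-library dataclass as a stand-in for the project's Pydantic settings class. The field names, the defaults, the accepted boolean spellings, and the reading of `GRAPHDB_USERNAME/PASSWORD` as two separate variables are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ontology_sparql_endpoint: str
    ontology_graph_iris: List[str]
    graphdb_username: Optional[str]
    graphdb_password: Optional[str]
    fail_on_ambiguous_labels: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping (hypothetical helper)."""
    # Split the comma-separated list and drop empty entries.
    parts = [p.strip() for p in env.get("ONTOLOGY_GRAPH_IRIS", "").split(",")]
    # Assumed boolean spellings; the real parser may accept others.
    truthy = {"1", "true", "yes", "on"}
    return Settings(
        ontology_sparql_endpoint=env["ONTOLOGY_SPARQL_ENDPOINT"],  # required
        ontology_graph_iris=[p for p in parts if p],
        graphdb_username=env.get("GRAPHDB_USERNAME"),
        graphdb_password=env.get("GRAPHDB_PASSWORD"),
        fail_on_ambiguous_labels=env.get("FAIL_ON_AMBIGUOUS_LABELS", "false")
        .strip()
        .lower()
        in truthy,
    )
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.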
| 50 | +## Project-Specific Conventions |
| 51 | + |
| 52 | +### RDF Handling |
| 53 | +- Use RDFLib for all RDF operations |
| 54 | +- Preserve original triples when adding transformed versions |
| 55 | +- Add provenance with `thingOf` predicate linking IRI to original literal |
| 56 | +- Support multiple serialization formats via FastAPI form parameters |
| 57 | + |
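The "preserve originals, add the IRI version, record provenance" convention can be illustrated without RDFLib by modelling triples as plain tuples. Everything here is an assumption made for illustration: the `Literal` wrapper, the `transform` function, and in particular the `THING_OF` IRI, since the document names a `thingOf` predicate but not its namespace.

```python
from typing import Dict, List, NamedTuple, Tuple, Union


class Literal(NamedTuple):
    """Minimal stand-in for an RDF literal; a plain str stands for an IRI."""
    value: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]

# Hypothetical IRI; the real namespace of `thingOf` is not given in the document.
THING_OF = "http://example.org/vocab#thingOf"


def transform(triples: List[Triple], label_map: Dict[str, str]) -> List[Triple]:
    """Return the original triples plus IRI versions and provenance links."""
    out: List[Triple] = list(triples)  # originals are always kept
    for s, p, o in triples:
        if not isinstance(o, Literal):
            continue
        iri = label_map.get(o.value.strip().lower())
        if iri is None:
            continue
        # Transformed triple, then provenance: IRI --thingOf--> original literal.
        for new in ((s, p, iri), (iri, THING_OF, o)):
            if new not in out:
                out.append(new)
    return out
```

Because the output always starts with the unmodified input, consumers that still expect the string literals keep working while new consumers can follow the IRIs.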
| 58 | +### Error Handling |
| 59 | +- Validate ontology loading with comprehensive logging |
| 60 | +- Handle SPARQL connection failures gracefully |
| 61 | +- Log all transformation decisions with `TransformationLog` |
| 62 | + |
| 63 | +### File Structure |
| 64 | +``` |
| 65 | +src/strings2things/app/ |
| 66 | +├── main.py # FastAPI app entry point |
| 67 | +├── config.py # Pydantic settings |
| 68 | +├── api/endpoints.py # REST API routes |
| 69 | +├── core/ |
| 70 | +│ ├── ontology_manager.py # SPARQL loading & label mapping |
| 71 | +│ ├── rdf_transformer.py # Core transformation logic |
| 72 | +│ └── transformation_log.py # Audit trail |
| 73 | +└── utils/rdf_utils.py # RDF parsing/serialization |
| 74 | +``` |
| 75 | + |
| 76 | +### Dependencies |
| 77 | +- Uses `uv` for Python package management |
| 78 | +- RDF: `rdflib`, `SPARQLWrapper` |
| 79 | +- Web: `fastapi`, `python-multipart` |
| 80 | +- Fuzzy matching: `rapidfuzz` |
| 81 | +- Validation: `pydantic`, `pydantic-settings` |
| 82 | + |
| 83 | +### Nix Integration |
| 84 | +The project uses Nix flakes for reproducible environments. Run all commands through `just`, which wraps the Nix development shell. The `justfile` contains the canonical build/test/format commands. |
| 85 | + |
| 86 | +## Key Implementation Notes |
| 87 | +- Ontologies define enumeration classes with `rdfs:label` values for matching |
| 88 | +- The imaging ontology (`examples/ontologies/ontology.ttl`) shows the expected structure |
| 89 | +- Fuzzy matching is optional and configurable per-request |
| 90 | +- All transformations preserve backward compatibility by keeping original triples |