⚠️ Read this canonical brief first.
It supersedes the original framing below and the discussion comments. Use it as the input to any agent executing this spike. The remaining content on this issue is preserved as decision history.
Original issue framing (historical — see canonical brief above)
Spike: evaluate RDF/SPARQL as a unifying query layer
Motivation
The repo currently maintains three graph-shaped data structures that are traversed independently in TypeScript with no shared query layer:
- Operation dependency graph (semantic-graph-extractor/operation-dependency-graph.json): ~144 operations, ~6,291 edges across 18 semantic types.
- Domain-semantics graph (path-analyser/domain-semantics.json): hand-curated identifiers, capabilities, runtime states, artifact kinds.
- Value-binding graph (embedded in domain-semantics.json under operationRequirements[].valueBindings): string field paths like response.deployments[].processDefinition.processDefinitionId mapped to semantic-type identity fields.
Each tool (path-analyser, request-validation, optional-responses) reloads and re-traverses these structures with bespoke procedural code. Cross-tool questions ("which operations have ≥1 positive scenario AND coverage in every applicable negative kind?") are currently impossible without ad-hoc joining of disparate JSON outputs.
Hypothesis
A small in-process triple store (Oxigraph WASM, or n3 + quadstore) loaded with these three graphs as RDF could:
- Collapse the three graphs into one queryable surface.
- Replace some procedural BFS heuristics with declarative SPARQL property paths.
- Make value-binding paths structurally validatable (typo'd paths return zero results instead of silently no-oping at runtime).
- Enable cross-tool coverage correlation queries.
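To make the value-binding point concrete, here is a minimal sketch in plain TypeScript, with no RDF library involved. It assumes the spec's field hierarchy has already been flattened into parent/child edges; the Triple shape, the hasField predicate name, and the sample data are all hypothetical and only illustrate the idea that a typo'd path matches nothing.

```typescript
// Hypothetical triple shape; a real spike would use an RDF library's term types.
type Triple = readonly [subject: string, predicate: string, object: string];

// Assumed sample data: the field hierarchy of one response, flattened into edges.
const specTriples: readonly Triple[] = [
  ["response", "hasField", "response.deployments[]"],
  ["response.deployments[]", "hasField", "response.deployments[].processDefinition"],
  [
    "response.deployments[].processDefinition",
    "hasField",
    "response.deployments[].processDefinition.processDefinitionId",
  ],
];

// Split "a.b[].c" into cumulative prefixes: ["a", "a.b[]", "a.b[].c"].
function pathPrefixes(path: string): string[] {
  const segments = path.split(".");
  return segments.map((_, i) => segments.slice(0, i + 1).join("."));
}

// A path resolves only if every parent -> child step exists as a hasField edge.
// A single-segment path has no steps to check and is treated as resolved.
function resolves(path: string, triples: readonly Triple[]): boolean {
  const edges = new Set(
    triples.filter(([, p]) => p === "hasField").map(([s, , o]) => `${s} -> ${o}`),
  );
  const prefixes = pathPrefixes(path);
  return prefixes.slice(1).every((child, i) => edges.has(`${prefixes[i]} -> ${child}`));
}
```

With this data, the path quoted in the Motivation section resolves, while a misspelling such as processDefintion in the same position does not. A SPARQL version would express the same check as a query that returns zero rows for the misspelled path.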
Likely non-wins (to be confirmed)
- Scenario generation: this is combinatorial enumeration, not querying, so SPARQL is the wrong tool. Mini-Datalog / ASP would fit better, but the cost-benefit is much weaker.
- Performance: not currently a pain point; round-tripping through SPARQL adds latency at this scale.
- Determinism: the current pipeline is byte-reproducible via TEST_SEED. Triple stores generally do not guarantee iteration order, so reproducible output would require ORDER BY discipline and stable URI minting. This is a real cost.
- Tooling friction: the repo uses strict TS, Biome, type-assertion (as T) casts banned via GritQL, and a sync code style. JS RDF libraries have typing gaps and async query APIs.
- Authoring burden: domain-semantics.json is hand-maintained; changing the serialization format (JSON vs Turtle) doesn't reduce that burden.
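The determinism cost can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: the BASE namespace, the mintIri and serializeSorted helpers, and the tuple-based Triple type are all hypothetical, and every term is assumed to be an IRI (no literals or blank nodes).

```typescript
// Hypothetical base IRI; the spike would pick its own namespace.
const BASE = "https://example.org/spike/";

// Mint an IRI from a stable key (for example an operationId),
// never from a counter or a random id, so reruns produce the same IRIs.
function mintIri(kind: string, key: string): string {
  return `${BASE}${kind}/${encodeURIComponent(key)}`;
}

// Simplified triple: subject, predicate, object, all assumed to be IRIs.
type Triple = readonly [string, string, string];

// Serialize to N-Triples-style lines and sort them by code unit order,
// so the output does not depend on insertion or iteration order.
function serializeSorted(triples: Iterable<Triple>): string {
  const lines = Array.from(triples, ([s, p, o]) => `<${s}> <${p}> <${o}> .`);
  lines.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return lines.join("\n");
}
```

The explicit comparator avoids locale-dependent ordering. On the query side, the same discipline would mean putting an ORDER BY on every SELECT whose results feed generated output.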
Spike scope (timeboxed)
Pick one concrete query that is currently awkward in TS and implement it twice:
- Baseline (TS): e.g. "find all operations whose required semantic types have no producer in the dependency graph" (correctness check) or "for each operation, list which value-binding field paths no longer resolve against the current bundled spec" (drift detector).
- SPARQL: load the same inputs into an embedded triple store, express the query in SPARQL, and compare the two implementations on:
  - Lines of code / clarity
  - Cold-start + query latency
  - Reproducibility under fixed seed
  - Type safety at the JS/RDF boundary
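For a sense of scale, the first baseline candidate (operations whose required semantic types have no producer) might look like the sketch below. The OperationNode shape and its requires/produces field names are assumptions made for illustration; the real operation-dependency-graph.json may be structured differently.

```typescript
// Hypothetical node shape; the real dependency graph JSON may differ.
interface OperationNode {
  operationId: string;
  requires: readonly string[]; // semantic types the operation consumes
  produces: readonly string[]; // semantic types the operation emits
}

// Returns, per operation, the required semantic types that no operation produces.
// Operations with nothing missing are omitted from the result.
function findUnproducedRequirements(
  operations: readonly OperationNode[],
): Map<string, string[]> {
  const produced = new Set(operations.flatMap((op) => op.produces));
  const result = new Map<string, string[]>();
  // Sort by operationId so the Map's insertion order is stable across runs.
  const sorted = [...operations].sort((a, b) =>
    a.operationId < b.operationId ? -1 : a.operationId > b.operationId ? 1 : 0,
  );
  for (const op of sorted) {
    const missing = op.requires.filter((t) => !produced.has(t)).sort();
    if (missing.length > 0) result.set(op.operationId, missing);
  }
  return result;
}
```

The SPARQL counterpart would presumably be a FILTER NOT EXISTS pattern over the same requires/produces edges. Part of what the spike should measure is whether that query, plus the loading and typing code around it, ends up clearer than these few lines.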
Decision criteria
- Adopt if the SPARQL version is materially clearer AND the ergonomics overhead (load, serialize, type the boundary) is acceptable for the value-binding drift use case at minimum.
- Adopt narrowly (value-binding graph only, leave dependency graph as-is) if the win is concentrated there.
- Reject if it's a wash or worse — conclusion is "graph traversal scale is appropriate to in-memory TS; revisit if scale grows ≥10×".
Out of scope for the spike
- Rewriting the BFS scenario planner.
- Replacing the domain-semantics.json authoring format.
- Introducing a persistent triple store (in-process only).
- SHACL validation of the spec (separate question).
Deliverables
- A throwaway branch with both implementations of the chosen query.
- Short writeup (in the issue or a docs/spikes/ note) recording: query chosen, LOC delta, latency numbers, friction points, recommendation.