Spike: evaluate RDF/SPARQL as a unifying query layer over the dependency, domain-semantics, and value-binding graphs

> **⚠️ Read [this canonical brief](https://github.com/camunda/api-test-generator/issues/60#issuecomment-4340162055) first.**
> It supersedes the original framing below and the discussion comments. Use it as the input to any agent executing this spike. The remaining content on this issue is preserved as decision history.

---

<details>
<summary>Original issue framing (historical — see canonical brief above)</summary>

## Spike: evaluate RDF/SPARQL as a unifying query layer

### Motivation

The repo currently maintains three graph-shaped data structures that are traversed independently in TypeScript with no shared query layer:

1. **Operation dependency graph** (`semantic-graph-extractor/operation-dependency-graph.json`) — ~144 operations, ~6,291 edges across 18 semantic types.
2. **Domain-semantics graph** (`path-analyser/domain-semantics.json`) — hand-curated identifiers, capabilities, runtime states, artifact kinds.
3. **Value-binding graph** (embedded in `domain-semantics.json` under `operationRequirements[].valueBindings`) — string field paths like `response.deployments[].processDefinition.processDefinitionId` mapped to semantic-type identity fields.

Each tool (`path-analyser`, `request-validation`, `optional-responses`) reloads and re-traverses these structures with bespoke procedural code. Cross-tool questions ("which operations have ≥1 positive scenario AND coverage in every applicable negative kind?") are currently impossible without ad-hoc joining of disparate JSON outputs.

### Hypothesis

A small in-process triple store (Oxigraph WASM, or `n3` + `quadstore`) loaded with these three graphs as RDF could:

- Collapse the three graphs into one queryable surface.
- Replace some procedural BFS heuristics with declarative SPARQL property paths.
- Make value-binding paths structurally validatable (typo'd paths return zero results instead of silently no-oping at runtime).
- Enable cross-tool coverage correlation queries.
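
To make the value-binding bullet concrete, here is a minimal sketch of the "typo'd path returns zero results" behaviour. It is a plain-TypeScript stand-in for triple-pattern matching, not the API of Oxigraph, `n3`, or `quadstore`. The node ids (`n0`..`n3`), the triple layout, and both helper functions are assumptions made for illustration; only the field path is taken from the text above.

```typescript
// A triple is (subject, predicate, object). Here: parent node, field name, child node.
type Triple = readonly [subject: string, predicate: string, object: string];

// Hypothetical response-schema fragment flattened into triples.
// Node ids are illustrative and do not come from the real bundled spec.
const schemaTriples: readonly Triple[] = [
  ["n0", "deployments[]", "n1"],
  ["n1", "processDefinition", "n2"],
  ["n2", "processDefinitionId", "n3"],
];

// Walk a field path from a root node, following matching edges one segment at a time.
// Returns the nodes the path ends at; an empty array means the path does not resolve.
function resolveFieldPath(
  triples: readonly Triple[],
  rootNode: string,
  segments: readonly string[],
): string[] {
  let frontier: string[] = [rootNode];
  for (const segment of segments) {
    const current = new Set(frontier);
    frontier = triples
      .filter(([s, p]) => current.has(s) && p === segment)
      .map(([, , o]) => o);
  }
  return frontier;
}

// Drop the leading "response" root and split the remainder into segments.
function bindingSegments(path: string): string[] {
  return path.split(".").slice(1);
}

const resolved = resolveFieldPath(
  schemaTriples,
  "n0",
  bindingSegments("response.deployments[].processDefinition.processDefinitionId"),
);
const typoed = resolveFieldPath(
  schemaTriples,
  "n0",
  bindingSegments("response.deployments[].processDefintion.processDefinitionId"),
);

console.log(resolved.length, typoed.length);
```

In a real store the same check would presumably be one query per binding, with an empty result set flagging drift at load time rather than at runtime.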

### Likely non-wins (to be confirmed)

- **Scenario *generation*** is combinatorial enumeration, not query — SPARQL is the wrong tool. Mini-Datalog / ASP would fit better, but the cost-benefit is much weaker.
- **Performance** is not currently a pain point; round-tripping through SPARQL adds latency at this scale.
- **Determinism**: the current pipeline is byte-reproducible via `TEST_SEED`. SPARQL leaves result order unspecified unless the query says otherwise, so every query would need an explicit `ORDER BY` and URIs would have to be minted stably. This is a real cost.
- **Tooling friction** — strict TS, Biome, GritQL-banned `as T` casts, sync code style. JS RDF libraries have typing gaps and async query APIs.
- **Authoring burden** — `domain-semantics.json` is hand-maintained; serialization format (JSON vs Turtle) doesn't reduce that burden.
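
As a rough illustration of the determinism point, the sketch below normalises query results on the TypeScript side so that the emitted output does not depend on the order in which a store returns rows. The `Binding` shape and the `canonicalRows` helper are hypothetical, not part of any RDF library; a real spike would still want `ORDER BY` in the query itself.

```typescript
// One result row: variable name -> value, as a store might hand it back.
// This shape is an assumption for illustration only.
type Binding = Readonly<Record<string, string>>;

// Project each row onto a fixed variable order, then sort rows by code-unit
// comparison (not locale-aware), so the result is stable across runs and machines.
function canonicalRows(
  rows: readonly Binding[],
  vars: readonly string[],
): string[][] {
  return rows
    .map((row) => vars.map((v) => row[v] ?? ""))
    .sort((a, b) => {
      const ka = a.join("\u0000");
      const kb = b.join("\u0000");
      return ka < kb ? -1 : ka > kb ? 1 : 0;
    });
}

// The same two rows, returned in different orders by two hypothetical runs.
const runA: Binding[] = [
  { op: "opB", type: "T2" },
  { op: "opA", type: "T1" },
];
const runB: Binding[] = [
  { op: "opA", type: "T1" },
  { op: "opB", type: "T2" },
];

const outA = JSON.stringify(canonicalRows(runA, ["op", "type"]));
const outB = JSON.stringify(canonicalRows(runB, ["op", "type"]));
console.log(outA === outB);
```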

### Spike scope (timeboxed)

Pick **one** concrete query that is currently awkward in TS and implement it twice:

1. **Baseline (TS)**: e.g. *"find all operations whose required semantic types have no producer in the dependency graph"* (correctness check) **or** *"for each operation, list which value-binding field paths no longer resolve against the current bundled spec"* (drift detector).
2. **SPARQL**: load the same inputs into an embedded triple store, express the query in SPARQL, compare:
   - Lines of code / clarity
   - Cold-start + query latency
   - Reproducibility under fixed seed
   - Type safety at the JS/RDF boundary
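
For orientation, the first candidate query (required semantic types with no producer) might look roughly like this on each side. The `OperationNode` shape, the sample operations, and the `ex:` vocabulary are all assumptions; the real `operation-dependency-graph.json` layout may differ. The SPARQL text is shown as a string only and is not executed here.

```typescript
// Hypothetical, simplified view of one node in the dependency graph.
interface OperationNode {
  readonly operationId: string;
  readonly requires: readonly string[];
  readonly produces: readonly string[];
}

// Baseline (TS): for each operation, list the required semantic types that no operation produces.
function findUnproducedRequirements(
  ops: readonly OperationNode[],
): Map<string, string[]> {
  const produced = new Set<string>();
  for (const op of ops) {
    for (const t of op.produces) produced.add(t);
  }
  const result = new Map<string, string[]>();
  for (const op of ops) {
    const missing = op.requires.filter((t) => !produced.has(t)).sort();
    if (missing.length > 0) result.set(op.operationId, missing);
  }
  return result;
}

// SPARQL counterpart over an assumed vocabulary (ex:requires / ex:produces).
const unproducedQuery = `
PREFIX ex: <https://example.invalid/vocab#>
SELECT ?op ?type WHERE {
  ?op ex:requires ?type .
  FILTER NOT EXISTS { ?producer ex:produces ?type }
}
ORDER BY ?op ?type
`;

// Illustrative sample data, not taken from the real graph.
const sampleOps: OperationNode[] = [
  { operationId: "opDeploy", requires: [], produces: ["TypeA"] },
  { operationId: "opStart", requires: ["TypeA"], produces: ["TypeB"] },
  { operationId: "opComplete", requires: ["TypeC"], produces: [] },
];

const unproduced = findUnproducedRequirements(sampleOps);
console.log(Array.from(unproduced.entries()), unproducedQuery.trim().length > 0);
```

The TS baseline is already short at this size, which is the sort of evidence the LOC/clarity comparison is meant to surface.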

### Decision criteria

- **Adopt** if the SPARQL version is materially clearer AND the ergonomics overhead (load, serialize, type the boundary) is acceptable for the value-binding drift use case at minimum.
- **Adopt narrowly** (value-binding graph only, leave dependency graph as-is) if the win is concentrated there.
- **Reject** if it's a wash or worse — conclusion is "graph traversal scale is appropriate to in-memory TS; revisit if scale grows ≥10×".

### Out of scope for the spike

- Rewriting the BFS scenario planner.
- Replacing `domain-semantics.json` authoring format.
- Introducing a persistent triple store (in-process only).
- SHACL validation of the spec (separate question).

### Deliverables

- A throwaway branch with both implementations of the chosen query.
- Short writeup (in the issue or a `docs/spikes/` note) recording: query chosen, LOC delta, latency numbers, friction points, recommendation.

</details>

