feat(stores): canonical DSL contains operator — Track G G3#188

Open
ronsse wants to merge 1 commit into main from worktree-agent-a1dd6da221f66828f

@ronsse ronsse commented May 18, 2026

Owner

Summary

  • Adds contains operator to canonical DSL (FilterOp in src/trellis/stores/base/graph_query.py). Semantics: scalar value is a member of the list-typed property at the field path.
  • Per-backend compilers across all 4 graph backends:
    • SQLite: EXISTS (SELECT 1 FROM json_each(...) WHERE value = ?) pattern (mirroring src/trellis/stores/sqlite/document.py:55); CASE guard handles non-array properties safely.
    • Postgres: jsonb_typeof(properties->'key') = 'array' AND properties @> '{"key":value}'::jsonb — the explicit jsonb_typeof guard prevents the array-special-case from silently treating scalar-equals-scalar as a containment hit.
    • Neo4j / BoltOpenCypher: client-side isinstance(prop, list) and value in prop, consistent with existing eq / in / exists evaluation on properties.* (where properties are stored as JSON-stringified properties_json).
    • ArcadeDB: inherits cleanly from BoltOpenCypherGraphStore; no overrides needed.
  • Pydantic validator on FilterClause: contains requires a single scalar value (str/int/float/bool); rejects tuple/list/None/dict.
  • Contract tests in shared tests/unit/stores/contracts/graph_store_contract.py — parametrize over backends; SQLite runs by default, Postgres/Neo4j/ArcadeDB run when their env-gated fixtures fire.
  • Closes the H-severity DSL gap surfaced by Track G Unit G2; G2 rebases on top of this and flips its xfail test to a passing test once this merges.

Test plan

  • pytest tests/unit/stores/ -v → 775 passed, 13 skipped, 279 deselected
  • pytest tests/unit/ -q (regression check) → 3500 passed, 14 skipped, 279 deselected
  • ruff check src/ tests/ eval/ → All checks passed
  • mypy src/ → no issues in 241 source files
  • Live Postgres run (requires TRELLIS_TEST_PG_DSN) — Postgres compile path verified by code review; live confirmation deferred until env loaded
  • Live Neo4j run (requires TRELLIS_TEST_NEO4J_*) — Cypher client-side path verified locally; live confirmation deferred until env loaded
  • Track G G2 rebase: flip tests/unit/retrieve/test_strategies_column_search.py xfail to a passing test after this merges

🤖 Generated with Claude Code

Adds a new `contains` filter operator that asks "scalar value is a
member of the list-typed property at <field>" — the missing piece
G2's searchability recipe needs to express
`FilterClause("properties.column_names", "contains", "user_id")`
without hitting the silent-zero-rows bug across SQLite / Neo4j (and
the false-positive on Postgres' bare `@>`).
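
As a reference point, the intended contract can be written as a plain Python predicate. This is a minimal sketch rather than the code in this PR: `contains_matches` is a hypothetical name, and it assumes the node's properties have already been decoded into a dict.

```python
from typing import Any


def contains_matches(properties: dict[str, Any], key: str, value: Any) -> bool:
    """Return True only if properties[key] is a list that holds value."""
    prop = properties.get(key)
    # A scalar or missing property never matches, even when it equals value.
    return isinstance(prop, list) and value in prop


# List-typed property holding the value: match.
print(contains_matches({"column_names": ["user_id", "email"]}, "column_names", "user_id"))  # True
# Scalar property equal to the value: no match.
print(contains_matches({"column_names": "user_id"}, "column_names", "user_id"))  # False
# Missing property: no match.
print(contains_matches({}, "column_names", "user_id"))  # False
```

Every backend compiler is expected to agree with this predicate on the same inputs.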

Per-backend semantics, all converging on the same contract:

- SQLite: `EXISTS (SELECT 1 FROM json_each(<safe_array>) WHERE ...)`
  where `safe_array` is a `CASE` guard that coerces non-array values
  to `'[]'`.  Uses `json_type(properties_json, '$.key')` path-form
  rather than nested `json_type(json_extract(...))` — the latter
  raises "malformed JSON" on scalar string properties.
- Postgres: `jsonb_typeof(properties->'key') = 'array' AND properties
  @> '{"key": "value"}'::jsonb`.  The `jsonb_typeof` guard closes
  JSONB's bare-`@>` false-positive where scalar-equals-scalar matches.
- Neo4j / BoltOpenCypher: client-side `isinstance(prop, list) and
  value in prop`, applied after JSON decode of `properties_json`.
- ArcadeDB: inherits cleanly from BoltOpenCypher — no overrides.
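
The SQLite shape described above can be tried against an in-memory database. The sketch below is illustrative only: the `nodes` table, the `find_nodes_containing` and `build_demo_db` helpers, and the sample rows are assumptions, not the compiler output from this PR. It also assumes the local SQLite build ships the JSON functions and that the property key needs no quoting inside the JSON path.

```python
import json
import sqlite3

# Hypothetical query: the CASE guard hands json_each an empty array whenever
# the property is missing or is not a JSON array.
CONTAINS_SQL = """
SELECT node_id
FROM nodes
WHERE EXISTS (
    SELECT 1
    FROM json_each(
        CASE
            WHEN json_type(nodes.properties_json, :path) = 'array'
            THEN json_extract(nodes.properties_json, :path)
            ELSE '[]'
        END
    ) AS elem
    WHERE elem.value = :value
)
ORDER BY node_id
"""


def find_nodes_containing(conn: sqlite3.Connection, key: str, value) -> list[str]:
    """Return ids of nodes whose list-typed property `key` holds `value`."""
    params = {"path": f"$.{key}", "value": value}
    return [row[0] for row in conn.execute(CONTAINS_SQL, params)]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with one row per contract scenario."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_id TEXT PRIMARY KEY, properties_json TEXT)")
    samples = {
        "list_match": {"column_names": ["user_id", "email"]},
        "scalar": {"column_names": "user_id"},  # scalar, must be skipped
        "missing": {},                          # no such property
        "ints": {"versions": [1, 2, 3]},
    }
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(node_id, json.dumps(props)) for node_id, props in samples.items()],
    )
    return conn


conn = build_demo_db()
print(find_nodes_containing(conn, "column_names", "user_id"))  # ['list_match']
print(find_nodes_containing(conn, "versions", 2))              # ['ints']
```

The path form `json_type(properties_json, '$.key')` inspects the stored document directly, so the scalar row falls through to `'[]'` without being re-parsed as JSON.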

Top-level columns (`node_type`, edge provenance, etc.) are scalar
in every backend; `contains` against them raises `ValueError` rather
than silently degrading to no-op.
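
A guard of this kind might look like the following. It is a sketch under assumptions: `property_key_for_contains` is a hypothetical helper, and the `properties.` prefix convention is inferred from the field paths quoted in this message.

```python
PROPERTIES_PREFIX = "properties."


def property_key_for_contains(field: str) -> str:
    """Return the property key for a `contains` filter, or raise ValueError.

    Top-level columns are scalar, so `contains` against them is rejected
    up front instead of compiling to a filter that can never match.
    """
    if not field.startswith(PROPERTIES_PREFIX) or field == PROPERTIES_PREFIX:
        raise ValueError(
            f"'contains' is only supported on properties.* fields, got {field!r}"
        )
    return field[len(PROPERTIES_PREFIX):]


print(property_key_for_contains("properties.column_names"))  # column_names
```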

Validator: `FilterClause(op="contains", value=<scalar>)` requires a
single str/int/float/bool — tuples (the `in` shape), lists, dicts,
and `None` are rejected at construction.  `bool` is accepted here
(unlike the range ops) because a list-of-booleans is a legitimate
property shape.
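
The validation rule can be illustrated without Pydantic. The sketch below swaps in a stdlib dataclass as a stand-in for the real `FilterClause` model; `FilterClauseSketch` and its field names are hypothetical, and only the `contains` rule is modelled.

```python
from dataclasses import dataclass
from typing import Any

# Types accepted as the value of a `contains` clause. bool is listed for
# clarity even though it is already a subclass of int.
_CONTAINS_SCALARS = (str, int, float, bool)


@dataclass(frozen=True)
class FilterClauseSketch:
    field: str
    op: str
    value: Any

    def __post_init__(self) -> None:
        # Tuples (the `in` shape), lists, dicts, and None are all rejected.
        if self.op == "contains" and not isinstance(self.value, _CONTAINS_SCALARS):
            raise ValueError(
                "'contains' needs a single str/int/float/bool, "
                f"got {type(self.value).__name__}"
            )


ok = FilterClauseSketch("properties.column_names", "contains", "user_id")
print(ok.value)  # user_id

try:
    FilterClauseSketch("properties.column_names", "contains", ("a", "b"))
except ValueError as exc:
    print(f"rejected: {exc}")
```

Rejecting the tuple shape at construction keeps a mistyped `in` filter from reaching a backend compiler.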

Contract coverage in `graph_store_contract.py` (runs against every
backend the env wires up): list-match, single-element list, no-match,
scalar property skipped, missing property skipped, AND-composition
with other filters, integer values.
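
The scenarios above can be laid out as a small table of cases. The following sketch runs them against an in-memory stand-in rather than a real store; `NODES`, `query`, and the filter tuple shape are hypothetical and do not reflect the fixtures in `graph_store_contract.py`.

```python
from typing import Any

Filter = tuple[str, str, Any]  # (field, op, value)

NODES: list[dict[str, Any]] = [
    {"node_id": "multi", "node_type": "table", "properties": {"column_names": ["user_id", "email"]}},
    {"node_id": "single", "node_type": "table", "properties": {"column_names": ["order_id"]}},
    {"node_id": "scalar", "node_type": "table", "properties": {"column_names": "user_id"}},
    {"node_id": "missing", "node_type": "table", "properties": {}},
    {"node_id": "view", "node_type": "view", "properties": {"column_names": ["user_id"]}},
    {"node_id": "ints", "node_type": "table", "properties": {"versions": [1, 2, 3]}},
]


def _matches(node: dict[str, Any], flt: Filter) -> bool:
    field, op, value = flt
    is_property = field.startswith("properties.")
    if is_property:
        actual = node["properties"].get(field[len("properties."):])
    else:
        actual = node.get(field)
    if op == "eq":
        return actual == value
    if op == "contains":
        if not is_property:
            raise ValueError(f"'contains' is not supported on {field!r}")
        return isinstance(actual, list) and value in actual
    raise ValueError(f"unsupported op: {op}")


def query(filters: list[Filter]) -> list[str]:
    """Return ids of nodes matching every filter (AND composition)."""
    return [n["node_id"] for n in NODES if all(_matches(n, f) for f in filters)]


CASES: list[tuple[str, list[Filter], list[str]]] = [
    ("list match", [("properties.column_names", "contains", "email")], ["multi"]),
    ("single-element list", [("properties.column_names", "contains", "order_id")], ["single"]),
    ("no match", [("properties.column_names", "contains", "nope")], []),
    ("scalar and missing skipped", [("properties.column_names", "contains", "user_id")], ["multi", "view"]),
    ("AND composition", [("node_type", "eq", "table"), ("properties.column_names", "contains", "user_id")], ["multi"]),
    ("integer values", [("properties.versions", "contains", 2)], ["ints"]),
]

for name, filters, expected in CASES:
    assert query(filters) == expected, name
```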

Out of scope (deferred to G2 follow-up):
- `GraphSearch.search()` ergonomics — surfacing `contains` through
  the higher-level filter dict.
- Live Postgres / Neo4j coverage — env-gated contract tests run
  whenever `TRELLIS_TEST_PG_DSN` / Bolt creds are loaded; this PR's
  CI hits only the SQLite contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>