feat(stores): canonical DSL `contains` operator (Track G G3) #188
Open
ronsse wants to merge 1 commit into
Adds a new `contains` filter operator meaning "the scalar value is a
member of the list-typed property at `<field>`". It is the missing piece
G2's searchability recipe needs to express
`FilterClause("properties.column_names", "contains", "user_id")`
without hitting the silent zero-rows bug on SQLite / Neo4j (or the
false positive from a bare `@>` on Postgres).
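The contract can be pinned down as a plain Python predicate. This is a minimal sketch, assuming node properties are already decoded into a dict, as the Neo4j path does after decoding `properties_json`. `contains_matches` is a hypothetical name. Its body is the same `isinstance(prop, list) and value in prop` check described below for Neo4j.

```python
from typing import Any, Mapping


def contains_matches(properties: Mapping[str, Any], key: str, value: Any) -> bool:
    """Hypothetical reference check for the `contains` contract.

    True only when properties[key] exists, is a list, and has `value`
    as one of its elements. Scalar and missing properties never match.
    """
    prop = properties.get(key)  # None when the key is missing
    return isinstance(prop, list) and value in prop


node_props = {"column_names": ["id", "user_id"], "name": "user_id"}
print(contains_matches(node_props, "column_names", "user_id"))  # True: list member
print(contains_matches(node_props, "column_names", "email"))    # False: not a member
print(contains_matches(node_props, "name", "user_id"))          # False: scalar property
print(contains_matches(node_props, "missing", "user_id"))       # False: missing property
```

The `isinstance` guard matters in Python too. Without it, `value in prop` on a string property becomes a substring test (`"user" in "user_id"` is `True`), which is the Python analogue of the scalar false positives the SQL guards prevent. Also, `True in [1]` is `True` in Python because `bool` is a subclass of `int`.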
Per-backend semantics, all converging on the same contract:
- SQLite: `EXISTS (SELECT 1 FROM json_each(<safe_array>) WHERE ...)`
where `safe_array` is a `CASE` guard that coerces non-array values
to `'[]'`. Uses `json_type(properties_json, '$.key')` path-form
rather than nested `json_type(json_extract(...))` — the latter
raises "malformed JSON" on scalar string properties.
- Postgres: `jsonb_typeof(properties->'key') = 'array' AND properties
@> '{"key": "value"}'::jsonb`. The `jsonb_typeof` guard closes
JSONB's bare-`@>` false-positive where scalar-equals-scalar matches.
- Neo4j / BoltOpenCypher: client-side `isinstance(prop, list) and
value in prop`, applied after JSON decode of `properties_json`.
- ArcadeDB: inherits cleanly from BoltOpenCypher — no overrides.
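The SQLite form above can be tried end to end with the standard-library `sqlite3` module, assuming the bundled SQLite includes the JSON functions (current builds do). This is a sketch under assumptions. The `nodes` table, its columns, and `nodes_containing` are hypothetical stand-ins for the store's real schema. The JSON path is bound as a parameter rather than interpolated into the SQL.

```python
import json
import sqlite3

# Hypothetical schema: one row per node, properties stored as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, properties_json TEXT)")
rows = [
    ("a", {"column_names": ["id", "user_id"]}),  # list containing the value
    ("b", {"column_names": ["id"]}),             # list without the value
    ("c", {"column_names": "user_id"}),          # scalar string: must not match
    ("d", {}),                                   # property missing
]
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(node_id, json.dumps(props)) for node_id, props in rows],
)

# json_type(col, path) inspects the value in place, so a scalar string is
# never re-parsed as JSON; the CASE coerces anything but an array to '[]'.
CONTAINS_SQL = """
SELECT id FROM nodes
WHERE EXISTS (
    SELECT 1 FROM json_each(
        CASE WHEN json_type(properties_json, :path) = 'array'
             THEN json_extract(properties_json, :path)
             ELSE '[]'
        END
    )
    WHERE value = :value
)
ORDER BY id
"""


def nodes_containing(key: str, value: object) -> list:
    """Return ids of nodes whose properties[key] is an array holding value."""
    # Simple keys only; keys with dots or quotes would need $."..." quoting.
    params = {"path": f"$.{key}", "value": value}
    return [row[0] for row in conn.execute(CONTAINS_SQL, params)]


print(nodes_containing("column_names", "user_id"))  # ['a']
print(nodes_containing("column_names", "id"))       # ['a', 'b']
```

Row `c` is the case the `CASE` guard exists for. When `json_each` is handed a primitive JSON value instead of an array, it yields a single row for that value. An unguarded query would therefore report the scalar string `"user_id"` as a match.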
Top-level columns (`node_type`, edge provenance, etc.) are scalar
in every backend; `contains` against them raises `ValueError` rather
than silently degrading to a no-op.
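The top-level-column rule and the Postgres form from the backend list can be sketched together as a clause builder. Everything here is hypothetical: `property_key`, `build_pg_contains`, the `properties.` prefix convention (inferred from `properties.column_names`), and the psycopg-style `%s` placeholders.

The sketch makes one deliberate change to the described form: it wraps the value in a one-element array (`{"key": ["value"]}`). PostgreSQL's documented array-contains-primitive exception applies to top-level values. For a value nested under an object key, `@>` requires the two sides to have the same type, so an array property compared against a bare scalar does not match. The wrapped shape is worth confirming in the live Postgres contract run.

```python
import json
from typing import Any, List, Tuple

PROPERTY_PREFIX = "properties."  # assumed from paths like "properties.column_names"


def property_key(field: str) -> str:
    """Return the JSON key behind a properties.* path; reject top-level columns."""
    if not field.startswith(PROPERTY_PREFIX):
        # node_type, edge provenance, etc. are scalar columns: fail loudly.
        raise ValueError(f"'contains' needs a list-typed property, got {field!r}")
    return field[len(PROPERTY_PREFIX):]


def build_pg_contains(field: str, value: Any) -> Tuple[str, List[Any]]:
    """Compile a `contains` clause against a JSONB `properties` column."""
    key = property_key(field)
    sql = "(jsonb_typeof(properties -> %s) = 'array' AND properties @> %s::jsonb)"
    # Wrap the value in a one-element array so the nested comparison is
    # array-to-array; the jsonb_typeof guard from the described form is kept.
    return sql, [key, json.dumps({key: [value]})]


sql, params = build_pg_contains("properties.column_names", "user_id")
print(params)  # ['column_names', '{"column_names": ["user_id"]}']
```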
Validator: `FilterClause(op="contains", value=<scalar>)` requires a
single str/int/float/bool — tuples (the `in` shape), lists, dicts,
and `None` are rejected at construction. `bool` is accepted here
(unlike the range ops) because a list-of-booleans is a legitimate
property shape.
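The construction-time check can be sketched with a dataclass `__post_init__`. This is an illustration, not the project's class. The dataclass layout and the name of the first attribute (`field`) are assumptions; only `op`, `value`, and the accepted/rejected types come from the description.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FilterClause:
    """Hypothetical stand-in for the real FilterClause."""

    field: str  # assumed name for the first positional argument
    op: str
    value: Any

    def __post_init__(self) -> None:
        if self.op == "contains":
            # bool is an int subclass, so it passes on purpose: a list of
            # booleans is a legitimate property shape. None, tuples (the
            # `in` shape), lists, and dicts all fail this isinstance check.
            if not isinstance(self.value, (str, int, float)):
                raise ValueError(
                    "'contains' takes a single str/int/float/bool, "
                    f"got {type(self.value).__name__}"
                )


FilterClause("properties.column_names", "contains", "user_id")  # accepted
FilterClause("properties.flags", "contains", True)              # accepted
for bad in [("a", "b"), ["a"], {"k": "v"}, None]:
    try:
        FilterClause("properties.column_names", "contains", bad)
    except ValueError as exc:
        print(exc)
```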
Contract coverage in `graph_store_contract.py` (runs against every
backend the env wires up): list-match, single-element list, no-match,
scalar property skipped, missing property skipped, AND-composition
with other filters, integer values.
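Most of these scenarios reduce to the predicate semantics, but AND-composition needs a second operator alongside `contains`. The sketch below runs the scenarios table-driven against a tiny in-memory evaluator. `matches_all`, the `(field, op, value)` tuples, and the node dicts are hypothetical. The real contract module parametrizes over store backends instead.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]
Clause = Tuple[str, str, Any]  # (field, op, value)
PREFIX = "properties."


def matches_all(node: Node, clauses: List[Clause]) -> bool:
    """AND together eq/contains clauses against one in-memory node."""
    for field, op, value in clauses:
        if field.startswith(PREFIX):
            actual = node["properties"].get(field[len(PREFIX):])
        elif op == "contains":
            raise ValueError(f"'contains' on top-level column {field!r}")
        else:
            actual = node.get(field)  # top-level column such as node_type
        if op == "eq":
            ok = actual == value
        elif op == "contains":
            ok = isinstance(actual, list) and value in actual
        else:
            raise ValueError(f"unsupported op {op!r}")
        if not ok:
            return False
    return True


NODES = {
    "multi": {"node_type": "table", "properties": {"cols": ["id", "user_id"]}},
    "single": {"node_type": "table", "properties": {"cols": ["user_id"]}},
    "other": {"node_type": "table", "properties": {"cols": ["email"]}},
    "scalar": {"node_type": "table", "properties": {"cols": "user_id"}},
    "missing": {"node_type": "table", "properties": {}},
    "view": {"node_type": "view", "properties": {"cols": ["user_id"]}},
    "ports": {"node_type": "service", "properties": {"ports": [80, 443]}},
}


def matching(clauses: List[Clause]) -> List[str]:
    return sorted(name for name, node in NODES.items() if matches_all(node, clauses))


# (scenario, clauses, expected); "scalar" and "missing" must never appear.
CASES = [
    ("list match, single-element list",
     [("properties.cols", "contains", "user_id")], ["multi", "single", "view"]),
    ("no match", [("properties.cols", "contains", "absent")], []),
    ("AND with eq", [("properties.cols", "contains", "user_id"),
                     ("node_type", "eq", "table")], ["multi", "single"]),
    ("integer values", [("properties.ports", "contains", 443)], ["ports"]),
]

for scenario, clauses, expected in CASES:
    assert matching(clauses) == expected, scenario
```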
Out of scope (deferred to G2 follow-up):
- `GraphSearch.search()` ergonomics — surfacing `contains` through
the higher-level filter dict.
- Live Postgres / Neo4j coverage — env-gated contract tests run
whenever `TRELLIS_TEST_PG_DSN` / Bolt creds are loaded; this PR's
CI hits only the SQLite contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Add `contains` operator to canonical DSL (`FilterOp` in `src/trellis/stores/base/graph_query.py`). Semantics: scalar value is a member of the list-typed property at the field path.
- SQLite: `EXISTS (SELECT 1 FROM json_each(...) WHERE value = ?)` pattern (mirroring `src/trellis/stores/sqlite/document.py:55`); CASE guard handles non-array properties safely.
- Postgres: `jsonb_typeof(properties->'key') = 'array' AND properties @> '{"key":value}'::jsonb`; the explicit `jsonb_typeof` guard prevents the array special case from silently treating scalar-equals-scalar as a containment hit.
- Neo4j: client-side `isinstance(prop, list) and value in prop`, consistent with existing `eq`/`in`/`exists` evaluation on `properties.*` (where properties are stored as JSON-stringified `properties_json`).
- ArcadeDB: inherits from `BoltOpenCypherGraphStore`; no overrides needed.
- Validator on `FilterClause`: `contains` requires a single scalar value (str/int/float/bool); rejects tuple/list/None/dict.
- Contract tests in `tests/unit/stores/contracts/graph_store_contract.py`: parametrized over backends; SQLite runs by default, Postgres/Neo4j/ArcadeDB run when their env-gated fixtures fire.

Test plan
- `pytest tests/unit/stores/ -v` → 775 passed, 13 skipped, 279 deselected
- `pytest tests/unit/ -q` (regression check) → 3500 passed, 14 skipped, 279 deselected
- `ruff check src/ tests/ eval/` → All checks passed
- `mypy src/` → no issues in 241 source files
- Live Postgres contract run (`TRELLIS_TEST_PG_DSN`): Postgres compile path verified by code review; live confirmation deferred until env loaded
- Live Neo4j contract run (`TRELLIS_TEST_NEO4J_*`): Cypher client-side path verified locally; live confirmation deferred until env loaded
- Follow-up: convert the `tests/unit/retrieve/test_strategies_column_search.py` xfail to a passing test after this merges

🤖 Generated with Claude Code