Use JSONB containment for GIN-friendly EQ filters by sambhav · Pull Request #3646 · Kinto/kinto

sambhav · 2026-02-15T20:58:12Z

Summary

Rewrites _format_conditions to emit data @> '{"field": value}' (JSONB containment) instead of data->'field' = 'value'::jsonb for EQ filters on scalar data fields (str, int, float, bool, None)
Rewrites CONTAINS filters to use top-level containment (data @> '{"field": [values]}') instead of sub-expression containment (data->'field' @> '[values]'), removing the now-redundant jsonb_typeof guard
Array/object EQ values keep using arrow extraction to preserve exact equality semantics (@> uses superset matching for non-scalars)
Normalizes _format_sorting JSONB accessor expressions to match _format_conditions format (removes redundant parentheses)
Documents the recommended GIN index in the Storage class docstring
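
The branching described in the summary can be sketched roughly as follows. This is an illustration only, not the PR's actual `_format_conditions` code: the helper name `format_filter`, the `:field` / `:value` placeholder names, and the exact SQL text are assumptions, and it handles top-level fields only for brevity.

```python
import json

# Python types that map to JSON scalars (str, int, float, bool, None).
SCALAR_TYPES = (str, int, float, bool, type(None))


def format_filter(field, operator, value):
    """Return (sql, params) for one filter on a top-level key of `data`.

    Hypothetical helper: names, placeholders and SQL text are assumptions.
    """
    if operator == "contains":
        # Top-level containment: data @> '{"field": [values]}'.
        document = json.dumps({field: value})
        return "data @> (:value)::jsonb", {"value": document}
    if operator == "eq" and isinstance(value, SCALAR_TYPES):
        # Scalar equality expressed as containment, so a GIN index can help.
        document = json.dumps({field: value})
        return "data @> (:value)::jsonb", {"value": document}
    if operator == "eq":
        # Arrays and objects: keep exact equality on the extracted value,
        # because @> would also match supersets.
        params = {"field": field, "value": json.dumps(value)}
        return "data->:field = (:value)::jsonb", params
    raise ValueError(f"unsupported operator: {operator}")
```

Calling `format_filter("status", "eq", "active")` takes the containment path, while `format_filter("tags", "eq", ["a"])` falls back to arrow extraction.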

Why this matters

For these filters, @> containment is the JSONB operator that a GIN index can accelerate: the jsonb_path_ops operator class used below supports @> (plus the jsonpath operators @? and @@) and does not index = comparisons on extracted values. Previously, EQ used data->'field' = value and CONTAINS used data->'field' @> value; neither form can use a GIN index on data. By rewriting both to top-level data @> '{"field": value}', a single GIN index accelerates all equality and array-contains queries across all collections and fields:

CREATE INDEX CONCURRENTLY idx_objects_data_gin
    ON objects USING gin (data jsonb_path_ops)
    WHERE NOT deleted;

Without this index, the query rewrites have zero performance impact — they're semantically equivalent to the old form. The index is intentionally not auto-created; it's documented as an optional optimization for large deployments.

What the GIN index accelerates:

?status=active → data @> '{"status": "active"}'
?person.name=Alice → data @> '{"person": {"name": "Alice"}}'
?contains_colors=red → data @> '{"colors": ["red"]}'
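
A minimal sketch of how a dotted field path could be turned into the containment documents listed above. The function name `containment_document` is hypothetical and not taken from the PR; it assumes field paths are split on `.` and that a CONTAINS value has already been parsed into a list.

```python
import json


def containment_document(field, value):
    """Build the right-hand side of `data @> ...` for a dotted field path."""
    document = value
    # Wrap the value from the innermost key outwards: "a.b" -> {"a": {"b": value}}.
    for key in reversed(field.split(".")):
        document = {key: document}
    return json.dumps(document)


print(containment_document("status", "active"))      # {"status": "active"}
print(containment_document("person.name", "Alice"))  # {"person": {"name": "Alice"}}
print(containment_document("colors", ["red"]))       # {"colors": ["red"]}
```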

What it does NOT accelerate:

Range filters (min_, max_, gt_, lt_)
LIKE/text search
contains_any_ (uses && array overlap)
Sorting on JSONB fields

Test plan

19 new unit tests for SQL generation (EQ scalars, EQ arrays/objects, CONTAINS, CONTAINS_ANY, nested fields, non-EQ operators, id/modified exclusion, sorting normalization)
2 new integration tests for array/object exact equality
All 183 existing non-PostgreSQL storage tests pass
All 64 filter/sort resource tests pass

🤖 Generated with Claude Code

Rewrite _format_conditions to emit `data @> '{"field": value}'` instead of `data->'field' = 'value'::jsonb` for equality filters on scalar data fields (str, int, float, bool, None). This is semantically equivalent for scalars but enables GIN index acceleration when a `gin(data jsonb_path_ops)` index exists on the objects table.

Array and object EQ values still use the arrow extraction path to preserve exact equality semantics (containment uses superset matching for non-scalars).

Also normalizes _format_sorting JSONB accessor expressions to match _format_conditions format (removing redundant parentheses around placeholders), ensuring expression indexes work for both filter and sort queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
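
To illustrate why containment is safe for scalar equality but not for arrays and objects, here is a simplified Python emulation of the `@>` rules as described in the PostgreSQL documentation. It is a sketch for reasoning about the semantics, not PostgreSQL's implementation; the function names and the sample `row` are made up for the example.

```python
def _scalar_equal(a, b):
    # JSON keeps booleans and numbers distinct, whereas Python has True == 1.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def contains(left, right, top_level=True):
    """Approximate `left @> right` for values parsed from JSON."""
    if isinstance(right, dict):
        # Every key of `right` must exist in `left` with a contained value.
        return isinstance(left, dict) and all(
            key in left and contains(left[key], value, top_level=False)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every element of `right` must be contained in some element of `left`;
        # order and duplicates are ignored, so this is superset matching.
        return isinstance(left, list) and all(
            any(contains(candidate, item, top_level=False) for candidate in left)
            for item in right
        )
    # From here on, `right` is a scalar.
    if isinstance(left, list):
        # Special case: only a top-level array may contain a bare scalar.
        return top_level and any(
            not isinstance(c, (dict, list)) and _scalar_equal(c, right)
            for c in left
        )
    if isinstance(left, dict):
        return False
    return _scalar_equal(left, right)


row = {"status": "active", "tags": ["a", "b"], "count": 0}
print(contains(row, {"status": "active"}))  # True: scalar match behaves like equality
print(contains(row, {"tags": ["a"]}))       # True, although tags != ["a"]
print(contains(row, {"tags": []}))          # True: an empty array is contained in any array
print(contains(row, {"tags": "a"}))         # False: a nested array does not match a scalar
```

The second and third calls show the superset behaviour that would make `@>` wrong for exact array equality, which is why those values stay on the arrow extraction path.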

Rewrite CONTAINS filters to use `data @> '{"field": [values]}'` instead of `data->'field' @> '[values]'`. Top-level containment allows a GIN index on the data column to accelerate these queries. The jsonb_typeof guard is no longer needed since containment already returns false when the field is not an array.

Add documentation to the Storage class docstring describing the recommended GIN index, what it accelerates, what it doesn't, and approximate sizing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Expand the GIN index documentation with three options:

1. Recommended: partial index with WHERE resource_name = 'record' (smallest, scoped to actual records, works with psycopg2)
2. Basic: partial index with WHERE NOT deleted only (driver-independent fallback)
3. Composite: btree_gin extension with parent_id + resource_name in the GIN index (single index scan, no BitmapAnd needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add tests covering:

- CONTAINS fallback for id/modified fields (covers dead-code branch)
- EQ with falsy scalars: None, empty string, 0, False
- EQ with deeply nested fields (a.b.c)
- EQ with empty arrays/objects (must NOT use containment)
- CONTAINS with numeric arrays and object elements
- CONTAINS with nested fields

These tests verify that the @> rewrite produces identical behavior to the old arrow extraction form across all data types and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

leplatrem

Thank you

This seems useful indeed.

I left some comments/questions.
I was thinking we could ship the indexation migrations with this pull-request.
But we could indeed do it in several steps:

Merge this
Deploy
Create indexes manually on DB
Validate improvements
Create another PR that ships these indexes as migrations

WDYT?

leplatrem · 2026-02-17T16:10:26Z