ART Index: Support compound key scans #8

Merged
mach-kernel merged 9 commits into spiceai-1.4.1 from mach/compound-art-scan on Nov 3, 2025
mach-kernel commented Nov 3, 2025

🗣 Description

  • The ART index already supports compound keys in order to enforce uniqueness constraints.
  • ARTKey has a concatenation mechanism that it uses to build a single key from multiple column expressions.
  • This PR wires that mechanism up on the query side, with:
    • ARTIndexCompoundKeyScanState, which is the same as ARTIndexScanState but holds vectors for an arbitrary number of values
    • ART::CompoundKeyScan, which generates the concatenated key, locks, and performs an equality scan
  • The TryScanIndex column binding logic is updated with stricter sanity checks and a new unit test
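
To make the concatenation idea concrete, here is a minimal, self-contained sketch. It is not the DuckDB implementation: `ToyKey`, `EncodeColumn`, `ConcatKey`, and `ToyCompoundIndex` are hypothetical names, an ordered map stands in for the ART, and the byte encoding (value plus a `'\0'` terminator) is an assumption chosen only so that concatenated keys stay unambiguous.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an ART key: a plain byte string.
using ToyKey = std::string;

// Encode one column value. The trailing '\0' keeps ("ab", "c") and ("a", "bc")
// distinct after concatenation. Assumption: values contain no '\0' bytes.
ToyKey EncodeColumn(const std::string &value) {
    ToyKey key = value;
    key.push_back('\0');
    return key;
}

// Concatenate the per-column keys, in index column order, into one compound key.
ToyKey ConcatKey(const std::vector<std::string> &values) {
    ToyKey key;
    for (const auto &value : values) {
        key += EncodeColumn(value);
    }
    return key;
}

// Toy index: an ordered map from compound key to row id (stand-in for the ART).
class ToyCompoundIndex {
public:
    void Insert(const std::vector<std::string> &values, int64_t row_id) {
        entries_[ConcatKey(values)] = row_id;
    }

    // Equality scan: build the same compound key from the filter constants and
    // do a single point lookup. Returns false if no row matches.
    bool EqualityScan(const std::vector<std::string> &values, int64_t &row_id) const {
        auto it = entries_.find(ConcatKey(values));
        if (it == entries_.end()) {
            return false;
        }
        row_id = it->second;
        return true;
    }

private:
    std::map<ToyKey, int64_t> entries_;
};
```

In this toy model, a query such as `x = '525' and y = '2525'` corresponds to `EqualityScan({"525", "2525"}, row_id)`: the filter constants are concatenated in the same column order used at insert time, so one lookup replaces a full scan.
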
D explain analyze select * from test where x = '525' and y = '2525';
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
explain analyze select * from test where x = '525' and y = '2525';
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 0.0251s             ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│        Table: test        │
│      Type: Index Scan     │
│                           │
│        Projections:       │
│             x             │
│             y             │
│             z             │
│                           │
│          Filters:         │
│          x='525'          │
│          y='2525'         │
│                           │
│           1 row           │
│          (0.00s)          │
└───────────────────────────┘

For completeness (break before passing back to TableScanInitGlobal):
[image]

Benchmark on 7.5M rows. Both have a composite index on those columns, but vanilla DuckDB won't push the scan down the way this branch does:

With index:

D select * from __data_tdata_1762187237439 where Data2 = '+11118822328' and Data3 = 'D3';
┌──────────────────────┬──────────────────────┬──────────────────────┬───┬───────┬──────────────────────┬───────┬────────┐
│     DateCreated      │     DateUpdated      │        Field1        │ … │ Data7 │        Data8         │ Data9 │ Data10 │
│     timestamp_ns     │     timestamp_ns     │       varchar        │   │ int64 │       varchar        │ int64 │ int64  │
├──────────────────────┼──────────────────────┼──────────────────────┼───┼───────┼──────────────────────┼───────┼────────┤
│ 2024-07-25 13:51:4…  │ 2024-09-27 13:51:5…  │ F1b61faec5-3aed-49…  │ … │   0   │ D80d395e36-9a00-41…  │  821  │   1    │
├──────────────────────┴──────────────────────┴──────────────────────┴───┴───────┴──────────────────────┴───────┴────────┤
│ 1 rows                                                                                            15 columns (7 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 0.037 user 0.066688 sys 0.008183

No index:

D select * from __data_tdata_1762187237439 where Data2 = '+11118822328' and Data3 = 'D3';
┌──────────────────────┬──────────────────────┬──────────────────────┬───┬───────┬──────────────────────┬───────┬────────┐
│     DateCreated      │     DateUpdated      │        Field1        │ … │ Data7 │        Data8         │ Data9 │ Data10 │
│     timestamp_ns     │     timestamp_ns     │       varchar        │   │ int64 │       varchar        │ int64 │ int64  │
├──────────────────────┼──────────────────────┼──────────────────────┼───┼───────┼──────────────────────┼───────┼────────┤
│ 2024-07-25 13:51:4…  │ 2024-09-27 13:51:5…  │ F1b61faec5-3aed-49…  │ … │   0   │ D80d395e36-9a00-41…  │  821  │   1    │
├──────────────────────┴──────────────────────┴──────────────────────┴───┴───────┴──────────────────────┴───────┴────────┤
│ 1 rows                                                                                            15 columns (7 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 0.282 user 3.410015 sys 0.047764

@mach-kernel

Author

This is an update of #7 with refactored column binding logic + additional tests for the new binding mechanism.

mach-kernel merged commit d82fa82 into spiceai-1.4.1 on Nov 3, 2025
26 of 29 checks passed
mach-kernel added a commit that referenced this pull request Nov 3, 2025
ART Index: Support compound key scans

Squashed commit of the following:

commit fec3602
Author: David Stancu <david@spice.ai>
Date:   Mon Nov 3 14:26:06 2025 -0500

    tryscanindex: fix direct match lookup, range check vec access

commit 2714c3d
Author: David Stancu <david@spice.ai>
Date:   Mon Nov 3 13:55:13 2025 -0500

    tryscanindex: do column matching first, to use possibly rebound matches in both sanity check and index expr rebinding
    add test for this scenario

commit 36ffa5b
Author: David Stancu <david@spice.ai>
Date:   Mon Nov 3 12:30:28 2025 -0500

    tryscanindex sanity check: indexed_columns / art column ids may not need remapping if the scan is not a view scan

commit 525f9c7
Author: David Stancu <david@spice.ai>
Date:   Thu Oct 30 10:42:17 2025 -0400

    do not do index scan if there are other non index filters in the predicate (fix shutdown_create_index.test)

commit b0a6e2d
Author: David Stancu <david@spice.ai>
Date:   Thu Oct 30 10:04:54 2025 -0400

    add test, bail out for eg composite query with IN () list

commit a22a430
Author: David Stancu <david@spice.ai>
Date:   Wed Oct 29 16:37:30 2025 -0400

    simplify filter expression storage index bindings (just reuse the ones we made earlier), fix single-ref-per-expr predicate to correctly walk expr tree and yank refs (allowing nesting in fns, etc)

commit 9c8c1ed
Author: David Stancu <david@spice.ai>
Date:   Wed Oct 29 15:11:23 2025 -0400

    copy index expressions before rewriting column refs

commit aff2c98
Author: David Stancu <david@spice.ai>
Date:   Wed Oct 29 14:36:33 2025 -0400

    table scan:

    rebind projected columns in ALL index exprs
    do not bail out early if more than one index expr
    hook up composite key scan

commit bfc6f02
Author: David Stancu <david@spice.ai>
Date:   Wed Oct 29 14:35:09 2025 -0400

    make specialized compound key scan state for eq compares, specialized scan using ARTKey::Concat
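
The commit messages above mention several binding rules: walk each expression tree to collect column refs (including refs nested in functions), bail out on `IN ()` lists, and skip the index scan when non-index filters are present. The sketch below is one reading of those rules, not the actual TryScanIndex code. `ToyExpr`, `CollectColumnRefs`, `SameExpr`, and `CanUseCompoundKeyScan` are hypothetical names, and the requirement that every index expression be covered by an equality filter is an assumption.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified expression node (not DuckDB's Expression).
struct ToyExpr {
    enum class Kind { COLUMN_REF, CONSTANT, FUNCTION, EQUALS, IN_LIST };
    Kind kind;
    std::string name;                               // column name, function name, or literal
    std::vector<std::shared_ptr<ToyExpr>> children; // operands or arguments
};
using ExprPtr = std::shared_ptr<ToyExpr>;

ExprPtr Make(ToyExpr::Kind kind, const std::string &name, std::vector<ExprPtr> children) {
    return std::make_shared<ToyExpr>(ToyExpr{kind, name, std::move(children)});
}
ExprPtr Column(const std::string &name) { return Make(ToyExpr::Kind::COLUMN_REF, name, {}); }
ExprPtr Constant(const std::string &text) { return Make(ToyExpr::Kind::CONSTANT, text, {}); }
ExprPtr Call(const std::string &fn, std::vector<ExprPtr> args) {
    return Make(ToyExpr::Kind::FUNCTION, fn, std::move(args));
}
ExprPtr Equals(ExprPtr lhs, ExprPtr rhs) { return Make(ToyExpr::Kind::EQUALS, "=", {lhs, rhs}); }
ExprPtr InList(ExprPtr lhs, std::vector<ExprPtr> values) {
    values.insert(values.begin(), lhs);
    return Make(ToyExpr::Kind::IN_LIST, "IN", std::move(values));
}

// Walk the whole tree and collect every referenced column, so refs nested
// inside function calls such as lower(x) are found as well.
void CollectColumnRefs(const ToyExpr &expr, std::set<std::string> &refs) {
    if (expr.kind == ToyExpr::Kind::COLUMN_REF) {
        refs.insert(expr.name);
    }
    for (const auto &child : expr.children) {
        CollectColumnRefs(*child, refs);
    }
}

// Structural equality of two expression trees.
bool SameExpr(const ToyExpr &a, const ToyExpr &b) {
    if (a.kind != b.kind || a.name != b.name || a.children.size() != b.children.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.children.size(); i++) {
        if (!SameExpr(*a.children[i], *b.children[i])) {
            return false;
        }
    }
    return true;
}

// True if the filters can be answered by one compound equality lookup.
bool CanUseCompoundKeyScan(const std::vector<ExprPtr> &index_exprs,
                           const std::vector<ExprPtr> &filters) {
    if (index_exprs.empty()) {
        return false;
    }
    // Each index expression must reference exactly one column.
    for (const auto &index_expr : index_exprs) {
        std::set<std::string> refs;
        CollectColumnRefs(*index_expr, refs);
        if (refs.size() != 1) {
            return false;
        }
    }
    std::vector<bool> covered(index_exprs.size(), false);
    for (const auto &filter : filters) {
        // Only `<index expr> = <constant>` qualifies; an IN (...) list bails out.
        if (filter->kind != ToyExpr::Kind::EQUALS || filter->children.size() != 2 ||
            filter->children[1]->kind != ToyExpr::Kind::CONSTANT) {
            return false;
        }
        bool matched = false;
        for (std::size_t i = 0; i < index_exprs.size() && !matched; i++) {
            if (!covered[i] && SameExpr(*filter->children[0], *index_exprs[i])) {
                covered[i] = true;
                matched = true;
            }
        }
        if (!matched) {
            return false; // a non-index filter is present
        }
    }
    for (bool is_covered : covered) {
        if (!is_covered) {
            return false; // assumption: the equality scan needs every key part
        }
    }
    return true;
}
```

With index expressions `x` and `y`, the filters `x = '525'` and `y = '2525'` pass this check, while adding a filter on `z`, dropping the `y` filter, or replacing it with an `IN ()` list makes it return false.
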
peasee pushed a commit that referenced this pull request Jan 17, 2026
ART Index: Support compound key scans

peasee pushed a commit that referenced this pull request Feb 1, 2026
ART Index: Support compound key scans

sgrebnov pushed a commit that referenced this pull request Jun 1, 2026
ART Index: Support compound key scans
