Update datafusion #10422

Merged
Jeadie merged 39 commits into trunk from jeadie/26-04-20/df-increment
Apr 28, 2026

@Jeadie Jeadie commented Apr 20, 2026

📝 Summary

Update datafusion to include:

  • Implement ExecutionPlan::fetch and ExecutionPlan::with_fetch for ProjectionExec
    #155
  • Guard ProjectionExec::try_pushdown_sort against stale name/index metadata
    #156

For #156, we also need to fix a planning bug: rrf(...) queries that use vector_search(...) with recency settings (time_column, decay args) could produce an invalid physical plan (physical_plan_error: SanityCheckPlan) because a _score@idx sort reference went stale after optimizer rewrites.
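
To illustrate the kind of stale reference described above, here is a minimal sketch in plain Rust. It does not use DataFusion's types: `ColumnRef`, `is_fresh`, and `rebind` are hypothetical names introduced only for this example, and the schema is modeled as a slice of field names. The idea is that a sort key such as `_score@2` is only valid if the field at index 2 is still named `_score`; after a rewrite changes the column layout, the reference has to be re-resolved by name or rejected.

```rust
/// Hypothetical stand-in for a physical column reference such as `_score@2`.
/// This is NOT DataFusion's `Column` type; it only models a name/index pair.
#[derive(Debug, Clone, PartialEq)]
struct ColumnRef {
    name: String,
    index: usize,
}

/// A reference is fresh only if its index is in bounds and the field at that
/// index still carries the expected name.
fn is_fresh(col: &ColumnRef, schema: &[&str]) -> bool {
    schema
        .get(col.index)
        .map_or(false, |field| *field == col.name)
}

/// Re-resolve a possibly stale reference by name against the current schema.
/// Returns `None` when the column no longer exists, so the caller can skip
/// the optimization instead of emitting an invalid plan.
fn rebind(col: &ColumnRef, schema: &[&str]) -> Option<ColumnRef> {
    if is_fresh(col, schema) {
        return Some(col.clone());
    }
    schema
        .iter()
        .position(|field| *field == col.name)
        .map(|index| ColumnRef {
            name: col.name.clone(),
            index,
        })
}
```

With a schema of `["id", "content", "_score"]`, the reference `_score@2` is fresh. If a rewrite drops `content`, the schema becomes `["id", "_score"]`, `_score@2` is out of bounds, and `rebind` resolves it to `_score@1`.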

@Jeadie Jeadie self-assigned this Apr 20, 2026
@Jeadie Jeadie requested a review from a team as a code owner April 20, 2026 11:08
Copilot AI review requested due to automatic review settings April 20, 2026 11:08
@Jeadie Jeadie added the kind/bug and area/datafusion labels Apr 20, 2026
@github-actions github-actions Bot added the area/config, kind/dependencies, and size/m labels Apr 20, 2026

github-actions Bot commented Apr 20, 2026

✅ Pull with Spice Passed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ No banned labels detected
  • ✅ Has a label from required category kind/
  • ✅ Has a label from required category area/
  • ✅ Has at least one assignee: Jeadie

Copilot AI left a comment

Pull request overview

Updates the pinned spiceai/datafusion git revision to include changes from spiceai/datafusion#155.

Changes:

  • Bump datafusion (and related datafusion-* crates) git rev in [patch.crates-io] to 2e4b04b4e9dd4b949195529cb0d9b92cb8d75eaf.
  • Regenerate Cargo.lock entries so all datafusion* packages point at the new git revision.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.

File | Description
Cargo.toml | Updates the pinned DataFusion fork revision across all patched datafusion* crates.
Cargo.lock | Updates lockfile sources to match the new DataFusion git revision.

sgrebnov previously approved these changes Apr 20, 2026
ewgenius previously approved these changes Apr 20, 2026
Spice Snapshot Update Bot and others added 2 commits April 20, 2026 06:21
@github-actions github-actions Bot dismissed stale reviews from ewgenius and sgrebnov via 508adb6 April 20, 2026 13:28
Spice Snapshot Update Bot and others added 2 commits April 20, 2026 13:47
Copilot AI review requested due to automatic review settings April 20, 2026 13:55

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings April 21, 2026 03:52
peasee previously approved these changes Apr 27, 2026
Copilot AI review requested due to automatic review settings April 27, 2026 03:09

Copilot AI left a comment

Pull request overview

Copilot reviewed 49 out of 207 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings April 27, 2026 08:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 50 out of 209 changed files in this pull request and generated 2 comments.

Comment thread crates/runtime/src/embeddings/udtf.rs
Comment thread crates/runtime/src/search/rrf.rs
Copilot AI review requested due to automatic review settings April 27, 2026 22:55
Jeadie and others added 2 commits April 28, 2026 10:19
Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>
peasee previously approved these changes Apr 28, 2026
@Jeadie Jeadie added this pull request to the merge queue Apr 28, 2026
Merged via the queue into trunk with commit f24c163 Apr 28, 2026
66 of 67 checks passed
@Jeadie Jeadie deleted the jeadie/26-04-20/df-increment branch April 28, 2026 04:45
lukekim added a commit that referenced this pull request Apr 28, 2026
* update datafusion

* fix: Update test snapshots

* fix: Update test snapshots

* fixes for _score in vector UDTF

* remove bas snapshots

* fix: Update Search integration test snapshots

* search: stabilize vector_search score column for rrf recency plans

* revert these snapshots

* fix: Update Search integration test snapshots

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update snapshot

* remove

* fix build

* fix: remove unused col import

The col function from datafusion_expr is no longer used in
embeddings/udtf.rs after the upstream datafusion update; the lint
job runs with -D warnings, so the unused import broke the build.

* snapshot

* update DF

* Fix DF

* fixes

* improved testing for RRF

* snapshots

* normalize remove

* fix: Update Search integration test snapshots (#10567)

Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>

* test: update insta snapshots

---------

Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Khramkov <evgenii@spice.ai>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: claudespice <claude@spice.ai>
pull Bot pushed a commit to TheRakeshPurohit/spiceai that referenced this pull request Apr 30, 2026
* Add self-hosted Spice connector support

* Enhance RefreshTask to handle pre-delete rows for upserts and add tests for endpoint scheme validation

* Refactor encode_data_update function to use lifetime annotations for better clarity

* Refactor data exchange handling to support streaming snapshots and improve batch encoding

* fix: enhance upsert handling with primary key validation and improve error reporting

* fix: pass dataset name as a reference in upsert pre-delete rows function

* Add DuckDB vector engine support

* Address DuckDB vector engine PR feedback

* Add DuckDB HNSW search integration coverage

* Remove DuckDB ef_search alias

* Remove DuckDB ef_construction alias

* Address DuckDB vector engine PR feedback

* Revert "Add self-hosted Spice connector support"

This reverts commit 10af21f.

* fix merge

* better Error enum

* Support views on DDL catalogs (spiceai#10554)

* Support views on DDL catalogs

* fix ref

* fix variable

* fix compile

* fix: invert table_exists loop condition for view dependency wait

The view dependency polling loop had an inverted condition: it retried
while table_exists() returned true (table found) and broke when it
returned false (table not found). This caused all view tests to fail —
the loop would spin until the deadline with the table already present,
log 'does not exist, retrying...' throughout, then exit on timeout and
report the view as failed.

Fix: negate the condition so the loop retries while the table is absent
(!table_exists) and exits as soon as the table appears.

* fix: table_exists takes &TableReference to satisfy clippy::needless_pass_by_value

* bad merge

* Update datafusion (spiceai#10422)

* update datafusion

* fix: Update test snapshots

* fix: Update test snapshots

* fixes for _score in vector UDTF

* remove bas snapshots

* fix: Update Search integration test snapshots

* search: stabilize vector_search score column for rrf recency plans

* revert these snapshots

* fix: Update Search integration test snapshots

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update snapshot

* remove

* fix build

* fix: remove unused col import

The col function from datafusion_expr is no longer used in
embeddings/udtf.rs after the upstream datafusion update; the lint
job runs with -D warnings, so the unused import broke the build.

* snapshot

* update DF

* Fix DF

* fixes

* improved testing for RRF

* snapshots

* normalize remove

* fix: Update Search integration test snapshots (spiceai#10567)

Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>

* test: update insta snapshots

---------

Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Khramkov <evgenii@spice.ai>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: claudespice <claude@spice.ai>

* Improve full-text search indexing performance (spiceai#10464)

* Improve tantivy FTS ingest performance

* Improve tests

* Rollback on error path

* remove index as unnecessary

---------

Co-authored-by: Jack Eadie <jack@spice.ai>

* add to search integration tests

* fix compile

* update docs

* Address DuckDB vector review feedback

* fix boxed

* Address DuckDB vector follow-up review feedback

* Bound DuckDB vector search default limit

* fix: Update Search integration test snapshots

* fix finding duckdb index

* feat: Enhance DuckDB vector query handling for empty projections and filter pushdown

* fix: Update Search integration test snapshots

* fixes

* fixes

* better docs

* chore: update datafusion-table-providers to add ignored_index_prefixes

Pin to fork branch spiceai-hnsw-index-drift which adds
`TableDefinition::add_ignored_index_prefix` so externally-managed HNSW
indexes (named `__spice_vss_*`) are excluded from the DuckDB writer's
index drift check, preventing spurious refresh failures.

* chore: update datafusion-table-providers to upstream merged commit

Switch from Jeadie fork back to datafusion-contrib upstream at
df7dbc64, which includes the merged ignored_index_prefixes fix.

* formatting

* clippy

* fix clippy: return closure result directly instead of let binding

---------

Co-authored-by: jeadie <jack@spice.ai>
Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evgenii Khramkov <evgenii@spice.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: claudespice <claude@spice.ai>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: William <98815791+peasee@users.noreply.github.com>
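
The "invert table_exists loop condition" fix in the commit list above (retry while the table is absent, exit as soon as it appears) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Spice implementation: `wait_for_table`, its closure parameter, and the `timeout`/`interval` arguments are hypothetical names chosen for this example.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `table_exists` until it reports true or `timeout` elapses.
/// Returns true if the table appeared before the deadline.
/// (Hypothetical helper; names and signature are assumptions for illustration.)
fn wait_for_table<F>(mut table_exists: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    // Retry while the table is ABSENT. The bug described above was the
    // opposite condition: looping while the table was already present.
    while !table_exists() {
        if Instant::now() >= deadline {
            return false; // Dependency never appeared; report failure.
        }
        sleep(interval);
    }
    true
}
```

Passing the existence check as a closure keeps the loop testable without a real catalog: a caller can supply a counter-based closure that returns true on the third poll and confirm the loop exits immediately afterwards.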
8 participants