[pull] trunk from spiceai:trunk #851

Merged
pull[bot] merged 1 commit into TheRakeshPurohit:trunk from spiceai:trunk on May 20, 2026

@pull pull Bot commented May 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


…10953)

* udtf: fix UdtfExec invariant vec lengths to match children count

`UdtfExec::children()` reports one child (the inner plan), but
`maintains_input_order()`, `required_input_ordering()`,
`required_input_distribution()`, and `benefits_from_input_partitioning()`
all returned empty Vecs, violating the DataFusion ExecutionPlan contract
that these vectors must have one entry per child.

When `UdtfExec` got composed under another plan (e.g. `rrf(text_search(...),
vector_search(...))` without an explicit `join_key`), an optimizer pass
invoked `check_default_invariants` and the query failed with:

  Internal error: Assertion failed: actual_len == children_len
  (left: 0, right: 1): UdtfExec::maintains_input_order returned Vec with
  incorrect size: 0 != 1.

Since `UdtfExec::execute()` delegates straight to the inner plan, the
correct values are: `maintains_input_order = [true]`,
`benefits_from_input_partitioning = [false]`, `required_input_ordering = [None]`,
`required_input_distribution = [UnspecifiedDistribution]`.

Add a regression test that builds a UdtfExec and asserts both the per-method
Vec lengths and `check_invariants` itself.
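
The length contract described above can be illustrated with a minimal, dependency-free sketch. It does not use the real DataFusion `ExecutionPlan` trait: `PlanNode`, `check_per_child_lengths`, `BrokenWrapper`, and `FixedWrapper` are hypothetical names invented for illustration, and only two of the four per-child methods are modelled. The error text is shaped after the message quoted above.

```rust
/// Simplified stand-in for the per-child part of a plan-node contract.
/// Hypothetical trait; this is NOT the DataFusion `ExecutionPlan` API.
trait PlanNode {
    fn name(&self) -> &str;
    fn child_count(&self) -> usize;
    fn maintains_input_order(&self) -> Vec<bool>;
    fn benefits_from_input_partitioning(&self) -> Vec<bool>;
}

/// Every per-child Vec must have exactly one entry per child.
fn check_per_child_lengths(node: &dyn PlanNode) -> Result<(), String> {
    let expected = node.child_count();
    let checks = [
        ("maintains_input_order", node.maintains_input_order().len()),
        (
            "benefits_from_input_partitioning",
            node.benefits_from_input_partitioning().len(),
        ),
    ];
    for &(method, actual) in checks.iter() {
        if actual != expected {
            return Err(format!(
                "{}::{} returned Vec with incorrect size: {} != {}",
                node.name(),
                method,
                actual,
                expected
            ));
        }
    }
    Ok(())
}

/// One child, but empty per-child Vecs: the shape of the bug.
struct BrokenWrapper;

impl PlanNode for BrokenWrapper {
    fn name(&self) -> &str { "BrokenWrapper" }
    fn child_count(&self) -> usize { 1 }
    fn maintains_input_order(&self) -> Vec<bool> { vec![] }
    fn benefits_from_input_partitioning(&self) -> Vec<bool> { vec![] }
}

/// One child, one entry per Vec: the shape of the fix.
struct FixedWrapper;

impl PlanNode for FixedWrapper {
    fn name(&self) -> &str { "FixedWrapper" }
    fn child_count(&self) -> usize { 1 }
    // A pass-through wrapper preserves whatever order its input has.
    fn maintains_input_order(&self) -> Vec<bool> { vec![true] }
    fn benefits_from_input_partitioning(&self) -> Vec<bool> { vec![false] }
}
```

Calling `check_per_child_lengths(&BrokenWrapper)` returns an `Err` naming the first offending method, while `check_per_child_lengths(&FixedWrapper)` returns `Ok(())`.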

Fixes #10951

* cayenne: invalidate scan_file_statistics cache after position-based delete

The position-based delete path updated the per-file deletion bitmap in
`cached_deleted_row_ids` but did not invalidate the per-file
`scan_file_statistics` cache that `CayenneTableProvider::list_files_for_snapshot_scan`
populates from `infer_stats`. Because `infer_stats` applies the
`VortexAccessPlanProvider::adjust_statistics` hook at the time it runs, the
cached entry froze the row count as of the *previous* delete. The next
`COUNT(*)` (or any other stats-driven query) hit the cache and returned the
stale count, even though the deletion bitmap itself was up to date.
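
The failure mode can be reproduced in miniature. The sketch below is not Cayenne code: `FileStore` and its fields are hypothetical, and the `invalidate` flag exists only to contrast the behaviour before and after the fix. It assumes statistics are computed lazily and cached per file, as the description above suggests.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-file state: total rows, deleted row positions, and a
/// lazily populated cache of live-row counts.
struct FileStore {
    total_rows: HashMap<String, u64>,
    deleted_row_ids: HashMap<String, HashSet<u64>>,
    stats_cache: HashMap<String, u64>,
}

impl FileStore {
    fn new() -> Self {
        FileStore {
            total_rows: HashMap::new(),
            deleted_row_ids: HashMap::new(),
            stats_cache: HashMap::new(),
        }
    }

    fn add_file(&mut self, file: &str, rows: u64) {
        self.total_rows.insert(file.to_string(), rows);
    }

    /// Live row count, served from the cache when an entry exists.
    fn row_count(&mut self, file: &str) -> u64 {
        if let Some(&cached) = self.stats_cache.get(file) {
            return cached;
        }
        let total = self.total_rows.get(file).copied().unwrap_or(0);
        let deleted = self
            .deleted_row_ids
            .get(file)
            .map_or(0, |ids| ids.len() as u64);
        let live = total.saturating_sub(deleted);
        self.stats_cache.insert(file.to_string(), live);
        live
    }

    /// Position-based delete. With `invalidate == false` the bitmap is
    /// updated but the cached count is left in place (the bug).
    fn delete_rows(&mut self, file: &str, rows: &[u64], invalidate: bool) {
        self.deleted_row_ids
            .entry(file.to_string())
            .or_default()
            .extend(rows.iter().copied());
        if invalidate {
            self.stats_cache.remove(file);
        }
    }
}
```

With a 10-row file whose count has already been cached, deleting two rows without invalidation leaves `row_count` at 10. Once a later delete drops the cache entry, the next call recomputes the count from the bitmap.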

Also keep the `cayenne::metastore::sqlite` docblock fix that wraps `SQLite`
in backticks, which the trunk lint failure pointed out: `clippy::doc_markdown`
was failing on that line under `clippy::pedantic`.

`PkKeysetInvalidatingDeletionSink` already wraps every `delete_using_deletion_vectors`
sink, so dropping the cache there covers every position-based deletion path
without touching the position-based sink internals. PK-strategy callers also
flow through this sink — clearing for them is harmless because they don't
populate the access-plan stats path.
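
The design choice of invalidating in the wrapping sink rather than in the inner sink can be sketched as a decorator. Everything here is hypothetical (`DeletionSink`, `PositionSink`, `InvalidatingSink`, `StatsCache`); the real `PkKeysetInvalidatingDeletionSink` has its own interface, and whether it removes a single entry or clears the whole cache is not stated above. This sketch assumes per-file removal.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical sink interface: apply a position-based delete and report
/// how many row positions were recorded.
trait DeletionSink {
    fn delete(&mut self, file: &str, rows: &[u64]) -> usize;
}

/// Inner sink: records deleted positions and knows nothing about caches.
#[derive(Default)]
struct PositionSink {
    deleted: HashMap<String, Vec<u64>>,
}

impl DeletionSink for PositionSink {
    fn delete(&mut self, file: &str, rows: &[u64]) -> usize {
        self.deleted
            .entry(file.to_string())
            .or_default()
            .extend_from_slice(rows);
        rows.len()
    }
}

/// Shared per-file statistics cache (file name -> cached row count).
type StatsCache = Rc<RefCell<HashMap<String, u64>>>;

/// Wrapper that forwards the delete, then drops the cached statistics for
/// the affected file, so every path through the wrapper is covered.
struct InvalidatingSink<S: DeletionSink> {
    inner: S,
    stats_cache: StatsCache,
}

impl<S: DeletionSink> DeletionSink for InvalidatingSink<S> {
    fn delete(&mut self, file: &str, rows: &[u64]) -> usize {
        let recorded = self.inner.delete(file, rows);
        // Removing a missing entry is a no-op, so callers that never
        // populated the cache are unaffected.
        self.stats_cache.borrow_mut().remove(file);
        recorded
    }
}
```

Because the invalidation lives in the wrapper, the inner sink stays unchanged, and a caller whose files were never cached pays only for a map lookup.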

pull[bot] locked and limited conversation to collaborators May 20, 2026
pull[bot] added the ⤵️ pull label May 20, 2026
pull[bot] merged commit b935783 into TheRakeshPurohit:trunk May 20, 2026