fix(duckdb): cast query_arrow results to projected_schema#652

Merged: ewgenius merged 2 commits into spiceai-52 from evgenii/0521/duckdb-cast-to-projected-schema on May 21, 2026
ewgenius commented May 21, 2026

DuckDB's query_arrow ignored the projected_schema parameter, returning batches with DuckDB's native types (e.g. Timestamp(µs)) even when the caller expected different types (e.g. Timestamp(ns)).

This caused schema mismatches for downstream operators (SortExec, RowConverter) that get pushed below SchemaCastScanExec in partitioned execution plans.

Changes

  • Cast result batches to projected_schema in the DuckDB query_arrow output stream when types differ
  • Add shared cast_batch_to_schema utility in util/arrow.rs for reuse by other Arrow-native connectors (ADBC, ODBC)
  • Revert #650 (fix(duckdb): use actual DuckDB schema for read provider), which is no longer needed
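
The cast step in the first bullet can be pictured with a small, dependency-free model. This is only a sketch: `TimeUnit`, `Column`, and `cast_column_to_unit` are hypothetical names invented for illustration, and the actual change operates on Arrow record batches rather than plain vectors.

```rust
/// Simplified stand-in for Arrow's timestamp units (hypothetical, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Microsecond,
    Nanosecond,
}

/// Simplified stand-in for a timestamp column: a unit plus raw i64 values.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    unit: TimeUnit,
    values: Vec<i64>,
}

/// Cast a column to the unit the caller's projected schema expects.
/// Returns the column untouched when the units already match.
fn cast_column_to_unit(col: Column, target: TimeUnit) -> Result<Column, String> {
    if col.unit == target {
        return Ok(col);
    }
    let values = match (col.unit, target) {
        // µs → ns: multiply by 1000, reporting overflow instead of wrapping.
        (TimeUnit::Microsecond, TimeUnit::Nanosecond) => col
            .values
            .iter()
            .map(|&v| {
                v.checked_mul(1_000)
                    .ok_or_else(|| format!("overflow casting {v} µs to ns"))
            })
            .collect::<Result<Vec<i64>, String>>()?,
        // Any other direction is outside the scope of this sketch.
        (from, to) => return Err(format!("unsupported cast {from:?} -> {to:?}")),
    };
    Ok(Column { unit: target, values })
}
```

In this model, casting a microsecond column holding `[1, 2, 3]` to nanoseconds yields `[1000, 2000, 3000]`, while a column that already matches the requested unit is returned unchanged, mirroring the "when types differ" condition.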

ewgenius changed the base branch from main to spiceai-52 May 21, 2026 02:50
ewgenius self-assigned this May 21, 2026
ewgenius added the bug (Something isn't working) label May 21, 2026
ewgenius changed the title from "Evgenii/0521/duckdb cast to projected schema" to "fix(duckdb): cast query_arrow results to projected_schema" May 21, 2026
ewgenius marked this pull request as ready for review May 21, 2026 02:51
ewgenius requested a review from phillipleblanc May 21, 2026 02:53
ewgenius enabled auto-merge (squash) May 21, 2026 03:48
ewgenius merged commit 846d4de into spiceai-52 May 21, 2026 (12 checks passed)
ewgenius deleted the evgenii/0521/duckdb-cast-to-projected-schema branch May 21, 2026 04:03
ewgenius added a commit to spiceai/spiceai that referenced this pull request May 21, 2026
ewgenius added a commit to spiceai/spiceai that referenced this pull request May 21, 2026
phillipleblanc pushed a commit to spiceai/spiceai that referenced this pull request May 21, 2026
…ESTAMPTZ columns (#10947)

* test: Add failing tests for monotonic cast ordering propagation in SchemaCastScanExec

Add tests that verify SchemaCastScanExec should propagate ordering through
monotonic casts (temporal→temporal, numeric widening) and should return
maintains_input_order=false for non-monotonic casts. These tests currently
fail, demonstrating the RowConverter schema mismatch bug when using ORDER BY
on partitioned DuckDB-accelerated tables with TIMESTAMPTZ columns.

* fix: Propagate ordering through monotonic casts in SchemaCastScanExec

Add is_order_preserving_cast() helper that identifies monotonic type casts
(temporal→temporal, numeric→numeric) following DataFusion's CastExpr convention.

Update equivalence_properties to propagate input ordering when the sort-key
column cast is monotonic, and update maintains_input_order() to return false
only when a non-monotonic cast exists.

This fixes the 'RowConverter column schema mismatch' and 'does not satisfy
order requirements' errors when using ORDER BY on partitioned DuckDB-accelerated
tables with TIMESTAMPTZ columns (Timestamp µs→ns cast).
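
As a rough illustration of the rule described in this commit, the sketch below models the monotonicity check with a hypothetical `Ty` enum. The real helper works with Arrow's `DataType`, and the function bodies here are assumptions written for illustration, not the code from the referenced commit.

```rust
/// Hypothetical, trimmed-down type tags; the real code works with Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    TimestampMicros,
    TimestampNanos,
    Int64,
    Utf8,
}

/// Sketch of a monotonicity check: identical types and temporal→temporal
/// casts keep the relative order of rows; everything else is treated as unsafe.
fn is_order_preserving_cast(from: Ty, to: Ty) -> bool {
    use Ty::*;
    from == to
        || matches!(
            (from, to),
            (TimestampMicros | TimestampNanos, TimestampMicros | TimestampNanos)
        )
}

/// Input order is reported as maintained unless at least one column
/// goes through a non-monotonic cast.
fn maintains_input_order(casts: &[(Ty, Ty)]) -> bool {
    casts
        .iter()
        .all(|&(from, to)| is_order_preserving_cast(from, to))
}
```

Under this model a Timestamp µs→ns cast keeps the ordering, whereas an Int64→Utf8 cast does not, since the strings "10" and "9" sort in the opposite order from the integers 10 and 9.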

* fix formatting

* update datafusion-table-providers, to include datafusion schema fix

* fix: Tighten is_order_preserving_cast to whitelist safe numeric widenings

Address review comments:
- Restrict numeric casts to a known-safe monotonic whitelist instead of
  allowing all numeric→numeric (signed↔unsigned can reorder).
- Trim comments for clarity.
- Update stale comment above ordering propagation logic.
- Add comprehensive unit tests for is_order_preserving_cast and
  is_numeric_widening covering all positive and negative cases.
- Add test for sort-key-unchanged-but-other-column-cast edge case.
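
The whitelist idea from the review comments can be sketched as follows. The `Num` enum and the exact set of allowed pairs are assumptions made for illustration (the source does not list the whitelist), and the counterexample uses Rust's wrapping `as` conversion to show why a signed→unsigned cast cannot be treated as monotonic.

```rust
/// Hypothetical numeric type tags; the real helper matches on Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Num {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

/// Whitelist sketch: same-signedness widenings, plus unsigned → strictly wider signed.
/// Every pair not listed (narrowing, signed → unsigned, same width) is rejected.
fn is_numeric_widening(from: Num, to: Num) -> bool {
    use Num::*;
    matches!(
        (from, to),
        (Int8, Int16 | Int32 | Int64)
            | (Int16, Int32 | Int64)
            | (Int32, Int64)
            | (UInt8, UInt16 | UInt32 | UInt64 | Int16 | Int32 | Int64)
            | (UInt16, UInt32 | UInt64 | Int32 | Int64)
            | (UInt32, UInt64 | Int64)
    )
}

/// Counterexample: -1 < 1 as i32, but after a wrapping conversion to u32
/// the value -1 becomes u32::MAX, which is greater than 1.
fn signed_to_unsigned_reorders() -> bool {
    let (a, b) = (-1_i32, 1_i32);
    a < b && (a as u32) > (b as u32)
}
```

A whitelist fails closed: a type pair that nobody thought about is treated as non-monotonic, which costs at most an extra sort rather than producing wrongly ordered results.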

* test: Update retention test to expect Timestamp(µs) from DuckDB accelerator

DuckDB stores timestamps in microsecond precision. With the table-providers
fix, DuckSqlExec now correctly reports its actual µs schema instead of
claiming ns. Update the test assertions accordingly.

* Move is_order_preserving_cast and is_numeric_widening to arrow_tools::schema

* Fix clippy::unnested_or_patterns in is_numeric_widening

* remove unused import

* Skip DuckDB nullability assertion in test_schema_preservation

DuckDB does not preserve NOT NULL field metadata when returning Arrow
results (all scanned columns are reported as nullable).
See: duckdb/duckdb#4629

* Fix test

* fix lint

* Advertise engine schema from partitions in PartitionTableProvider

Add test to verify schema reflects engine type downgrades (e.g.
Timestamp ns→µs)

* linting

* Revert partitioned table schema override (handled by DTP schema cast at read time)

* Update datafusion-table-providers to 846d4de245e919bf3c3c1729c85f50a3564d7949

Include datafusion-contrib/datafusion-table-providers#652

* cleanup

* cleanup

---------

Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
Co-authored-by: jeadie <jack@spice.ai>