fix(duckdb): cast query_arrow results to projected_schema #652
Merged
ewgenius merged 2 commits on May 21, 2026
DuckDB's query_arrow ignored the projected_schema parameter, returning batches with DuckDB's native types (e.g. Timestamp(µs)) even when the caller expected different types (e.g. Timestamp(ns)). This caused schema mismatches for downstream operators pushed below SchemaCastScanExec. Cast result batches to projected_schema in the output stream when types differ. Add shared cast_batch_to_schema utility in util/arrow.rs for reuse by other Arrow-native connectors (ADBC, ODBC).
This reverts commit 040aa83.
phillipleblanc approved these changes May 21, 2026
ewgenius added a commit to spiceai/spiceai that referenced this pull request May 21, 2026
846d4de245e919bf3c3c1729c85f50a3564d7949 Include datafusion-contrib/datafusion-table-providers#652
phillipleblanc pushed a commit to spiceai/spiceai that referenced this pull request May 21, 2026
…ESTAMPTZ columns (#10947)

* test: Add failing tests for monotonic cast ordering propagation in SchemaCastScanExec
  Add tests that verify SchemaCastScanExec should propagate ordering through monotonic casts (temporal→temporal, numeric widening) and should return maintains_input_order=false for non-monotonic casts. These tests currently fail, demonstrating the RowConverter schema mismatch bug when using ORDER BY on partitioned DuckDB-accelerated tables with TIMESTAMPTZ columns.
* fix: Propagate ordering through monotonic casts in SchemaCastScanExec
  Add is_order_preserving_cast() helper that identifies monotonic type casts (temporal→temporal, numeric→numeric) following DataFusion's CastExpr convention. Update equivalence_properties to propagate input ordering when the sort-key column cast is monotonic, and update maintains_input_order() to return false only when a non-monotonic cast exists. This fixes the 'RowConverter column schema mismatch' and 'does not satisfy order requirements' errors when using ORDER BY on partitioned DuckDB-accelerated tables with TIMESTAMPTZ columns (Timestamp µs→ns cast).
* fix formatting
* update datafusion-table-providers, to include datafusion schema fix
* fix: Tighten is_order_preserving_cast to whitelist safe numeric widenings
  Address review comments:
  - Restrict numeric casts to a known-safe monotonic whitelist instead of allowing all numeric→numeric (signed↔unsigned can reorder).
  - Trim comments for clarity.
  - Update stale comment above ordering propagation logic.
  - Add comprehensive unit tests for is_order_preserving_cast and is_numeric_widening covering all positive and negative cases.
  - Add test for sort-key-unchanged-but-other-column-cast edge case.
* test: Update retention test to expect Timestamp(µs) from DuckDB accelerator
  DuckDB stores timestamps in microsecond precision. With the table-providers fix, DuckSqlExec now correctly reports its actual µs schema instead of claiming ns. Update the test assertions accordingly.
* Move is_order_preserving_cast and is_numeric_widening to arrow_tools::schema
* Fix clippy::unnested_or_patterns in is_numeric_widening
* remove unused import
* Skip DuckDB nullability assertion in test_schema_preservation
  DuckDB does not preserve NOT NULL field metadata when returning Arrow results (all scanned columns are reported as nullable). See: duckdb/duckdb#4629
* Fix test
* fix lint
* Advertise engine schema from partitions in PartitionTableProvider
  Add test to verify schema reflects engine type downgrades (e.g. Timestamp ns→µs)
* linting
* Revert partitioned table schema override (handled by DTP schema cast at read time)
* Update datafusion-table-providers to 846d4de245e919bf3c3c1729c85f50a3564d7949
  Include datafusion-contrib/datafusion-table-providers#652
* cleanup
* cleanup

---------

Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
Co-authored-by: jeadie <jack@spice.ai>
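
The whitelist idea described in the commit message above can be sketched roughly as follows. This is not the code from spiceai/spiceai: the `Ty` enum is a hypothetical stand-in for Arrow's `DataType`, the exact set of whitelisted pairs is an assumption, and only the two function names are taken from the commit message.

```rust
// Hypothetical, simplified stand-in for Arrow's DataType (assumption for this sketch).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int8, Int16, Int32, Int64,
    UInt8, UInt16, UInt32, UInt64,
    Float32, Float64,
    TimestampMicro, TimestampNano,
    Utf8,
}

/// True only for numeric casts on an explicit whitelist of widenings.
/// Signed <-> unsigned casts of the same width are deliberately excluded,
/// because they can reorder values (e.g. a negative Int32 reinterpreted as UInt32).
fn is_numeric_widening(from: Ty, to: Ty) -> bool {
    use Ty::*;
    matches!(
        (from, to),
        (Int8, Int16 | Int32 | Int64)
            | (Int16, Int32 | Int64)
            | (Int32, Int64)
            | (UInt8, UInt16 | UInt32 | UInt64 | Int16 | Int32 | Int64)
            | (UInt16, UInt32 | UInt64 | Int32 | Int64)
            | (UInt32, UInt64 | Int64)
            | (Float32, Float64)
    )
}

/// True when casting `from` to `to` keeps rows in the same relative order,
/// so an input ordering on that column may be propagated through the cast.
fn is_order_preserving_cast(from: Ty, to: Ty) -> bool {
    use Ty::*;
    if from == to {
        return true; // no cast at all
    }
    let temporal = |t: Ty| matches!(t, TimestampMicro | TimestampNano);
    // Temporal -> temporal (e.g. Timestamp µs -> ns) is treated as monotonic,
    // following the commit message; everything else must be on the whitelist.
    (temporal(from) && temporal(to)) || is_numeric_widening(from, to)
}

fn main() {
    let cases = [
        (Ty::TimestampMicro, Ty::TimestampNano),
        (Ty::Int32, Ty::Int64),
        (Ty::Int32, Ty::UInt32),
        // Int -> Utf8 sorts lexicographically ("10" < "9"), so it is not monotonic.
        (Ty::Int64, Ty::Utf8),
    ];
    for (from, to) in cases {
        println!("{:?} -> {:?}: {}", from, to, is_order_preserving_cast(from, to));
    }
}
```

A whitelist fails closed: any cast pair that nobody has reasoned about is treated as non-monotonic, which costs at most an extra sort rather than a wrong result.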
DuckDB's query_arrow ignored the projected_schema parameter, returning batches with DuckDB's native types (e.g. Timestamp(µs)) even when the caller expected different types (e.g. Timestamp(ns)). This caused schema mismatches for downstream operators (SortExec, RowConverter) that get pushed below SchemaCastScanExec in partitioned execution plans.

Changes
- Cast result batches to projected_schema in the DuckDB query_arrow output stream when types differ
- Add shared cast_batch_to_schema utility in util/arrow.rs for reuse by other Arrow-native connectors (ADBC, ODBC)
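
To illustrate the "cast only when types differ" behavior described above, here is a minimal std-only sketch. It does not use the arrow-rs API and is not the code added in util/arrow.rs: `TimeUnit`, `Field`, and `Batch` are hypothetical stand-ins for Arrow's schema and record batch types, and the signature of `cast_batch_to_schema` here is an assumption. Only the function name comes from the PR description.

```rust
// Hypothetical, simplified stand-ins for Arrow types (assumptions for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Micro,
    Nano,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    unit: TimeUnit,
}

#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Vec<Field>,
    columns: Vec<Vec<i64>>, // one Vec<i64> of timestamp values per field
}

/// Convert one timestamp column between units, failing on overflow.
fn cast_column(values: &[i64], from: TimeUnit, to: TimeUnit) -> Result<Vec<i64>, String> {
    match (from, to) {
        (TimeUnit::Micro, TimeUnit::Micro) | (TimeUnit::Nano, TimeUnit::Nano) => {
            Ok(values.to_vec())
        }
        (TimeUnit::Micro, TimeUnit::Nano) => values
            .iter()
            .map(|v| {
                v.checked_mul(1_000)
                    .ok_or_else(|| format!("overflow casting {} µs to ns", v))
            })
            .collect(),
        // In this model, narrowing simply drops sub-microsecond precision.
        (TimeUnit::Nano, TimeUnit::Micro) => Ok(values.iter().map(|v| v / 1_000).collect()),
    }
}

/// Return the batch unchanged when its types already match `projected`;
/// otherwise cast each column to the projected type.
fn cast_batch_to_schema(batch: Batch, projected: &[Field]) -> Result<Batch, String> {
    if batch.schema.len() != projected.len() {
        return Err(format!(
            "column count mismatch: batch has {}, projected schema has {}",
            batch.schema.len(),
            projected.len()
        ));
    }
    // Fast path: nothing to do when every column already has the expected type.
    if batch.schema.iter().zip(projected).all(|(a, b)| a.unit == b.unit) {
        return Ok(batch);
    }
    let columns = batch
        .columns
        .iter()
        .zip(batch.schema.iter().zip(projected))
        .map(|(col, (from, to))| cast_column(col, from.unit, to.unit))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Batch { schema: projected.to_vec(), columns })
}

fn main() {
    // The engine returns microseconds; the caller expects nanoseconds.
    let batch = Batch {
        schema: vec![Field { name: "ts".to_string(), unit: TimeUnit::Micro }],
        columns: vec![vec![1_700_000_000_000_000]],
    };
    let projected = vec![Field { name: "ts".to_string(), unit: TimeUnit::Nano }];
    let out = cast_batch_to_schema(batch, &projected).expect("cast should succeed");
    println!("{:?}", out);
}
```

In a real Arrow-based connector the per-column conversion would presumably be delegated to Arrow's cast kernel rather than written by hand; the fast path matters because most batches already match the projected schema and should pass through without copying.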