Skip to content

fix(duckdb): use actual DuckDB schema for read provider#650

Merged
ewgenius merged 3 commits into
spiceai-52from
evgenii/0520/duckdb-read-provider-use-actual-schema
May 20, 2026
Merged

fix(duckdb): use actual DuckDB schema for read provider#650
ewgenius merged 3 commits into
spiceai-52from
evgenii/0520/duckdb-read-provider-use-actual-schema

Conversation

@ewgenius
Copy link
Copy Markdown
Collaborator

Problem

DuckDBTableProviderFactory::create() uses cmd.schema (user-facing types) as the read provider schema. DuckDB silently downgrades certain types during table creation (e.g. Timestamp(ns, tz)Timestamp(µs, tz) for TIMESTAMPTZ). The read provider then advertises the wrong schema while returning batches with actual DuckDB types, causing:

RowConverter column schema mismatch, expected Timestamp(Nanosecond, Some("UTC")) got Timestamp(Microsecond, Some("UTC"))

This crashes ORDER BY queries on partitioned DuckDB-accelerated datasets.

Fix

After table creation, query DuckDB for the actual storage schema via get_schema() and use that for the read provider. This matches the existing pattern in DuckDBTableFactory::table_provider().

Test

Added test_read_provider_schema_reflects_actual_duckdb_types which verifies the read provider advertises Timestamp(Microsecond, ...) when DuckDB downgrades from the requested Timestamp(Nanosecond, ...).

@ewgenius ewgenius self-assigned this May 20, 2026
@ewgenius ewgenius marked this pull request as draft May 20, 2026 00:23
@ewgenius ewgenius added the bug Something isn't working label May 20, 2026
@ewgenius ewgenius requested a review from phillipleblanc May 20, 2026 00:23
…d.schema

DuckDB may silently change types during table creation (e.g. Timestamp(ns, tz)
becomes Timestamp(µs, tz) for TIMESTAMPTZ). The read provider must advertise the
actual storage types so downstream operators (SortExec, RowConverter) receive
batches matching the advertised schema.

Query DuckDB via get_schema() after table creation to obtain the true storage
schema, consistent with DuckDBTableFactory::table_provider() which already does
this correctly.

Fixes RowConverter column schema mismatch on ORDER BY with partitioned DuckDB
acceleration.
@ewgenius ewgenius force-pushed the evgenii/0520/duckdb-read-provider-use-actual-schema branch from 2bb1bce to 4ad04d8 Compare May 20, 2026 00:30
@ewgenius ewgenius marked this pull request as ready for review May 20, 2026 00:30
Comment thread core/src/duckdb.rs
@ewgenius ewgenius force-pushed the evgenii/0520/duckdb-read-provider-use-actual-schema branch 2 times, most recently from 640ef30 to 4ad04d8 Compare May 20, 2026 01:12
Address review feedback: read the actual DuckDB schema into the schema
variable used by both TableDefinition (write path) and the read provider,
rather than only fixing the read provider.
Comment thread core/src/duckdb.rs
@ewgenius ewgenius merged commit 040aa83 into spiceai-52 May 20, 2026
12 checks passed
@ewgenius ewgenius deleted the evgenii/0520/duckdb-read-provider-use-actual-schema branch May 20, 2026 02:31
ewgenius added a commit that referenced this pull request May 21, 2026
ewgenius added a commit that referenced this pull request May 21, 2026
* fix(duckdb): cast query_arrow results to projected_schema

DuckDB's query_arrow ignored the projected_schema parameter, returning
batches with DuckDB's native types (e.g. Timestamp(µs)) even when the
caller expected different types (e.g. Timestamp(ns)). This caused schema
mismatches for downstream operators pushed below SchemaCastScanExec.

Cast result batches to projected_schema in the output stream when types
differ. Add shared cast_batch_to_schema utility in util/arrow.rs for
reuse by other Arrow-native connectors (ADBC, ODBC).

* Revert "fix(duckdb): use actual DuckDB schema for read provider (#650)"

This reverts commit 040aa83.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants