Enable INSERT..SELECT and COPY FROM pushdown for partitioned Iceberg tables
Partitioned writes previously went through the row-by-row PartitionedDestReceiver,
which routes each row to a per-partition CSV file before converting to Parquet.
This change enables DuckDB's PARTITION_BY in COPY TO for tables using pushdownable
transforms (identity, year, month, day, hour), letting DuckDB split the data in a
single pass. Bucket and truncate transforms continue to use the existing path.
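As a rough illustration of the idea above (not the code in this commit), the sketch below maps a partition transform to a DuckDB expression and wraps a query so that COPY TO can use PARTITION_BY. The helper names, the `_part_N` synthetic column names, and the use of `date_diff` against the Unix epoch for the time transforms are assumptions made for the example; the actual expressions generated by partition_pushdown.c may differ.

```python
# Hypothetical sketch: names and generated SQL are illustrative assumptions.
PUSHDOWNABLE_TIME_TRANSFORMS = ("year", "month", "day", "hour")


def quote_ident(name):
    """Quote an identifier for use in DuckDB SQL."""
    return '"' + name.replace('"', '""') + '"'


def transform_to_duckdb_sql(transform, column):
    """Return a DuckDB expression for a transform, or None if not pushdownable."""
    quoted = quote_ident(column)
    if transform == "identity":
        return quoted
    if transform in PUSHDOWNABLE_TIME_TRANSFORMS:
        # Assumption: express the transform as an offset from the Unix epoch.
        return (
            f"date_diff('{transform}', "
            f"TIMESTAMP '1970-01-01 00:00:00', {quoted})"
        )
    # bucket and truncate stay on the existing row-by-row path
    return None


def build_partitioned_copy(query, partition_fields, dest):
    """Wrap query with synthetic partition columns and add PARTITION_BY.

    partition_fields is a list of (transform, column) pairs. Returns None
    when any transform cannot be pushed down.
    """
    synthetic = []
    for index, (transform, column) in enumerate(partition_fields):
        expr = transform_to_duckdb_sql(transform, column)
        if expr is None:
            return None
        synthetic.append((f"_part_{index}", expr))

    select_list = ", ".join(f"{expr} AS {name}" for name, expr in synthetic)
    names = ", ".join(name for name, _ in synthetic)
    escaped_dest = dest.replace("'", "''")
    # FILE_SIZE_BYTES is deliberately left out when PARTITION_BY is present.
    return (
        f"COPY (SELECT q.*, {select_list} FROM ({query}) AS q) "
        f"TO '{escaped_dest}' (FORMAT PARQUET, PARTITION_BY ({names}))"
    )
```

Calling `build_partitioned_copy("SELECT * FROM staging", [("identity", "region"), ("day", "created_at")], "/tmp/out")` yields a single COPY statement, while any field using bucket or truncate makes the helper return None so the caller can fall back to the existing path.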
Key changes:
- Add partition_pushdown.c with transform-to-DuckDB-SQL conversion, query wrapping
  with synthetic partition columns, and Hive-style path parsing for partition values
- Extend WriteQueryResultTo with partitionByExprs parameter that wraps the query
  with synthetic columns AFTER validation/interval wrappers and adds PARTITION_BY
  to the COPY command
- Modify AddQueryResultToTable to detect pushdownable partitions, pass expressions
  through, and parse per-file partition values from DuckDB output paths
- Lift blanket partition blocks in IsPushdownableInsertSelectQuery and
  IsCopyFromPushdownable to allow pushdown when all transforms are supported
- Disable FILE_SIZE_BYTES when PARTITION_BY is used (DuckDB limitation)
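
The Hive-style path parsing mentioned in the list can be sketched as follows. This is a minimal illustration, assuming output paths of the form `<base>/<name>=<value>/.../<file>.parquet` and assuming values may be URL-encoded; the function name, the `base` parameter, and the example paths are hypothetical and not taken from the commit.

```python
from urllib.parse import unquote


def parse_hive_partition_values(base, path):
    """Extract name=value partition segments from a file path under base.

    Only directories below base are inspected, so an '=' that happens to
    appear in the base location itself is not mistaken for a partition.
    """
    prefix = base.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {base!r}")

    relative = path[len(prefix):]
    values = {}
    # The last segment is the file name, so it is skipped.
    for segment in relative.split("/")[:-1]:
        name, sep, raw = segment.partition("=")
        if sep and name:
            # Assumption: partition values are URL-encoded in the path.
            values[name] = unquote(raw)
    return values
```

For a file written to `s3://bucket/tbl/data/_part_0=us%20east/_part_1=19800/data_0.parquet` with base `s3://bucket/tbl/data`, the helper returns the raw string values keyed by synthetic column name; converting them back to typed partition values is left to the caller.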
Signed-off-by: Marco Slot <marco.slot@snowflake.com>