
Support copy from and insert..select pushdown with identity and time partitioning #322

Open
sfc-gh-mslot wants to merge 1 commit into main from marcoslot/partitioning-copy-pushdown


@sfc-gh-mslot
Collaborator

@sfc-gh-mslot sfc-gh-mslot commented Apr 20, 2026

This PR adds support for pushing down INSERT..SELECT and COPY..FROM when the target table is partitioned using identity or time functions. It uses the PARTITION_BY clause in the COPY TO command in DuckDB to generate paths, using a synthetic column that contains the partition value.

Example:

create table test (x int, y timestamptz default now()) using iceberg WITH (partition_by = 'day(y)');
insert into test (x) select s from generate_series(1,100) s;

-- underneath
COPY (SELECT *, datediff('day', date '1970-01-01', y::date) AS __part_0 FROM (SELECT x, CASE WHEN y NOT BETWEEN TIMESTAMPTZ '0001-01-01 00:00:00+00' AND TIMESTAMPTZ '9999-12-31 23:59:59.999999+00' THEN CAST(error(printf('timestamptz out of range: %s', y::VARCHAR)) AS TIMESTAMPTZ) ELSE y END AS y FROM ( SELECT "*SELECT*".s AS x,
    '2026-04-20 20:52:31.007613+00'::timestamptz  AS y
   FROM ( SELECT s.s
           FROM generate_series_int(1, 100) s(s)) "*SELECT*"(s)) AS __iceberg_oor) __partitioned_source) TO 's3://marco-iceberg/iceberg/postgres/public/test/88731/data/7e/7e2038cf-a0cf-4047-b308-112c39622f47' WITH (format 'parquet', compression 'snappy', field_ids {'x': 1, 'y': 2}, row_group_size_bytes '512MB', parquet_version 'V1', return_stats, PARTITION_BY (__part_0))

writes files like:

s3://marco-iceberg/iceberg/postgres/public/test/88731/data/7e/7e2038cf-a0cf-4047-b308-112c39622f47/__part_0=20563/data_0.parquet

where the partition value is obtained by parsing the value that follows __part_0= (the synthetic column name) in the file path.
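As a rough illustration of the path parsing described above, here is a minimal Python sketch. It is not the PR's C implementation: the function names are hypothetical, and the percent-decoding step is an assumption about how values might be encoded in the path. It also interprets the value as a day count since 1970-01-01, matching the datediff('day', ...) expression in the generated query.

```python
import re
from datetime import date, timedelta
from urllib.parse import unquote

# Synthetic partition column naming (__part_N) is taken from the example path.
PART_SEGMENT = re.compile(r"^__part_(\d+)=(.*)$")


def parse_partition_values(path: str) -> dict[int, str]:
    """Return {partition index: raw value} for each __part_N=value path segment."""
    values = {}
    for segment in path.split("/"):
        match = PART_SEGMENT.match(segment)
        if match:
            # Assumption: values may be percent-encoded in the path.
            values[int(match.group(1))] = unquote(match.group(2))
    return values


def day_value_to_date(days: int) -> date:
    """Convert a day count since 1970-01-01 back to a calendar date."""
    return date(1970, 1, 1) + timedelta(days=days)


path = (
    "s3://marco-iceberg/iceberg/postgres/public/test/88731/data/7e/"
    "7e2038cf-a0cf-4047-b308-112c39622f47/__part_0=20563/data_0.parquet"
)
parts = parse_partition_values(path)
print(parts)                             # {0: '20563'}
print(day_value_to_date(int(parts[0])))  # 2026-04-20
```

The decoded date matches the timestamp literal in the generated query, which is a quick sanity check on the day-based partition value.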

@sfc-gh-mslot sfc-gh-mslot force-pushed the marcoslot/partitioning-copy-pushdown branch 4 times, most recently from 8ff6221 to 986731f Compare April 21, 2026 09:21
Collaborator

@sfc-gh-okalaci sfc-gh-okalaci left a comment


OK, I see that you are already pushing changes, and some of these are already fixed. Still sharing as an FYI.

Comment thread pg_lake_table/src/fdw/partition_pushdown.c Outdated
Comment thread pg_lake_table/src/fdw/partition_pushdown.c
Comment thread pg_lake_table/src/fdw/partition_pushdown.c Outdated
Comment thread pg_lake_table/src/fdw/partition_pushdown.c Outdated
Comment thread pg_lake_copy/src/copy/copy.c Outdated
Comment thread pg_lake_table/src/fdw/partition_pushdown.c Outdated
Comment thread pg_lake_table/src/fdw/partition_pushdown.c Outdated
@sfc-gh-mslot sfc-gh-mslot force-pushed the marcoslot/partitioning-copy-pushdown branch 2 times, most recently from ab93e6b to f0835c2 Compare April 21, 2026 10:29
Comment thread pg_lake_table/src/fdw/writable_table.c
Comment thread pg_lake_table/src/fdw/partition_pushdown.c
@sfc-gh-mslot sfc-gh-mslot force-pushed the marcoslot/partitioning-copy-pushdown branch 3 times, most recently from 87789c2 to debe2e9 Compare April 21, 2026 13:21
Comment thread pg_lake_engine/pg_lake_engine--3.3--3.4.sql Outdated
Comment thread pg_lake_table/src/fdw/partition_pushdown.c
Comment thread pg_lake_engine/src/pgduck/write_data.c Outdated
Comment thread pg_lake_table/src/fdw/partition_pushdown.c
Comment thread pg_lake_table/src/fdw/partition_pushdown.c
  ICEBERG_OOR_NONE,
- false /* wrapNativeIntervals */ );
+ false /* wrapNativeIntervals */ ,
+ NIL /* partitionByExprs */ );
Collaborator


So, VACUUM will split data files that were written by partitioned writes, which skip target_file_size_mb? Say a partitioned COPY wrote a 100GB file; we'll split it into 200 files of 512MB each.

That's probably the right action, we want VACUUM to fix it, but I wanted to raise it here:

  • To make sure you agree and are happy with it
  • Maybe add a comment for future readers
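
The arithmetic behind the 100GB example can be checked with a short Python sketch. It assumes binary units (1GB = 1024MB) and that a rewrite splits a file into chunks of at most the target size; the function name is hypothetical and this is not how VACUUM is implemented.

```python
def files_after_split(total_bytes: int, target_file_bytes: int) -> int:
    """Number of files needed if each file holds at most target_file_bytes."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-total_bytes // target_file_bytes)


MIB = 1024 ** 2
GIB = 1024 ** 3

print(files_after_split(100 * GIB, 512 * MIB))  # 200
```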

Collaborator Author

@sfc-gh-mslot sfc-gh-mslot Apr 23, 2026


Hmm, that's a good point. This will affect large data dumps.

Collaborator


I was checking if we can implement partition_by + file_size_bytes on DuckDB, and it seems they landed a PR last week, which doesn't solve it but references it as Future Work: duckdb/duckdb#22225 (comment)

Well, that's both good and bad. We cannot implement it for the current DuckDB version (or only with great difficulty), but we can (a) wait for the DuckDB maintainers to implement it, or (b) consider working on it as a patch here and then send it upstream to DuckDB.

Comment thread pg_lake_table/tests/pytests/test_partitioned_pushdown.py
@sfc-gh-okalaci
Collaborator

Maybe one final note: we don't seem to have any tests for partition evolution. I don't see any risks, but it would be good to, say, start with no partitioning, then partition by year(ts), switch to month(ts), drop the partitioning again, and do pushdowns in between.

@sfc-gh-mslot sfc-gh-mslot force-pushed the marcoslot/partitioning-copy-pushdown branch 4 times, most recently from a74d802 to 1e114fd Compare April 23, 2026 17:08
…tables

Partitioned writes previously went through the row-by-row PartitionedDestReceiver,
which routes each row to a per-partition CSV file before converting to Parquet.
This change enables DuckDB's PARTITION_BY in COPY TO for tables using pushdownable
transforms (identity, year, month, day, hour), letting DuckDB split the data in a
single pass. Bucket and truncate transforms continue to use the existing path.
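
To make the split between pushdownable and non-pushdownable transforms concrete, here is a minimal Python sketch. The time transform values follow the Iceberg spec definitions (years, months, days, and hours since 1970-01-01 UTC); the function names and the plain-string transform names are hypothetical and do not mirror the C code in partition_pushdown.c.

```python
from datetime import datetime, timezone

# Transforms the commit message lists as pushdownable.
PUSHDOWNABLE = {"identity", "year", "month", "day", "hour"}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_pushdownable(transforms: list[str]) -> bool:
    """True only if every partition transform can be handled by the pushdown path."""
    return all(t in PUSHDOWNABLE for t in transforms)


def time_transform(name: str, ts: datetime) -> int:
    """Compute an Iceberg time transform value for a timezone-aware timestamp."""
    ts = ts.astimezone(timezone.utc)
    if name == "year":
        return ts.year - 1970
    if name == "month":
        return (ts.year - 1970) * 12 + ts.month - 1
    delta = ts - EPOCH  # timedelta keeps days floored, also before the epoch
    if name == "day":
        return delta.days
    if name == "hour":
        return delta.days * 24 + delta.seconds // 3600
    raise ValueError(f"not a time transform: {name}")


ts = datetime(2026, 4, 20, 20, 52, 31, 7613, tzinfo=timezone.utc)
for name in ("year", "month", "day", "hour"):
    print(name, time_transform(name, ts))

print(is_pushdownable(["identity", "year"]))  # True
print(is_pushdownable(["day", "bucket"]))     # False
```

A table mixing a supported transform with bucket or truncate stays on the existing row-by-row path in this sketch, since pushdown requires all transforms to be supported.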

Key changes:
- Add partition_pushdown.c with transform-to-DuckDB-SQL conversion, query wrapping
  with synthetic partition columns, and Hive-style path parsing for partition values
- Extend WriteQueryResultTo with partitionByExprs parameter that wraps the query
  with synthetic columns AFTER validation/interval wrappers and adds PARTITION_BY
  to the COPY command
- Modify AddQueryResultToTable to detect pushdownable partitions, pass expressions
  through, and parse per-file partition values from DuckDB output paths
- Lift blanket partition blocks in IsPushdownableInsertSelectQuery and
  IsCopyFromPushdownable to allow pushdown when all transforms are supported
- Disable FILE_SIZE_BYTES when PARTITION_BY is used (DuckDB limitation)
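
The query wrapping and option handling listed above can be sketched as follows. This is an illustrative Python string builder, not the WriteQueryResultTo implementation: the function names are hypothetical, only a subset of COPY options is emitted, and no identifier or literal escaping is attempted.

```python
def wrap_with_partition_columns(query: str, partition_exprs: list[str]) -> str:
    """Append one synthetic __part_N column per partition expression."""
    cols = ", ".join(
        f"{expr} AS __part_{i}" for i, expr in enumerate(partition_exprs)
    )
    return f"SELECT *, {cols} FROM ({query}) __partitioned_source"


def build_copy(query, dest, partition_exprs=(), file_size_bytes=None):
    """Build a COPY ... TO command, adding PARTITION_BY when expressions are given."""
    options = ["format 'parquet'"]
    if partition_exprs:
        query = wrap_with_partition_columns(query, list(partition_exprs))
        names = ", ".join(f"__part_{i}" for i in range(len(partition_exprs)))
        options.append(f"PARTITION_BY ({names})")
        # FILE_SIZE_BYTES is deliberately omitted here, mirroring the commit
        # message's note that it is disabled when PARTITION_BY is used.
    elif file_size_bytes is not None:
        options.append(f"file_size_bytes {file_size_bytes}")
    return f"COPY ({query}) TO '{dest}' WITH ({', '.join(options)})"


day_expr = "datediff('day', date '1970-01-01', y::date)"
print(build_copy("SELECT x, y FROM src", "s3://bucket/prefix", [day_expr],
                 file_size_bytes=512 * 1024 * 1024))
print(build_copy("SELECT x, y FROM src", "s3://bucket/prefix",
                 file_size_bytes=512 * 1024 * 1024))
```

The first command carries PARTITION_BY (__part_0) and no file size limit; the second, unpartitioned command keeps file_size_bytes.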

Signed-off-by: Marco Slot <marco.slot@snowflake.com>
@sfc-gh-mslot sfc-gh-mslot force-pushed the marcoslot/partitioning-copy-pushdown branch from 64b419d to 3ab641e Compare April 23, 2026 19:21
@sfc-gh-dachristensen
Collaborator

Are there any concerns about using data values as path keys (null bytes, slashes, etc.)? (I haven't read this PR or the previous feedback in detail, so I assume this may already be handled via some sort of escaping method, but it was the first thing I thought of when reviewing the PR description.)
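
One common way to make arbitrary values safe as path keys is percent-encoding, sketched below in Python. This illustrates the general technique only; it makes no claim about what DuckDB or this PR actually does, and the function names are hypothetical. The time transforms produce integers, so the concern would mainly apply to identity partitioning on text-like columns.

```python
from urllib.parse import quote, unquote


def encode_partition_value(value: str) -> str:
    """Percent-encode everything except unreserved characters (letters, digits, _.-~)."""
    return quote(value, safe="")


def decode_partition_value(segment: str) -> str:
    """Reverse encode_partition_value."""
    return unquote(segment)


for raw in ["a/b", "x=y", "nul\x00byte", "100%"]:
    encoded = encode_partition_value(raw)
    # Path separators and the key/value delimiter never survive encoding.
    assert "/" not in encoded and "=" not in encoded
    assert decode_partition_value(encoded) == raw
    print(repr(raw), "->", encoded)
```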

3 participants