This is one ticket in a series carrying forward #12 foundation work. Read #12 first for repo context.
Upstream has a basic src/insert_exec.rs and src/table_writer.rs. The fork expanded these substantially: insert_exec.rs +965 lines, table_writer.rs +1875 lines. The expansion covers partitioned INSERT (Hive layout: partition_col=value/...), column-stats collection during write, multi-file INSERTs, footer-size capture for read-side optimization, encryption-feature gating, and the UploadCleanupGuard orphan-cleanup pattern.
Port the partitioned INSERT path. Partition columns are identified from the table's catalog metadata; each output batch is routed to the right partition_col=value/ directory; partition values are URL-encoded as the DuckLake spec requires.
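A minimal sketch of the routing and encoding described above, using only the standard library. The function names (`encode_partition_value`, `partition_dir`, `route_rows`) are hypothetical and not taken from the fork, and the escape set (everything outside the RFC 3986 unreserved characters) is an assumption that should be checked against the DuckLake spec before porting.

```rust
use std::collections::BTreeMap;

/// Percent-encode every byte outside the RFC 3986 "unreserved" set.
/// Assumption: the exact escape set DuckLake requires may differ.
fn encode_partition_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Multi-byte UTF-8 characters are escaped one byte at a time.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the Hive-style directory name for one partition column/value pair.
fn partition_dir(column: &str, value: &str) -> String {
    format!("{}={}/", column, encode_partition_value(value))
}

/// Group row indices by the partition directory they belong to.
/// A real writer would slice record batches; plain indices keep the sketch small.
fn route_rows(column: &str, values: &[&str]) -> BTreeMap<String, Vec<usize>> {
    let mut routed: BTreeMap<String, Vec<usize>> = BTreeMap::new();
    for (row, value) in values.iter().enumerate() {
        routed.entry(partition_dir(column, value)).or_default().push(row);
    }
    routed
}

fn main() {
    let routed = route_rows("region", &["US", "EU", "US", "a/b=c d"]);
    for (dir, rows) in &routed {
        println!("{dir} -> rows {rows:?}");
    }
}
```

Encoding the value before it reaches the path keeps `/` and `=` inside a value from being mistaken for directory or key separators when the layout is read back.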
Port footer-size capture and persistence. The fork stores Parquet footer size in metadata for read-side with_metadata_size_hint() optimization; this is already exploited on the read path. Ensure the write path persists it.
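For orientation, the footer size can be derived from the last 8 bytes of a Parquet file: a 4-byte little-endian metadata length followed by the `PAR1` magic. The sketch below is standard-library only; `footer_size` is a hypothetical helper, and it assumes the stored value is metadata length plus the 8-byte trailer, which may not match what the fork actually persists.

```rust
/// Magic bytes that end a Parquet file with a plaintext footer.
/// (Files written in encrypted-footer mode use a different magic and are not handled here.)
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Return the footer size in bytes: the serialized file metadata plus the
/// 8-byte trailer (4-byte little-endian metadata length + 4-byte magic).
/// Returns None if the bytes do not look like a complete Parquet file tail.
fn footer_size(file_bytes: &[u8]) -> Option<u64> {
    let len = file_bytes.len();
    if len < 8 {
        return None;
    }
    let trailer = &file_bytes[len - 8..];
    if &trailer[4..] != PARQUET_MAGIC.as_slice() {
        return None;
    }
    let metadata_len =
        u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as u64;
    let total = metadata_len + 8;
    if total > len as u64 {
        return None; // declared metadata is larger than the file itself
    }
    Some(total)
}

fn main() {
    // Fake file: leading magic, 16 bytes of "row groups", 5 bytes of "metadata", trailer.
    let mut file = Vec::new();
    file.extend_from_slice(b"PAR1");
    file.extend_from_slice(&[0u8; 16]);
    file.extend_from_slice(&[1, 2, 3, 4, 5]);
    file.extend_from_slice(&5u32.to_le_bytes());
    file.extend_from_slice(b"PAR1");
    println!("footer size: {:?}", footer_size(&file));
}
```

In practice the writer already knows this number when it closes the file, so the write path would record it at that point rather than re-reading the bytes; the parsing above only illustrates what the value means.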
Encryption gating: writes honor the encryption feature flag the same way reads do. Do not break the build when the flag is off.
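One common way to keep both configurations building is a pair of `cfg`-gated functions with the same signature. This is only a sketch: `WriteOptions`, its `encryption_key` field, and `apply_encryption` are hypothetical names, and returning an error when encryption is requested with the feature off is an assumed policy, not necessarily how the read path behaves.

```rust
/// Hypothetical write options; the field name is illustrative only.
#[derive(Debug, Default)]
struct WriteOptions {
    encryption_key: Option<Vec<u8>>,
}

/// Compiled only when the crate is built with `--features encryption`.
/// Returns whether the writer should encrypt its output.
#[cfg(feature = "encryption")]
fn apply_encryption(opts: &WriteOptions) -> Result<bool, String> {
    // A real implementation would configure the Parquet writer here.
    Ok(opts.encryption_key.is_some())
}

/// Fallback with the same signature, so callers compile unchanged when the
/// feature is off. Assumed policy: a requested key is an error, not a no-op.
#[cfg(not(feature = "encryption"))]
fn apply_encryption(opts: &WriteOptions) -> Result<bool, String> {
    if opts.encryption_key.is_some() {
        Err("encryption requested but the `encryption` feature is disabled".to_string())
    } else {
        Ok(false)
    }
}

fn main() {
    let opts = WriteOptions { encryption_key: None };
    println!("encrypt output: {:?}", apply_encryption(&opts));
}
```

Because both variants share one signature, call sites in the write path need no `cfg` attributes of their own, which is what keeps the build green with the flag off.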
Acceptance criteria
tests/write_partition_tests.rs passes — verify region=US/, region=EU/ Hive directories actually contain the right Parquet files with the right rows
tests/stats_tests.rs passes — TableProvider::statistics() returns correct min/max/null_count after multi-append (the test uses Precision::Inexact(ScalarValue::Int32(10)) style assertions)
Partition values containing special characters (/, =, spaces, unicode) are correctly URL-encoded
Concurrent INSERT into the same partition from two transactions: both commit successfully (additive)
Commit failure mid-INSERT: no orphan files left on disk (verify via filesystem inspection in the test)
No duckdb crate imports
Footer-size is captured during write and read back during scan, eliminating the second-read on Parquet open
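The orphan-file criterion above can be illustrated with a drop-based guard on the local filesystem. `CleanupGuard` here is a hypothetical stand-in, not the fork's `UploadCleanupGuard`, whose real API is not shown in this ticket; cleanup against an object store would most likely need an async path that this synchronous sketch does not cover.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical guard: deletes every tracked file on drop unless committed.
struct CleanupGuard {
    files: Vec<PathBuf>,
    committed: bool,
}

impl CleanupGuard {
    fn new() -> Self {
        Self { files: Vec::new(), committed: false }
    }

    /// Register a file that must not outlive a failed commit.
    fn track(&mut self, path: PathBuf) {
        self.files.push(path);
    }

    /// Mark the write as durable; the guard then drops without deleting anything.
    fn commit(mut self) {
        self.committed = true;
    }
}

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        if self.committed {
            return;
        }
        for path in &self.files {
            // Best effort: a failed delete cannot be reported from Drop.
            let _ = fs::remove_file(path);
        }
    }
}

/// Write one placeholder data file, then simulate the catalog commit.
fn write_then_commit(dir: &Path, commit_ok: bool) -> Result<PathBuf, String> {
    let mut guard = CleanupGuard::new();
    let path = dir.join("data_0.parquet");
    fs::write(&path, b"placeholder bytes").map_err(|e| e.to_string())?;
    guard.track(path.clone());
    if !commit_ok {
        // Early return drops the guard, which removes the file.
        return Err("catalog commit failed".to_string());
    }
    guard.commit();
    Ok(path)
}

fn main() {
    let dir = std::env::temp_dir().join(format!("cleanup_guard_demo_{}", std::process::id()));
    fs::create_dir_all(&dir).unwrap();
    assert!(write_then_commit(&dir, false).is_err());
    assert!(!dir.join("data_0.parquet").exists());
    assert!(write_then_commit(&dir, true).unwrap().exists());
    fs::remove_dir_all(&dir).unwrap();
}
```

A test for the real guard can follow the same shape: force the commit to fail, then list the partition directory and assert that it contains no data files.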
Reference branch
ducklake-features/integration:
src/insert_exec.rs: physical exec, expanded
src/table_writer.rs: shared write helpers, substantially expanded
tests/write_partition_tests.rs (400 lines, partitioned INSERT round-trips with filesystem inspection of the region=US/ and region=EU/ layout), tests/write_tests.rs (+324 lines), tests/stats_tests.rs (447 lines, column-stats round-trips)
Scope
[...] Statistics (min/max/null_count, distinct_count where cheap) and register them in ducklake_column_stats (or wherever the spec puts them; verify against the current upstream schema after the #12 rebase).
UploadCleanupGuard pattern: uploaded files are cleaned up on commit failure (already partially mentioned in the #17 (DELETE physical execution, MOR delete files) and #18 (UPDATE physical execution, MOR delete + insert) references, but the source-of-truth implementation lives here).
Dependencies
[...] table_writer.rs; coordinate to avoid duplicate ports
Out of scope
Notes
parity_basic_crud_after_insert fails on a decimal-formatter mismatch with DuckDB; that is tracked under #12 (Foundation: rebase integration onto upstream, drop pass-throughs, triage SLT failures), not here.