Context
This is one ticket in a series carrying forward #12 foundation work. Read #12 first for repo context.
DuckLake's CDC surface is exposed as table functions: ducklake_table_changes(table, from_snap, to_snap), ducklake_table_insertions, ducklake_table_deletions, ducklake_current_snapshot, ducklake_last_committed_snapshot. Upstream has placeholder implementations of table_changes.rs and table_deletions.rs; the fork has substantially expanded these (+340 lines and +364 lines respectively) and added cdc_common.rs plus table_insertions.rs plus expanded table_functions.rs (+873).
Reference branch
ducklake-features/integration:
src/cdc_common.rs — shared CDC projection logic (⚠️ has a known bug — see below)
src/table_insertions.rs — new TableProvider for insertion CDC
src/table_changes.rs — expanded
src/table_deletions.rs — expanded
src/table_functions.rs — registration and dispatch
- Tests:
tests/time_travel_tests.rs (14 tests), tests/table_function_tests.rs (10 tests)
Scope
- Port
cdc_common.rs, table_insertions.rs, the expanded table_changes.rs / table_deletions.rs, and the new entries in table_functions.rs.
- Fix the audit-identified bug in
cdc_common.rs (lines ~107–114). Duplicate-column projections (SELECT col, col) collapse to a single source position because table_indices.iter().position(|&ti| ti == idx) returns the first match. The accompanying test test_cdc_projection_duplicate_columns asserts the buggy behavior (assert_eq!(reorder[1], 0)) — fix the test too. The correct behavior: each output slot maps to its own batch column index.
- Verify the snapshot-bounded predicates from R11-S-001/004/005/006/018/022/025 (referenced in commit messages) are in place — these were a hardening round on snapshot-awareness.
ducklake_current_snapshot and ducklake_last_committed_snapshot are intentionally simple — port them verbatim.
- Ensure none of the table functions reach for
duckdb::Connection. All CDC work should be native — operate on the MetadataProvider and Parquet I/O directly.
Acceptance criteria
Dependencies
Out of scope
- Streaming CDC (
SUBSCRIBE semantics) — DuckLake spec does not require it
- Schema-evolution-aware CDC where the column set differs across the requested range — error out clearly; full support is a separate effort
Notes
- Audit verdict on
cdc_common.rs: "concerning — has a real bug in duplicate-column projection handling." This is the only correctness defect found across the entire fork audit; fixing it is the primary correctness goal of this ticket.
- Audit verdict on
table_insertions.rs: "solid — straightforward TableProvider mirroring upstream table.rs patterns."
Context
This is one ticket in a series carrying forward #12 foundation work. Read #12 first for repo context.
DuckLake's CDC surface is exposed as table functions:
ducklake_table_changes(table, from_snap, to_snap),ducklake_table_insertions,ducklake_table_deletions,ducklake_current_snapshot,ducklake_last_committed_snapshot. Upstream has placeholder implementations oftable_changes.rsandtable_deletions.rs; the fork has substantially expanded these (+340 lines and +364 lines respectively) and addedcdc_common.rsplustable_insertions.rsplus expandedtable_functions.rs(+873).Reference branch
ducklake-features/integration:src/cdc_common.rs— shared CDC projection logic (src/table_insertions.rs— newTableProviderfor insertion CDCsrc/table_changes.rs— expandedsrc/table_deletions.rs— expandedsrc/table_functions.rs— registration and dispatchtests/time_travel_tests.rs(14 tests),tests/table_function_tests.rs(10 tests)Scope
cdc_common.rs,table_insertions.rs, the expandedtable_changes.rs/table_deletions.rs, and the new entries intable_functions.rs.cdc_common.rs(lines ~107–114). Duplicate-column projections (SELECT col, col) collapse to a single source position becausetable_indices.iter().position(|&ti| ti == idx)returns the first match. The accompanying testtest_cdc_projection_duplicate_columnsasserts the buggy behavior (assert_eq!(reorder[1], 0)) — fix the test too. The correct behavior: each output slot maps to its own batch column index.ducklake_current_snapshotandducklake_last_committed_snapshotare intentionally simple — port them verbatim.duckdb::Connection. All CDC work should be native — operate on theMetadataProviderand Parquet I/O directly.Acceptance criteria
tests/time_travel_tests.rsall 14 tests passtests/table_function_tests.rsall 10 tests passSELECT col, col FROM ducklake_table_changes(...)that confirms each output column is independent (the bug fix)test_cdc_projection_duplicate_columnsupdated to assert the correct mapping (each output slot has its own source index)duckdbcrate imports in any of the affected filesDependencies
Out of scope
SUBSCRIBEsemantics) — DuckLake spec does not require itNotes
cdc_common.rs: "concerning — has a real bug in duplicate-column projection handling." This is the only correctness defect found across the entire fork audit; fixing it is the primary correctness goal of this ticket.table_insertions.rs: "solid — straightforward TableProvider mirroring upstream table.rs patterns."