[META] Carry-forward of fork integration work — tracking issue #26

@zfarrell

Tracking: Carry-forward of fork integration work

Background

A few months ago an agent team did substantial work on this fork to make it "1.0" feature-compliant with the DuckLake spec. The work landed on ducklake-features/integration (265 commits, +23k net src LOC, +14 new src files) but was never validated, reviewed, or upstreamed.

In 2026-05 we audited it across five dimensions: build correctness, test quality, code quality, feature inventory, and upstream overlap. Verdict: real, substantive engineering — not AI slop — but cannot be merged as-is. It branched from upstream commit 59eb3da (PR datafusion-contrib#79) and upstream has since landed datafusion 53 / arrow 58, TableProvider::statistics(), and a competing virtual-column design.

This issue tracks the decomposition of that work into focused upstreamable PRs.

Workstreams

Each child ticket is self-contained for an agent with no prior context. Read #12 first; every other ticket references back to it.

| # | Ticket | Status | Depends on |
|---|---|---|---|
| #12 | Foundation: rebase + cleanup + SLT triage | open | |
| #13 | Metadata writer dialect/macro layer | open | #12 |
| #14 | Postgres MetadataWriter | open | #12, #13 |
| #15 | MySQL MetadataWriter | open | #12, #13 |
| #16 | DuckLake QueryPlanner | open | #12 |
| #17 | DELETE physical execution | open | #12, #16 |
| #18 | UPDATE physical execution | open | #12, #16, #17 |
| #19 | MERGE physical execution | open | #12, #16, #17, #18 |
| #20 | ALTER/DROP/CREATE schema evolution | open | #12, #13 |
| #21 | CDC table functions (+ cdc_common bug fix) | open | #12 |
| #22 | Virtual columns reconciliation (DESIGN first) | open | #12 |
| #23 | Type system & inlined-data parsing | open | #12 |
| #24 | R10/R11 hardening safety net | open | #12 |
| #25 | Partitioned INSERT + write-path expansion | open | #12 |

Explicitly dropped from carry-forward

Side pickups worth doing

Suggested execution order

The dependency graph above suggests this rough order of attack:

  1. #12 (foundation: rebase integration onto upstream, drop pass-throughs, triage SLT failures). Must complete before anything else.
  2. #13 (dialect/macro layer). Prerequisite for #14 (Postgres MetadataWriter), #15 (MySQL MetadataWriter), and #20 (ALTER/DROP/CREATE schema evolution DDL).
  3. #16 (planner: intercept DML logical plans). Prerequisite for #17 (DELETE, MOR delete files), #18 (UPDATE, MOR delete + insert), and #19 (MERGE, atomic INSERT/UPDATE/DELETE).
  4. In parallel: #14, #15, #20 (metadata work); #17 (DELETE); #21 (CDC, with the cdc_common duplicate-column bug fix); #23 (types, merged with upstream overlap); #24 (hardening: correctness + atomicity safety net); #25 (INSERT expansion: stats, footer-size, partition layout).
  5. #18 (UPDATE), after #17.
  6. #19 (MERGE), after #17 and #18.
  7. #22 (virtual columns: reconcile virtual_column_exec with upstream row_id). Design first, then implement; can run in parallel but ship last because of the upstream reconciliation requirement.
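
The ordering above can be cross-checked mechanically against the "Depends on" column. The following is a minimal sketch, not part of the fork's tooling: the `DEPENDS_ON` mapping is transcribed by hand from the workstream table, and `execution_waves` is a hypothetical helper name. It uses Python's standard-library `graphlib.TopologicalSorter` to group tickets into waves, where every ticket in a wave has all of its prerequisites in earlier waves.

```python
from graphlib import TopologicalSorter

# Ticket number -> tickets it depends on, transcribed from the workstream table.
DEPENDS_ON = {
    12: [],
    13: [12],
    14: [12, 13],
    15: [12, 13],
    16: [12],
    17: [12, 16],
    18: [12, 16, 17],
    19: [12, 16, 17, 18],
    20: [12, 13],
    21: [12],
    22: [12],
    23: [12],
    24: [12],
    25: [12],
}


def execution_waves(depends_on):
    """Group tickets into waves that could be worked on in parallel.

    Raises graphlib.CycleError if the dependency table contains a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its prerequisites completed.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(execution_waves(DEPENDS_ON)):
        print(f"wave {index}: " + ", ".join(f"#{ticket}" for ticket in wave))
```

The waves only encode the hard dependencies from the table. The suggested order is deliberately more conservative in places: for example, #22 is unblocked as soon as #12 lands, but it is listed last because of the upstream reconciliation requirement.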

Audit summary

Full audit raw findings: available on request, summarized in the foundation ticket.
