Tracking: Carry-forward of fork integration work
Background
A few months ago an agent team did substantial work on this fork to make it "1.0" feature-compliant with the DuckLake spec. The work landed on ducklake-features/integration (265 commits, +23k net src LOC, +14 new src files) but was never validated, reviewed, or upstreamed.
In 2026-05 we audited it across five dimensions: build correctness, test quality, code quality, feature inventory, and upstream overlap. Verdict: real, substantive engineering, not AI slop, but it cannot be merged as-is. It branched from upstream commit 59eb3da (PR datafusion-contrib#79), and upstream has since landed datafusion 53 / arrow 58, TableProvider::statistics(), and a competing virtual-column design.
This issue tracks the decomposition of that work into focused upstreamable PRs.
Workstreams
Each child ticket is self-contained for an agent with no prior context. Read #12 first; every other ticket references back to it.
Explicitly dropped from carry-forward
- src/compaction_functions.rs: DuckDB pass-through (shells out to INSTALL ducklake; LOAD ducklake; ATTACH ...; CALL ducklake_merge_adjacent_files();). Per project direction, no feature should depend on libduckdb at runtime for its actual logic. Compaction will be reintroduced later as a native Rust implementation.
- ducklake-features/integration (others).
- The 56-file docs/ directory on the integration branch: agent-process artifacts (R1–R11 review cycles, retrospectives). Move to a separate archive branch or .audit/ after rebase; do not include in upstream PRs.
Side pickups worth doing
- docs/gap-analysis.md from PR "docs: add phase 1/2 analysis artifacts" #3 (859 lines): the only doc unique to that branch. Decide if it's still relevant post-rebase and either port to .audit/ or drop.
- assert!(record_count >= 0, ...) guard in DataFileInfo::new from PR "fix: preserve record_count validation + test stability updates" #9: a 15-line cherry-pick. Trivial; fold into "Foundation: rebase integration onto upstream, drop pass-throughs, triage SLT failures" #12 if convenient.
Suggested execution order
The dependency graph above suggests this rough order of attack:
Audit summary
- cdc_common.rs:107-114 duplicate-column projection; fix tracked in "CDC table functions (with cdc_common duplicate-column bug fix)" #21
- #[ignore] outside external-service tests, zero TODO/FIXME, exact-value assertions throughout
Full audit raw findings: available on request, summarized in the foundation ticket.
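For reference, the record_count guard listed under the side pickups could look roughly like the sketch below. This is not the code from PR #9: the struct layout, the `path` field, and the `i64` type are assumptions for illustration, and only the `assert!(record_count >= 0, ...)` check in `DataFileInfo::new` comes from this issue.

```rust
/// Hypothetical stand-in for the fork's `DataFileInfo`.
/// Field names and types are assumptions, not the real definition.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFileInfo {
    /// Location of the data file (assumed field).
    pub path: String,
    /// Number of rows in the file; signed so a bad catalog value can be detected.
    pub record_count: i64,
}

impl DataFileInfo {
    /// Builds a `DataFileInfo`, panicking if `record_count` is negative.
    /// This mirrors the `assert!(record_count >= 0, ...)` guard named above.
    pub fn new(path: impl Into<String>, record_count: i64) -> Self {
        assert!(
            record_count >= 0,
            "record_count must be non-negative, got {record_count}"
        );
        Self {
            path: path.into(),
            record_count,
        }
    }
}
```

Under these assumptions, a valid count constructs normally and a negative count panics at construction time, so a corrupt value cannot propagate into later row-count arithmetic.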