The DuckLake format reached v1.0 (production-ready, backward-compat guaranteed) on 2026-04-13 alongside DuckDB v1.5.2. The v1.0 catalog schema introduces new metadata tables and features beyond what datafusion-ducklake currently understands. This epic tracks closing that gap so a v1.0 catalog written by the ducklake DuckDB extension is fully readable and writable through DataFusion.
Already covered
INSERT/DELETE writes, S3 write support, SQLite/MySQL/Postgres metadata backends, list/array types, PME-encrypted Parquet reads, byte-size statistics, snapshot isolation, basic schema evolution (type promotion, column rename). Data inlining and the async metadata provider are on a branch in #106; land that rather than rebuilding.
Outstanding work
Sub-tasks to split out:
Catalog schema version validation. Read the v1.0 schema version from ducklake_metadata; reject unknown versions with a clear error. Today the provider doesn't check at all.
Partitioning. Honor partition_column / partition_info / file_partition_value for partition pruning. Support v1.0's bucket(N, column) transform (murmur3, Iceberg-compatible) alongside identity/year/month/day/hour. Flagged as TODO in CLAUDE.md.
Sorted tables. Read sort_info / sort_expression so DataFusion can exploit ordering for filter and limit pushdown; preserve sort on writes.
Deletion vectors. Read v1.0's Iceberg-v3-compatible deletion vectors (roaring bitmaps in Puffin files) in addition to the existing positional-delete-file path in delete_filter.rs. (Note: marked experimental in the v1.0 announcement.)
Statistics. Map file_column_stats / table_column_stats / table_stats into DataFusion's Statistics. Builds on #112 (feat(table): implement TableProvider::statistics() with byte-size aggregate).
Struct and map types. Currently error in types.rs. Precondition for nested GEOMETRY and VARIANT.
Geometry type. Replace today's Binary mapping with proper GEOMETRY. Read bounding-box stats from file_column_stats for spatial filter pushdown. Support nesting inside structs/lists/maps.
Variant type. Map VARIANT to Arrow; read file_variant_stats for shredded-sub-field file skipping.
Column mapping / name mapping. Use column_mapping and name_mapping to resolve renamed/dropped/re-added columns across snapshots; verify field-id-based Parquet reads.
Time travel. Expose user-facing snapshot selection (read at snapshot ID / timestamp). Snapshot pinning exists internally but isn't surfaced.
Views and macros. Surface view, macro, macro_impl, macro_parameters in DataFusion's catalog (at minimum, list and read views).
Tags. Read snapshot/table/column tags from tag / column_tag and expose via metadata APIs.
File lifecycle. Respect files_scheduled_for_deletion so the read path skips soft-deleted files; honor on the write path during compaction.
Add existing Parquet without copy. v1.0 supports registering pre-existing Parquet into the catalog without rewriting; expose on the write path.
UPDATE and ALTER TABLE. Today's writes cover INSERT and DELETE; UPDATE and schema mutations (add/drop/rename column, etc.) are missing.
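To make the partitioning sub-task above more concrete, here is a minimal sketch of file-level pruning for an identity-partitioned column under an equality filter. `DataFile`, `EqFilter`, and `prune_files` are hypothetical names, not existing datafusion-ducklake types. The sketch assumes partition values are keyed by partition key index and compared as text; the real provider would compare typed values and would also need the year/month/day/hour and bucket transforms.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one data file together with the
/// partition values recorded for it (cf. `file_partition_value`).
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    path: String,
    /// Partition key index -> stored partition value, kept as text here.
    partition_values: HashMap<u64, String>,
}

/// Equality predicate on a column partitioned with the identity transform.
struct EqFilter<'a> {
    partition_key_index: u64,
    value: &'a str,
}

/// Returns the files that may contain matching rows.
fn prune_files<'f>(files: &'f [DataFile], filter: &EqFilter<'_>) -> Vec<&'f DataFile> {
    files
        .iter()
        .filter(|file| match file.partition_values.get(&filter.partition_key_index) {
            // A recorded value that differs from the literal rules the file out.
            Some(v) => v.as_str() == filter.value,
            // No recorded value for this key: keep the file, since pruning
            // must never drop a file that could hold matching rows.
            None => true,
        })
        .collect()
}
```

Keeping files with no recorded partition value is deliberate: pruning is only an optimization, so any uncertainty should fall back to scanning the file.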
Definition of done
Round-trip parity tests against catalogs produced by the ducklake DuckDB extension at v1.0 (DuckDB ≥ 1.5.2), exercising each feature above.
Reverse round-trip: catalogs written by datafusion-ducklake are readable by the DuckDB extension.
README and CLAUDE.md updated to reflect v1.0 parity (drop the "Current Limitations" list; refresh the "DuckDB only" note).
Catalog schema version negotiation: unknown versions are explicitly rejected with a clear error.
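A rough sketch of what the version check could look like, assuming the key/value rows of ducklake_metadata have already been fetched. The key name `version`, the literal `"1.0"`, and the names `SUPPORTED_VERSIONS`, `VersionError`, and `validate_catalog_version` are all assumptions for illustration, not existing code.

```rust
use std::fmt;

/// Catalog schema versions the reader accepts (assumed list, for illustration).
const SUPPORTED_VERSIONS: &[&str] = &["1.0"];

#[derive(Debug, PartialEq)]
enum VersionError {
    /// No version entry was found in the metadata rows.
    Missing,
    /// A version entry was found but is not in `SUPPORTED_VERSIONS`.
    Unsupported(String),
}

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VersionError::Missing => {
                write!(f, "ducklake_metadata has no `version` entry")
            }
            VersionError::Unsupported(found) => write!(
                f,
                "unsupported DuckLake catalog schema version `{}` (supported: {})",
                found,
                SUPPORTED_VERSIONS.join(", ")
            ),
        }
    }
}

impl std::error::Error for VersionError {}

/// `rows` stands in for (key, value) pairs read from `ducklake_metadata`.
fn validate_catalog_version(rows: &[(&str, &str)]) -> Result<(), VersionError> {
    let version = rows
        .iter()
        .find(|(key, _)| *key == "version")
        .map(|(_, value)| value.trim())
        .ok_or(VersionError::Missing)?;

    if SUPPORTED_VERSIONS.contains(&version) {
        Ok(())
    } else {
        Err(VersionError::Unsupported(version.to_string()))
    }
}
```

Running a check like this once, when the catalog is opened, means an unknown version fails with a clear error before any table is resolved.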