[epic] Full feature parity with DuckLake v1.0 #114

## Summary

The DuckLake format reached v1.0 (production-ready, with a backward-compatibility guarantee) on 2026-04-13, alongside DuckDB v1.5.2. The v1.0 catalog schema introduces new metadata tables and features beyond what `datafusion-ducklake` currently understands. This epic tracks closing that gap so that a v1.0 catalog written by the `ducklake` DuckDB extension is fully readable and writable through DataFusion.

## Already covered

INSERT/DELETE writes, S3 write support, SQLite/MySQL/Postgres metadata backends, list/array types, PME-encrypted Parquet reads, byte-size statistics, snapshot isolation, and basic schema evolution (type promotion, column rename). Data inlining and the async metadata provider are already on a branch in #106; land that rather than rebuilding them.

## Outstanding work

Sub-tasks to split out:

- **Catalog schema version validation.** Read the v1.0 schema version from `ducklake_metadata`; reject unknown versions with a clear error. Today the provider doesn't check at all.
- **Partitioning.** Honor `partition_column` / `partition_info` / `file_partition_value` for partition pruning. Support v1.0's `bucket(N, column)` transform (murmur3, Iceberg-compatible) alongside identity/year/month/day/hour. Flagged as TODO in CLAUDE.md.
- **Sorted tables.** Read `sort_info` / `sort_expression` so DataFusion can exploit ordering for filter and limit pushdown; preserve sort on writes.
- **Deletion vectors.** Read v1.0's Iceberg-v3-compatible deletion vectors (roaring bitmaps in Puffin files) in addition to the existing positional-delete-file path in `delete_filter.rs`. (Note: marked experimental in the v1.0 announcement.)
- **Column-level statistics.** Surface `file_column_stats` / `table_column_stats` / `table_stats` into DataFusion's `Statistics`. Builds on #112.
- **Struct and map types.** Both currently produce an error in `types.rs`. They are a prerequisite for nested `GEOMETRY` and `VARIANT`.
- **Geometry type.** Replace today's Binary mapping with proper `GEOMETRY`. Read bounding-box stats from `file_column_stats` for spatial filter pushdown. Support nesting inside structs/lists/maps.
- **Variant type.** Map `VARIANT` to Arrow; read `file_variant_stats` to skip files based on statistics for shredded sub-fields.
- **Column mapping / name mapping.** Use `column_mapping` and `name_mapping` to resolve renamed/dropped/re-added columns across snapshots; verify field-id-based Parquet reads.
- **Time travel.** Expose user-facing snapshot selection (read at snapshot ID / timestamp). Snapshot pinning exists internally but isn't surfaced.
- **Views and macros.** Surface `view`, `macro`, `macro_impl`, `macro_parameters` in DataFusion's catalog (at minimum, list and read views).
- **Tags.** Read snapshot/table/column tags from `tag` / `column_tag` and expose via metadata APIs.
- **File lifecycle.** Respect `files_scheduled_for_deletion` so the read path skips soft-deleted files; honor it on the write path during compaction.
- **Add existing Parquet without copy.** v1.0 supports registering pre-existing Parquet files in the catalog without rewriting them; expose this on the write path.
- **UPDATE and ALTER TABLE.** Today's writes cover INSERT and DELETE; UPDATE and schema mutations (add/drop/rename column, etc.) are missing.
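
The `bucket(N, column)` transform under **Partitioning** is the item most likely to go wrong at the byte level, because writer and reader must agree on every hash bit. Below is a minimal sketch, assuming DuckLake follows the Iceberg definition exactly: MurmurHash3 x86_32 with seed 0, integers widened to an 8-byte little-endian `i64` before hashing, strings hashed as UTF-8 bytes, and the sign bit cleared before reducing modulo N. `murmur3_x86_32`, `bucket_of_bytes`, `bucket_i64`, and `bucket_str` are hypothetical helper names, not existing functions in this crate. A real implementation should also be checked against the reference hash values published in the Iceberg table spec and against bucket values written by the DuckDB extension.

```rust
/// MurmurHash3 x86_32 with seed 0 (the variant Iceberg's bucket transform uses;
/// assumed, per "Iceberg-compatible", to match DuckLake's bucket transform).
fn murmur3_x86_32(data: &[u8]) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;
    let mut h: u32 = 0; // seed = 0

    // Body: mix each 4-byte little-endian block into the state.
    let mut blocks = data.chunks_exact(4);
    for block in &mut blocks {
        let k = u32::from_le_bytes([block[0], block[1], block[2], block[3]]);
        h ^= k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h = h.rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }

    // Tail: fold the remaining 0..=3 bytes in little-endian order.
    let tail = blocks.remainder();
    if !tail.is_empty() {
        let mut k: u32 = 0;
        for (i, &b) in tail.iter().enumerate() {
            k |= u32::from(b) << (8 * i);
        }
        h ^= k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
    }

    // Finalization: mix in the length, then avalanche (fmix32).
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Hypothetical helper: Iceberg-style bucket(N) over the bytes to hash.
fn bucket_of_bytes(bytes: &[u8], num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "bucket count must be positive");
    // Clear the sign bit (Java's `hash & Integer.MAX_VALUE`), then reduce mod N.
    (murmur3_x86_32(bytes) & 0x7fff_ffff) % num_buckets
}

/// Integers (i32 widened to i64, or i64) are hashed as 8 little-endian bytes.
fn bucket_i64(value: i64, num_buckets: u32) -> u32 {
    bucket_of_bytes(&value.to_le_bytes(), num_buckets)
}

/// Strings are hashed as their UTF-8 bytes.
fn bucket_str(value: &str, num_buckets: u32) -> u32 {
    bucket_of_bytes(value.as_bytes(), num_buckets)
}
```

For pruning, an equality predicate such as `col = 34` maps to `bucket_i64(34, N)`, and only files whose `file_partition_value` for that column equals that bucket need to be scanned. Range predicates cannot be pruned this way, because hashing destroys ordering.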

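Partition pruning, column-level statistics, geometry bounding boxes, and shredded variant statistics all rest on the same conservative rule: skip a file only when its recorded statistics prove that no row can match, and keep it whenever a bound is missing. Below is a minimal sketch for an inclusive integer range predicate. `FileStats`, `may_match_range`, and `prune` are hypothetical stand-ins for rows of `file_column_stats`, not the crate's actual types. In the provider itself, DataFusion's `PruningPredicate` driven by a `PruningStatistics` implementation is likely the better home for this logic.

```rust
/// Hypothetical per-file min/max for one integer column, standing in for a row
/// of `file_column_stats`. `None` means the bound was not recorded.
#[derive(Debug, Clone)]
struct FileStats {
    path: String,
    min: Option<i64>,
    max: Option<i64>,
}

/// Conservative check for `lo <= col <= hi`: returns false only when the
/// recorded bounds prove that no row in the file can satisfy the predicate.
fn may_match_range(stats: &FileStats, lo: i64, hi: i64) -> bool {
    match (stats.min, stats.max) {
        (Some(min), Some(max)) => max >= lo && min <= hi,
        (Some(min), None) => min <= hi,
        (None, Some(max)) => max >= lo,
        // No statistics: the file must be scanned.
        (None, None) => true,
    }
}

/// Keep only the files that might contain matching rows, in input order.
fn prune<'a>(files: &'a [FileStats], lo: i64, hi: i64) -> Vec<&'a str> {
    files
        .iter()
        .filter(|f| may_match_range(f, lo, hi))
        .map(|f| f.path.as_str())
        .collect()
}
```

An equality predicate is the special case `lo == hi`. A spatial filter applies the same disjointness test on each axis of a bounding box: two axis-aligned boxes intersect only if their ranges overlap on every axis.
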
## Definition of done

- Round-trip parity tests against catalogs produced by the `ducklake` DuckDB extension at v1.0 (DuckDB ≥ 1.5.2), exercising each feature above.
- Reverse round-trip: catalogs written by `datafusion-ducklake` are readable by the DuckDB extension.
- README and CLAUDE.md updated to reflect v1.0 parity (drop the "Current Limitations" list; refresh the "DuckDB only" note).
- Catalog schema version negotiation: unknown versions are explicitly rejected with a clear error.
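
The rejection path for the version check can be sketched as follows. This is a minimal sketch under stated assumptions: `ducklake_metadata` has already been read into key/value pairs, the schema version lives under a `version` key, and v1.0 catalogs store the value `"1.0"`. Both the key and the accepted strings should be confirmed against the specification before use. `check_catalog_version`, `CatalogVersionError`, and `SUPPORTED_CATALOG_VERSIONS` are hypothetical names.

```rust
use std::collections::HashMap;
use std::fmt;

/// Assumed allow-list of catalog schema versions this reader understands.
const SUPPORTED_CATALOG_VERSIONS: &[&str] = &["1.0"];

#[derive(Debug, PartialEq)]
enum CatalogVersionError {
    /// No `version` entry (assumed key name) in `ducklake_metadata`.
    Missing,
    /// A version string outside the allow-list.
    Unsupported { found: String },
}

impl fmt::Display for CatalogVersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CatalogVersionError::Missing => write!(
                f,
                "ducklake_metadata has no 'version' entry; cannot determine the catalog schema version"
            ),
            CatalogVersionError::Unsupported { found } => write!(
                f,
                "unsupported DuckLake catalog schema version '{}' (supported: {})",
                found,
                SUPPORTED_CATALOG_VERSIONS.join(", ")
            ),
        }
    }
}

impl std::error::Error for CatalogVersionError {}

/// Validate the catalog schema version before any other metadata is read.
fn check_catalog_version(
    metadata: &HashMap<String, String>,
) -> Result<&str, CatalogVersionError> {
    let found = metadata.get("version").ok_or(CatalogVersionError::Missing)?;
    if SUPPORTED_CATALOG_VERSIONS.contains(&found.as_str()) {
        Ok(found.as_str())
    } else {
        Err(CatalogVersionError::Unsupported { found: found.clone() })
    }
}
```

An explicit allow-list, rather than a "greater than or equal" comparison, means a catalog written by a future format version fails loudly instead of being read under v1.0 assumptions.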

## References

- [DuckLake v1.0 release announcement](https://ducklake.select/2026/04/13/ducklake-10/)
- [DuckLake specification (stable / v1.0)](https://ducklake.select/docs/stable/specification/introduction)
- [DuckDB v1.5.2 announcement](https://duckdb.org/2026/04/13/announcing-duckdb-152)
