feat: support list/array column types in DuckLake type mapping by anoop-narang · Pull Request #89 · datafusion-contrib/datafusion-ducklake

anoop-narang · 2026-03-20T07:21:48Z

Summary

Add support for mapping DuckLake list types to Arrow DataType::List, handling list<type>, array<type>, and type[] (Postgres) syntax in ducklake_to_arrow_type() and the reverse in arrow_to_ducklake_type()
Reconstruct list types from parent-child ducklake_column rows in all four metadata providers (DuckDB, PostgreSQL, SQLite, MySQL) — the parent row's column_type is rewritten from bare "list" to "list<element_type>" and child rows are dropped
Struct/map parent types are intentionally left unchanged (they still error as before)

Context

DuckLake stores list columns as two rows in ducklake_column:

column_id	column_name	column_type	parent_column
6	vector	list	NULL
7	element	float64	6

Previously, get_table_structure() ignored parent_column, so vector got bare type "list" (which failed type mapping) and element appeared as a spurious top-level column.

Test plan

All 177 unit tests pass (cargo test --lib)
All integration tests pass (concurrent, delete filter, information schema, etc.)
New unit tests for reconstruct_list_columns() cover: basic list, no-list passthrough, struct parent unchanged, multiple lists, ColumnWithTable variant
New unit tests for type mapping cover: list<T>, array<T>, T[] syntax, empty element errors, nested complex type errors, roundtrip, schema building
Verified against live Postgres DuckLake catalog (table_id=2 with list column returns 7 columns, not 8)

Map list types to Arrow's DataType::List with recursive element type resolution. Supports three syntaxes: list<type>, array<type>, and type[] (Postgres style). Nested complex element types are rejected with a clear error until struct/map support is added.

…data providers DuckLake stores list columns as parent-child rows in ducklake_column (parent has column_type="list", child has the element type with parent_column pointing to the parent). All four metadata providers now read parent_column, rewrite the parent's type to list<element_type>, and drop the child row from the result.

…he DuckLake spec (#116) * fix(writer): align ducklake_column / ducklake_data_file with the spec `SqliteMetadataWriter` was producing a `ducklake_column` table that lacked `parent_column`, `initial_default`, `default_value`, `default_value_type`, and `default_value_dialect`. The reader's SQL_GET_TABLE_COLUMNS query (added in PR #89) projects `parent_column`, so any catalog written by `SqliteMetadataWriter` was unreadable through the metadata provider, panicking with "no such column: parent_column". The bug shipped because CI ran `cargo test` without `write-sqlite` features, and both `tests/sql_write_tests.rs` and `tests/sqlite_metadata_provider_test.rs` are gated behind features that CI didn't enable — so the broken-catalog path compiled to a zero-test binary in CI and was invisible. Three changes: 1. `metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA` adds the five missing `ducklake_column` columns, matching the upstream `ducklake_metadata_manager.cpp` schema definition. We don't populate the four `*_default*` columns today (we don't support nested types or column defaults yet), but the columns must exist so the reader query doesn't error. 2. `tests/sqlite_metadata_provider_test.rs::init_schema` is brought in line with the same spec. It also gains `row_id_start`, `record_count`, and `mapping_id` on `ducklake_data_file` — these are projected by SQL_GET_DATA_FILES since the row-lineage work landed in PR #115 and any sqlite-fixture test that exercises the data-file read path would have hit "no such column" without them. 3. `.github/workflows/ci.yml` enables `write-sqlite` on the test step so `sql_write_tests` and `sqlite_metadata_provider_test` actually run. Without this, the next analogous schema drift will ship invisibly again. Verified by running `cargo test --features "skip-tests-with-docker write-sqlite"` locally — 228 unit tests + every integration suite green, including the 7 previously-broken `sql_write_tests` and 18 `sqlite_metadata_provider_test` cases. Out of scope (separately broken with the same root cause, no CI coverage today): the encryption, postgres, and mysql test fixtures have analogous spec gaps. Worth a follow-up once CI for those backends is set up. * docs(ci): reword the write-sqlite comment so the intent isn't ambiguous The earlier wording read like "enable write-sqlite so the file runs" which could be parsed as either "we previously didn't run them and now we are" (correct) or "we want to skip running them" (wrong). Rephrased to make clear we're adding features so the tests become visible to cargo, not the other way round. * docs: trim the new comments to just the non-obvious WHY Three comment blocks added earlier in this branch were verbose, duplicated reasoning across files, or carried a PR reference that will rot. Trimmed each to the smallest signal a future reader needs: * CI: 6 lines → 3. Keep the "without this, cargo compiles to a zero-test binary" hazard; drop the cfg-syntax tutorial. * metadata_writer_sqlite.rs: 8 lines → 5. Spell out who reads each group of columns (our reader vs DuckDB) without re-explaining the same thing twice. * sqlite_metadata_provider_test.rs: 7 lines across two blocks → 1. Replace the duplicated "why columns exist" lecture with a single pointer to the writer file. Also drops the "since PR #115" reference per the don't-cite-PR-numbers convention — the linkage is now in the writer file's comment.

anoop-narang added 3 commits March 20, 2026 12:27

style: fix clippy and fmt warnings

ac6d231

anoop-narang merged commit c460b25 into main Mar 20, 2026
3 checks passed

anoop-narang deleted the feat/list-type-support branch March 20, 2026 07:48

anoop-narang mentioned this pull request May 20, 2026

fix(writer): align ducklake_column / ducklake_data_file schema with the DuckLake spec #116

Merged

3 tasks

This was referenced May 22, 2026

Foundation: rebase integration onto upstream, drop pass-throughs, triage SLT failures zfarrell/datafusion-ducklake#12

Open

Type system & inlined-data parsing (merge with upstream overlap) zfarrell/datafusion-ducklake#23

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support list/array column types in DuckLake type mapping#89

feat: support list/array column types in DuckLake type mapping#89
anoop-narang merged 3 commits into
mainfrom
feat/list-type-support

anoop-narang commented Mar 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anoop-narang commented Mar 20, 2026

Summary

Context

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant