feat: support list/array column types in DuckLake type mapping#89
Merged
Conversation
Map list types to Arrow's DataType::List with recursive element type resolution. Supports three syntaxes: list<type>, array<type>, and type[] (Postgres style). Nested complex element types are rejected with a clear error until struct/map support is added.
…data providers DuckLake stores list columns as parent-child rows in ducklake_column (parent has column_type="list", child has the element type with parent_column pointing to the parent). All four metadata providers now read parent_column, rewrite the parent's type to list<element_type>, and drop the child row from the result.
3 tasks
anoop-narang
added a commit
that referenced
this pull request
May 20, 2026
…he DuckLake spec (#116) * fix(writer): align ducklake_column / ducklake_data_file with the spec `SqliteMetadataWriter` was producing a `ducklake_column` table that lacked `parent_column`, `initial_default`, `default_value`, `default_value_type`, and `default_value_dialect`. The reader's SQL_GET_TABLE_COLUMNS query (added in PR #89) projects `parent_column`, so any catalog written by `SqliteMetadataWriter` was unreadable through the metadata provider, panicking with "no such column: parent_column". The bug shipped because CI ran `cargo test` without `write-sqlite` features, and both `tests/sql_write_tests.rs` and `tests/sqlite_metadata_provider_test.rs` are gated behind features that CI didn't enable — so the broken-catalog path compiled to a zero-test binary in CI and was invisible. Three changes: 1. `metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA` adds the five missing `ducklake_column` columns, matching the upstream `ducklake_metadata_manager.cpp` schema definition. We don't populate the four `*_default*` columns today (we don't support nested types or column defaults yet), but the columns must exist so the reader query doesn't error. 2. `tests/sqlite_metadata_provider_test.rs::init_schema` is brought in line with the same spec. It also gains `row_id_start`, `record_count`, and `mapping_id` on `ducklake_data_file` — these are projected by SQL_GET_DATA_FILES since the row-lineage work landed in PR #115 and any sqlite-fixture test that exercises the data-file read path would have hit "no such column" without them. 3. `.github/workflows/ci.yml` enables `write-sqlite` on the test step so `sql_write_tests` and `sqlite_metadata_provider_test` actually run. Without this, the next analogous schema drift will ship invisibly again. Verified by running `cargo test --features "skip-tests-with-docker write-sqlite"` locally — 228 unit tests + every integration suite green, including the 7 previously-broken `sql_write_tests` and 18 `sqlite_metadata_provider_test` cases. Out of scope (separately broken with the same root cause, no CI coverage today): the encryption, postgres, and mysql test fixtures have analogous spec gaps. Worth a follow-up once CI for those backends is set up. * docs(ci): reword the write-sqlite comment so the intent isn't ambiguous The earlier wording read like "enable write-sqlite so the file runs" which could be parsed as either "we previously didn't run them and now we are" (correct) or "we want to skip running them" (wrong). Rephrased to make clear we're adding features so the tests become visible to cargo, not the other way round. * docs: trim the new comments to just the non-obvious WHY Three comment blocks added earlier in this branch were verbose, duplicated reasoning across files, or carried a PR reference that will rot. Trimmed each to the smallest signal a future reader needs: * CI: 6 lines → 3. Keep the "without this, cargo compiles to a zero-test binary" hazard; drop the cfg-syntax tutorial. * metadata_writer_sqlite.rs: 8 lines → 5. Spell out who reads each group of columns (our reader vs DuckDB) without re-explaining the same thing twice. * sqlite_metadata_provider_test.rs: 7 lines across two blocks → 1. Replace the duplicated "why columns exist" lecture with a single pointer to the writer file. Also drops the "since PR #115" reference per the don't-cite-PR-numbers convention — the linkage is now in the writer file's comment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
DataType::List, handlinglist<type>,array<type>, andtype[](Postgres) syntax inducklake_to_arrow_type()and the reverse inarrow_to_ducklake_type()ducklake_columnrows in all four metadata providers (DuckDB, PostgreSQL, SQLite, MySQL) — the parent row'scolumn_typeis rewritten from bare"list"to"list<element_type>"and child rows are droppedContext
DuckLake stores list columns as two rows in
ducklake_column:Previously,
get_table_structure()ignoredparent_column, sovectorgot bare type"list"(which failed type mapping) andelementappeared as a spurious top-level column.Test plan
cargo test --lib)reconstruct_list_columns()cover: basic list, no-list passthrough, struct parent unchanged, multiple lists,ColumnWithTablevariantlist<T>,array<T>,T[]syntax, empty element errors, nested complex type errors, roundtrip, schema building