Skip to content

feat: support list/array column types in DuckLake type mapping#89

Merged
anoop-narang merged 3 commits into
mainfrom
feat/list-type-support
Mar 20, 2026
Merged

feat: support list/array column types in DuckLake type mapping#89
anoop-narang merged 3 commits into
mainfrom
feat/list-type-support

Conversation

@anoop-narang
Copy link
Copy Markdown
Collaborator

Summary

  • Add support for mapping DuckLake list types to Arrow DataType::List, handling list<type>, array<type>, and type[] (Postgres) syntax in ducklake_to_arrow_type() and the reverse in arrow_to_ducklake_type()
  • Reconstruct list types from parent-child ducklake_column rows in all four metadata providers (DuckDB, PostgreSQL, SQLite, MySQL) — the parent row's column_type is rewritten from bare "list" to "list<element_type>" and child rows are dropped
  • Struct/map parent types are intentionally left unchanged (they still error as before)

Context

DuckLake stores list columns as two rows in ducklake_column:

column_id column_name column_type parent_column
6 vector list NULL
7 element float64 6

Previously, get_table_structure() ignored parent_column, so vector got bare type "list" (which failed type mapping) and element appeared as a spurious top-level column.

Test plan

  • All 177 unit tests pass (cargo test --lib)
  • All integration tests pass (concurrent, delete filter, information schema, etc.)
  • New unit tests for reconstruct_list_columns() cover: basic list, no-list passthrough, struct parent unchanged, multiple lists, ColumnWithTable variant
  • New unit tests for type mapping cover: list<T>, array<T>, T[] syntax, empty element errors, nested complex type errors, roundtrip, schema building
  • Verified against live Postgres DuckLake catalog (table_id=2 with list column returns 7 columns, not 8)

Map list types to Arrow's DataType::List with recursive element type
resolution. Supports three syntaxes: list<type>, array<type>, and
type[] (Postgres style). Nested complex element types are rejected
with a clear error until struct/map support is added.
…data providers

DuckLake stores list columns as parent-child rows in ducklake_column
(parent has column_type="list", child has the element type with
parent_column pointing to the parent). All four metadata providers
now read parent_column, rewrite the parent's type to list<element_type>,
and drop the child row from the result.
@anoop-narang anoop-narang merged commit c460b25 into main Mar 20, 2026
3 checks passed
@anoop-narang anoop-narang deleted the feat/list-type-support branch March 20, 2026 07:48
anoop-narang added a commit that referenced this pull request May 20, 2026
…he DuckLake spec (#116)

* fix(writer): align ducklake_column / ducklake_data_file with the spec

`SqliteMetadataWriter` was producing a `ducklake_column` table that
lacked `parent_column`, `initial_default`, `default_value`,
`default_value_type`, and `default_value_dialect`. The reader's
SQL_GET_TABLE_COLUMNS query (added in PR #89) projects
`parent_column`, so any catalog written by `SqliteMetadataWriter` was
unreadable through the metadata provider, panicking with
"no such column: parent_column".

The bug shipped because CI ran `cargo test` without `write-sqlite`
features, and both `tests/sql_write_tests.rs` and
`tests/sqlite_metadata_provider_test.rs` are gated behind features
that CI didn't enable — so the broken-catalog path compiled to a
zero-test binary in CI and was invisible.

Three changes:

1. `metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA` adds the five
   missing `ducklake_column` columns, matching the upstream
   `ducklake_metadata_manager.cpp` schema definition. We don't
   populate the four `*_default*` columns today (we don't support
   nested types or column defaults yet), but the columns must exist
   so the reader query doesn't error.

2. `tests/sqlite_metadata_provider_test.rs::init_schema` is brought
   in line with the same spec. It also gains `row_id_start`,
   `record_count`, and `mapping_id` on `ducklake_data_file` — these
   are projected by SQL_GET_DATA_FILES since the row-lineage work
   landed in PR #115 and any sqlite-fixture test that exercises the
   data-file read path would have hit "no such column" without them.

3. `.github/workflows/ci.yml` enables `write-sqlite` on the test
   step so `sql_write_tests` and `sqlite_metadata_provider_test`
   actually run. Without this, the next analogous schema drift will
   ship invisibly again.

Verified by running `cargo test --features "skip-tests-with-docker
write-sqlite"` locally — 228 unit tests + every integration suite
green, including the 7 previously-broken `sql_write_tests` and 18
`sqlite_metadata_provider_test` cases.

Out of scope (separately broken with the same root cause, no CI
coverage today): the encryption, postgres, and mysql test fixtures
have analogous spec gaps. Worth a follow-up once CI for those
backends is set up.

* docs(ci): reword the write-sqlite comment so the intent isn't ambiguous

The earlier wording read like "enable write-sqlite so the file runs"
which could be parsed as either "we previously didn't run them and
now we are" (correct) or "we want to skip running them" (wrong).
Rephrased to make clear we're adding features so the tests become
visible to cargo, not the other way round.

* docs: trim the new comments to just the non-obvious WHY

Three comment blocks added earlier in this branch were verbose,
duplicated reasoning across files, or carried a PR reference that
will rot. Trimmed each to the smallest signal a future reader needs:

* CI: 6 lines → 3. Keep the "without this, cargo compiles to a
  zero-test binary" hazard; drop the cfg-syntax tutorial.

* metadata_writer_sqlite.rs: 8 lines → 5. Spell out who reads each
  group of columns (our reader vs DuckDB) without re-explaining the
  same thing twice.

* sqlite_metadata_provider_test.rs: 7 lines across two blocks → 1.
  Replace the duplicated "why columns exist" lecture with a single
  pointer to the writer file. Also drops the "since PR #115"
  reference per the don't-cite-PR-numbers convention — the linkage
  is now in the writer file's comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant