Skip to content

fix(writer): align ducklake_column / ducklake_data_file schema with the DuckLake spec#116

Merged
anoop-narang merged 3 commits into
mainfrom
fix/writer-ducklake-column-schema
May 20, 2026
Merged

fix(writer): align ducklake_column / ducklake_data_file schema with the DuckLake spec#116
anoop-narang merged 3 commits into
mainfrom
fix/writer-ducklake-column-schema

Conversation

@anoop-narang
Copy link
Copy Markdown
Collaborator

Summary

SqliteMetadataWriter was creating a ducklake_column table missing five columns the upstream DuckLake spec defines (notably parent_column). Since PR #89 added parent_column to the reader's SQL_GET_TABLE_COLUMNS query, any catalog written by SqliteMetadataWriter has been unreadable through the metadata provider, panicking with no such column: parent_column.

This shipped to main because CI ran cargo test without enabling write-sqlite / metadata-sqlite, so the two test suites that would have caught it (tests/sql_write_tests.rs, tests/sqlite_metadata_provider_test.rs) compiled to zero-test binaries.

Changes

  1. src/metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA — add parent_column, initial_default, default_value, default_value_type, default_value_dialect to the ducklake_column CREATE TABLE. Matches the upstream schema in ducklake_metadata_manager.cpp. We don't populate the four *_default* columns today (we don't write nested types or column defaults), but they have to exist so the reader's query doesn't error.

  2. tests/sqlite_metadata_provider_test.rs::init_schema — bring the in-test fixture in line with the same spec. Also adds row_id_start, record_count, and mapping_id on ducklake_data_file since SQL_GET_DATA_FILES projects them after the row-lineage work in PR feat: DuckLake row lineage (rowid virtual column) #115.

  3. .github/workflows/ci.yml — enable write-sqlite on the test step. Without this, the next analogous schema drift ships invisibly again.

Test plan

  • cargo test --features "skip-tests-with-docker write-sqlite" — green across all suites. The 7 sql_write_tests (including the previously-broken test_insert_into_read_only_fails) and the 18 sqlite_metadata_provider_test cases all pass.
  • cargo clippy --all-targets --all-features --no-deps -- -D warnings — clean.
  • cargo fmt --all -- --check — clean.

Out of scope (followups)

The encryption, postgres, and mysql test fixtures (tests/encryption_tests.rs, tests/postgres_metadata_provider_test.rs, tests/mysql_metadata_provider_test.rs) have analogous spec gaps — same missing columns on ducklake_column. They aren't blocking CI today because their features are still not enabled there. Worth a follow-up once those backends are set up in CI.

`SqliteMetadataWriter` was producing a `ducklake_column` table that
lacked `parent_column`, `initial_default`, `default_value`,
`default_value_type`, and `default_value_dialect`. The reader's
SQL_GET_TABLE_COLUMNS query (added in PR #89) projects
`parent_column`, so any catalog written by `SqliteMetadataWriter` was
unreadable through the metadata provider, panicking with
"no such column: parent_column".

The bug shipped because CI ran `cargo test` without `write-sqlite`
features, and both `tests/sql_write_tests.rs` and
`tests/sqlite_metadata_provider_test.rs` are gated behind features
that CI didn't enable — so the broken-catalog path compiled to a
zero-test binary in CI and was invisible.

Three changes:

1. `metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA` adds the five
   missing `ducklake_column` columns, matching the upstream
   `ducklake_metadata_manager.cpp` schema definition. We don't
   populate the four `*_default*` columns today (we don't support
   nested types or column defaults yet), but the columns must exist
   so the reader query doesn't error.

2. `tests/sqlite_metadata_provider_test.rs::init_schema` is brought
   in line with the same spec. It also gains `row_id_start`,
   `record_count`, and `mapping_id` on `ducklake_data_file` — these
   are projected by SQL_GET_DATA_FILES since the row-lineage work
   landed in PR #115 and any sqlite-fixture test that exercises the
   data-file read path would have hit "no such column" without them.

3. `.github/workflows/ci.yml` enables `write-sqlite` on the test
   step so `sql_write_tests` and `sqlite_metadata_provider_test`
   actually run. Without this, the next analogous schema drift will
   ship invisibly again.

Verified by running `cargo test --features "skip-tests-with-docker
write-sqlite"` locally — 228 unit tests + every integration suite
green, including the 7 previously-broken `sql_write_tests` and 18
`sqlite_metadata_provider_test` cases.

Out of scope (separately broken with the same root cause, no CI
coverage today): the encryption, postgres, and mysql test fixtures
have analogous spec gaps. Worth a follow-up once CI for those
backends is set up.
The earlier wording read like "enable write-sqlite so the file runs"
which could be parsed as either "we previously didn't run them and
now we are" (correct) or "we want to skip running them" (wrong).
Rephrased to make clear we're adding features so the tests become
visible to cargo, not the other way round.
Three comment blocks added earlier in this branch were verbose,
duplicated reasoning across files, or carried a PR reference that
will rot. Trimmed each to the smallest signal a future reader needs:

* CI: 6 lines → 3. Keep the "without this, cargo compiles to a
  zero-test binary" hazard; drop the cfg-syntax tutorial.

* metadata_writer_sqlite.rs: 8 lines → 5. Spell out who reads each
  group of columns (our reader vs DuckDB) without re-explaining the
  same thing twice.

* sqlite_metadata_provider_test.rs: 7 lines across two blocks → 1.
  Replace the duplicated "why columns exist" lecture with a single
  pointer to the writer file. Also drops the "since PR #115"
  reference per the don't-cite-PR-numbers convention — the linkage
  is now in the writer file's comment.
@anoop-narang anoop-narang merged commit 9378de8 into main May 20, 2026
2 of 3 checks passed
@anoop-narang anoop-narang deleted the fix/writer-ducklake-column-schema branch May 20, 2026 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant