fix(writer): align ducklake_column / ducklake_data_file schema with the DuckLake spec#116
Merged
Merged
Conversation
`SqliteMetadataWriter` was producing a `ducklake_column` table that lacked `parent_column`, `initial_default`, `default_value`, `default_value_type`, and `default_value_dialect`. The reader's SQL_GET_TABLE_COLUMNS query (added in PR #89) projects `parent_column`, so any catalog written by `SqliteMetadataWriter` was unreadable through the metadata provider, panicking with "no such column: parent_column". The bug shipped because CI ran `cargo test` without `write-sqlite` features, and both `tests/sql_write_tests.rs` and `tests/sqlite_metadata_provider_test.rs` are gated behind features that CI didn't enable — so the broken-catalog path compiled to a zero-test binary in CI and was invisible. Three changes: 1. `metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA` adds the five missing `ducklake_column` columns, matching the upstream `ducklake_metadata_manager.cpp` schema definition. We don't populate the four `*_default*` columns today (we don't support nested types or column defaults yet), but the columns must exist so the reader query doesn't error. 2. `tests/sqlite_metadata_provider_test.rs::init_schema` is brought in line with the same spec. It also gains `row_id_start`, `record_count`, and `mapping_id` on `ducklake_data_file` — these are projected by SQL_GET_DATA_FILES since the row-lineage work landed in PR #115 and any sqlite-fixture test that exercises the data-file read path would have hit "no such column" without them. 3. `.github/workflows/ci.yml` enables `write-sqlite` on the test step so `sql_write_tests` and `sqlite_metadata_provider_test` actually run. Without this, the next analogous schema drift will ship invisibly again. Verified by running `cargo test --features "skip-tests-with-docker write-sqlite"` locally — 228 unit tests + every integration suite green, including the 7 previously-broken `sql_write_tests` and 18 `sqlite_metadata_provider_test` cases. Out of scope (separately broken with the same root cause, no CI coverage today): the encryption, postgres, and mysql test fixtures have analogous spec gaps. Worth a follow-up once CI for those backends is set up.
The earlier wording read like "enable write-sqlite so the file runs" which could be parsed as either "we previously didn't run them and now we are" (correct) or "we want to skip running them" (wrong). Rephrased to make clear we're adding features so the tests become visible to cargo, not the other way round.
Three comment blocks added earlier in this branch were verbose, duplicated reasoning across files, or carried a PR reference that will rot. Trimmed each to the smallest signal a future reader needs: * CI: 6 lines → 3. Keep the "without this, cargo compiles to a zero-test binary" hazard; drop the cfg-syntax tutorial. * metadata_writer_sqlite.rs: 8 lines → 5. Spell out who reads each group of columns (our reader vs DuckDB) without re-explaining the same thing twice. * sqlite_metadata_provider_test.rs: 7 lines across two blocks → 1. Replace the duplicated "why columns exist" lecture with a single pointer to the writer file. Also drops the "since PR #115" reference per the don't-cite-PR-numbers convention — the linkage is now in the writer file's comment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SqliteMetadataWriterwas creating aducklake_columntable missing five columns the upstream DuckLake spec defines (notablyparent_column). Since PR #89 addedparent_columnto the reader'sSQL_GET_TABLE_COLUMNSquery, any catalog written bySqliteMetadataWriterhas been unreadable through the metadata provider, panicking withno such column: parent_column.This shipped to main because CI ran
cargo testwithout enablingwrite-sqlite/metadata-sqlite, so the two test suites that would have caught it (tests/sql_write_tests.rs,tests/sqlite_metadata_provider_test.rs) compiled to zero-test binaries.Changes
src/metadata_writer_sqlite.rs::SQL_CREATE_SCHEMA— addparent_column,initial_default,default_value,default_value_type,default_value_dialectto theducklake_columnCREATE TABLE. Matches the upstream schema inducklake_metadata_manager.cpp. We don't populate the four*_default*columns today (we don't write nested types or column defaults), but they have to exist so the reader's query doesn't error.tests/sqlite_metadata_provider_test.rs::init_schema— bring the in-test fixture in line with the same spec. Also addsrow_id_start,record_count, andmapping_idonducklake_data_filesinceSQL_GET_DATA_FILESprojects them after the row-lineage work in PR feat: DuckLake row lineage (rowid virtual column) #115..github/workflows/ci.yml— enablewrite-sqliteon the test step. Without this, the next analogous schema drift ships invisibly again.Test plan
cargo test --features "skip-tests-with-docker write-sqlite"— green across all suites. The 7sql_write_tests(including the previously-brokentest_insert_into_read_only_fails) and the 18sqlite_metadata_provider_testcases all pass.cargo clippy --all-targets --all-features --no-deps -- -D warnings— clean.cargo fmt --all -- --check— clean.Out of scope (followups)
The encryption, postgres, and mysql test fixtures (
tests/encryption_tests.rs,tests/postgres_metadata_provider_test.rs,tests/mysql_metadata_provider_test.rs) have analogous spec gaps — same missing columns onducklake_column. They aren't blocking CI today because their features are still not enabled there. Worth a follow-up once those backends are set up in CI.