feat: schema DDL sync — ADD/DROP/RENAME COLUMN, block ALTER TYPE (#39)
## Summary
- **Automatic DDL propagation** from source to DuckLake target tables
for ADD COLUMN, DROP COLUMN, and RENAME COLUMN
- **Detection via RELATION message diffing** — pgoutput sends a RELATION
message before the first DML after a schema change; comparing with the
cached entry detects column additions, removals, and renames
- **Non-blocking DDL barrier queue** — DDL is a `PendingDdl` in a
per-table `VecDeque`; the WAL consumer sets the barrier and continues
immediately while the flush thread drains old-schema data, applies ALTER
TABLE via a short-lived PG connection, then resumes with the new schema.
Multiple barriers are queued (not merged) so each batch is processed
with the correct column layout.
- **OID-based target resolution** — `target_oid` stored in
`table_mappings`; `apply_ddl_commands()` resolves the current target
name from `pg_class` so the pipeline survives user-initiated target
table renames
- **Source rename does NOT rename target** — source table renames update
`source_schema`/`source_table` metadata only; target table name stays
unchanged
- **ALTER COLUMN TYPE blocked** — same-name-different-OID columns are
detected and the table is transitioned to ERRORED immediately
(threshold=1) with a resync hint, preventing silent data corruption from
stale column types
- **Fix: preserve error_message in ERRORED state** —
`clear_error_on_success()` now skips tables in ERRORED state so a
subsequent successful flush doesn't wipe the diagnostic message
- **New types**: `DdlCommand` enum (with `UnsupportedAlterColumnType`
variant), `PendingDdl` struct
- **Metadata helpers**: `get_column_type()`, `update_source_name()`,
`update_target_oid()`, `pg_oid_to_type_name()`
## Test plan
- [x] `make check-regression TEST=ddl_sync` — ADD COLUMN, DROP COLUMN,
RENAME COLUMN propagate correctly; multi-DDL barrier queuing works;
ALTER COLUMN TYPE errors the pipeline
- [x] `make check-regression TEST=rename_table` — source rename does NOT
rename target, metadata updated, UPDATE/DELETE work after rename
- [x] `make installcheck` — all 42 tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TRUNCATE: uses `drain_and_wait_table()` for per-table synchronous drain before DELETE
- Crash safety: replication slot only advances past what all tables have durably flushed (confirmed_lsn = min(applied_lsn) read from PG)
### 5.1. Schema DDL Sync
Schema changes (ADD/DROP/RENAME COLUMN) are automatically propagated from source to DuckLake target tables. Source table renames are tracked (metadata updated) but do **not** rename the target — the target table name is stable and independent of the source name.
**Detection** — pgoutput sends a RELATION message before the first DML after a schema change. The `'R'` handler in `process_one_wal_message()` compares the new RELATION with the cached entry via `detect_schema_changes()`:
- **ADD COLUMN**: column names in new but not in old → query `pg_attribute` for type
- **DROP COLUMN**: column names in old but not in new
- **RENAME COLUMN**: same position, different name, same type OID
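The three rules above can be sketched as a pure diff over two column lists. This is a minimal illustration, not the project's actual `detect_schema_changes()`: the `Column` struct, the field layout of `DdlCommand`, and the extra guards on the rename rule are assumptions. It also takes the type OID straight from the RELATION message instead of querying `pg_attribute`.

```rust
use std::collections::HashSet;

/// One column as carried by a RELATION message (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub type_oid: u32,
}

/// Only the enum name and the `UnsupportedAlterColumnType` variant come from
/// the source; the other variants and all fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum DdlCommand {
    AddColumn { name: String, type_oid: u32 },
    DropColumn { name: String },
    RenameColumn { from: String, to: String },
    UnsupportedAlterColumnType { name: String, old_oid: u32, new_oid: u32 },
}

/// Diff the cached column list (`old`) against the fresh one (`new`).
pub fn detect_schema_changes(old: &[Column], new: &[Column]) -> Vec<DdlCommand> {
    let old_names: HashSet<&str> = old.iter().map(|c| c.name.as_str()).collect();
    let new_names: HashSet<&str> = new.iter().map(|c| c.name.as_str()).collect();
    let mut renamed_from: HashSet<&str> = HashSet::new();
    let mut renamed_to: HashSet<&str> = HashSet::new();
    let mut cmds = Vec::new();

    // RENAME: same position, different name, same type OID.
    // Assumed guard: neither name may exist on the other side, so a
    // positional shift caused by a DROP is not mistaken for a rename.
    for (o, n) in old.iter().zip(new.iter()) {
        let is_rename = o.name != n.name
            && o.type_oid == n.type_oid
            && !new_names.contains(o.name.as_str())
            && !old_names.contains(n.name.as_str());
        if is_rename {
            renamed_from.insert(o.name.as_str());
            renamed_to.insert(n.name.as_str());
            cmds.push(DdlCommand::RenameColumn { from: o.name.clone(), to: n.name.clone() });
        }
    }

    // Same name, different type OID: reported so the caller can error the table.
    for n in new {
        if let Some(o) = old.iter().find(|o| o.name == n.name) {
            if o.type_oid != n.type_oid {
                cmds.push(DdlCommand::UnsupportedAlterColumnType {
                    name: n.name.clone(),
                    old_oid: o.type_oid,
                    new_oid: n.type_oid,
                });
            }
        }
    }

    // DROP: names in old but not in new (and not consumed by a rename).
    for o in old {
        let name = o.name.as_str();
        if !new_names.contains(name) && !renamed_from.contains(name) {
            cmds.push(DdlCommand::DropColumn { name: o.name.clone() });
        }
    }

    // ADD: names in new but not in old (and not produced by a rename).
    for n in new {
        let name = n.name.as_str();
        if !old_names.contains(name) && !renamed_to.contains(name) {
            cmds.push(DdlCommand::AddColumn { name: n.name.clone(), type_oid: n.type_oid });
        }
    }

    cmds
}
```

A positional heuristic like this cannot tell a rename apart from a DROP followed by an ADD of a same-typed column at the same position. The sketch resolves that ambiguity in favor of a rename.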
**Target resolution** — `apply_ddl_commands()` resolves the current target table name from `target_oid` via `pg_class` rather than using a stored name string. This means the pipeline survives user-initiated renames of the target table. For old mappings without `target_oid`, it falls back to the name stored in metadata.
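A rough sketch of the OID-first lookup with the name fallback might look like the following. The `TableMapping` fields other than `target_oid`, the `resolve_target_name` helper, and the injected `lookup` closure are hypothetical; in a real pipeline the closure would run the catalog query over the short-lived PG connection. Falling back to the stored name when the OID no longer resolves is also an assumption, since the source only describes the fallback for mappings without `target_oid`.

```rust
/// Catalog query a real `lookup` could run; `$1` is the target OID.
pub const RESOLVE_TARGET_SQL: &str = "SELECT n.nspname, c.relname \
     FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace \
     WHERE c.oid = $1";

/// Hypothetical view of one row in `table_mappings`.
pub struct TableMapping {
    pub target_oid: Option<u32>,
    pub target_schema: String, // assumed column name
    pub target_table: String,  // assumed column name
}

/// Resolve the current (schema, table) of the target.
/// `lookup` maps an OID to its current name, or None if it is unknown.
pub fn resolve_target_name<F>(mapping: &TableMapping, lookup: F) -> (String, String)
where
    F: Fn(u32) -> Option<(String, String)>,
{
    mapping
        .target_oid
        // Prefer the live name from the catalog: survives target renames.
        .and_then(|oid| lookup(oid))
        // Old mappings without an OID (or, by assumption, an OID that no
        // longer resolves) use the name stored in metadata.
        .unwrap_or_else(|| (mapping.target_schema.clone(), mapping.target_table.clone()))
}
```

Injecting the lookup as a closure keeps the sketch free of a database dependency and makes the fallback path easy to exercise in isolation.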
**Propagation** — DDL is treated as a non-blocking barrier event in the per-table queue (`PendingDdl`). The WAL consumer sets the barrier and continues immediately; the flush thread handles it autonomously:
1. WAL consumer detects a schema diff → calls `coordinator.set_pending_ddl()` with the `DdlCommand` list and the new `QueueMeta`
2. While the barrier is set, `push_change()` routes new-schema changes to `pending_after_ddl`
3. Flush thread drains and flushes old-schema changes from the buffer
4. Flush thread applies the `ALTER TABLE` commands to the DuckLake target via a short-lived PG connection (`apply_ddl_commands()`)
5. Flush thread resets `FlushWorker`, merges `pending_after_ddl` into the main queue with the updated `QueueMeta`, and continues normally
The barrier ensures old-schema data is flushed before ALTER TABLE, preventing column mismatch errors or data loss.
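The barrier ordering can be illustrated with a small single-threaded model. Everything below is a simplified sketch under stated assumptions: `TableQueue`, `flush_step`, and the field layout are hypothetical, changes and DDL commands are plain strings, and the two closures stand in for the real flush and `apply_ddl_commands()` calls. Each barrier carries its own `pending_after_ddl`, so stacked barriers are queued rather than merged.

```rust
use std::collections::VecDeque;

/// Column layout a batch was decoded with (simplified stand-in).
#[derive(Debug, Clone, PartialEq)]
pub struct QueueMeta {
    pub columns: Vec<String>,
}

/// One queued barrier: the DDL to apply plus the changes that arrived after it.
#[derive(Debug)]
pub struct PendingDdl {
    pub commands: Vec<String>,
    pub new_meta: QueueMeta,
    pub pending_after_ddl: Vec<String>,
}

/// Hypothetical per-table queue.
#[derive(Debug)]
pub struct TableQueue {
    pub meta: QueueMeta,
    pub buffer: Vec<String>,
    pub barriers: VecDeque<PendingDdl>,
}

impl TableQueue {
    pub fn new(meta: QueueMeta) -> Self {
        TableQueue { meta, buffer: Vec::new(), barriers: VecDeque::new() }
    }

    /// WAL consumer side: record the barrier and return immediately.
    pub fn set_pending_ddl(&mut self, commands: Vec<String>, new_meta: QueueMeta) {
        self.barriers.push_back(PendingDdl { commands, new_meta, pending_after_ddl: Vec::new() });
    }

    /// WAL consumer side: changes after a barrier belong to the newest schema.
    pub fn push_change(&mut self, change: String) {
        match self.barriers.back_mut() {
            Some(barrier) => barrier.pending_after_ddl.push(change),
            None => self.buffer.push(change),
        }
    }

    /// Flush thread side: flush old-schema rows first, then apply the oldest
    /// barrier and promote its buffered changes under the new layout.
    pub fn flush_step<F, A>(&mut self, mut flush: F, mut apply_ddl: A)
    where
        F: FnMut(&QueueMeta, Vec<String>),
        A: FnMut(&[String]),
    {
        let rows = std::mem::take(&mut self.buffer);
        if !rows.is_empty() {
            flush(&self.meta, rows);
        }
        if let Some(barrier) = self.barriers.pop_front() {
            apply_ddl(&barrier.commands);
            self.meta = barrier.new_meta;
            self.buffer = barrier.pending_after_ddl;
        }
    }
}
```

With two barriers queued back to back, three calls to `flush_step` flush the one-column rows, apply the first DDL, flush the two-column rows, apply the second DDL, and finally flush the three-column rows, so no batch is written with a layout it was not decoded for.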