[v1.5] fix: add primary keys and indexes to metadata tables for concurrent write safety#14

Open
fuziontech wants to merge 2 commits into v1.5-variegata from v1.5-fix-add-primary-key-to-table-stats

@fuziontech
Member

Cherry-picked from #1 to v1.5-variegata

fuziontech and others added 2 commits May 18, 2026 15:24
…rite safety

When PostgreSQL serves as the metadata catalog and multiple writers run
concurrently (e.g., Kafka Connect tasks), UPDATE/DELETE statements on tables
without primary keys cause postgres_scanner to identify rows by ctid (the
physical row ID). PostgreSQL's MVCC writes a new row version on every update,
so a row's ctid changes each time it is updated, and concurrent writers that
target the same row hit serialization failures.

Tables that previously had no primary key now have one:

| Table | Primary Key |
|-------|-------------|
| ducklake_table_stats | table_id |
| ducklake_table_column_stats | (table_id, column_id) |
| ducklake_file_column_stats | (data_file_id, column_id) |
| ducklake_partition_info | partition_id |
| ducklake_partition_column | (partition_id, partition_key_index) |
| ducklake_file_partition_value | (data_file_id, partition_key_index) |
| ducklake_files_scheduled_for_deletion | data_file_id |
| ducklake_inlined_data_tables | (table_id, schema_version) |
| ducklake_column_mapping | mapping_id |
| ducklake_name_mapping | (mapping_id, column_id) |
| ducklake_macro_impl | (macro_id, impl_id) |
| ducklake_macro_parameters | (macro_id, impl_id, column_id) |
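
For a catalog that was created before this change, the same keys could in principle be added after the fact. The following is a minimal sketch, not part of this PR: it only generates standard PostgreSQL `ALTER TABLE ... ADD PRIMARY KEY` statements from the table above. The names `PRIMARY_KEYS`, `primary_key_ddl`, and `migration_statements` are hypothetical helpers introduced for illustration, and the optional schema prefix is an assumption about how the catalog is laid out.

```python
# Hypothetical helper: builds ALTER TABLE statements for the primary keys
# listed in the table above. It only produces SQL text; it does not connect
# to a database.

# Table name -> primary key columns, copied from the PR description.
PRIMARY_KEYS = {
    "ducklake_table_stats": ["table_id"],
    "ducklake_table_column_stats": ["table_id", "column_id"],
    "ducklake_file_column_stats": ["data_file_id", "column_id"],
    "ducklake_partition_info": ["partition_id"],
    "ducklake_partition_column": ["partition_id", "partition_key_index"],
    "ducklake_file_partition_value": ["data_file_id", "partition_key_index"],
    "ducklake_files_scheduled_for_deletion": ["data_file_id"],
    "ducklake_inlined_data_tables": ["table_id", "schema_version"],
    "ducklake_column_mapping": ["mapping_id"],
    "ducklake_name_mapping": ["mapping_id", "column_id"],
    "ducklake_macro_impl": ["macro_id", "impl_id"],
    "ducklake_macro_parameters": ["macro_id", "impl_id", "column_id"],
}


def primary_key_ddl(table, columns, schema=None):
    """Return one ALTER TABLE statement that adds a primary key."""
    # The schema prefix is optional and is an assumption of this sketch.
    qualified = f"{schema}.{table}" if schema else table
    return f"ALTER TABLE {qualified} ADD PRIMARY KEY ({', '.join(columns)});"


def migration_statements(schema=None):
    """Return the statements for every table, in the order listed above."""
    return [primary_key_ddl(t, cols, schema) for t, cols in PRIMARY_KEYS.items()]


if __name__ == "__main__":
    for statement in migration_statements():
        print(statement)
```

Note that PostgreSQL rejects `ADD PRIMARY KEY` if the table already contains duplicate or NULL values in the key columns, so an existing catalog may need to be checked before running statements like these.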

Indexes added for frequently queried columns (especially table_id lookups):

- idx_data_file_table_snapshot: (table_id, begin_snapshot, end_snapshot)
- idx_delete_file_table_snapshot: (table_id, begin_snapshot, end_snapshot)
- idx_column_table: (table_id, end_snapshot)
- idx_file_column_stats_table: (table_id, column_id)
- idx_partition_info_table: (table_id)
- idx_partition_column_table: (table_id)
- idx_file_partition_value_table: (table_id)
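
As an illustration of what the index list amounts to in SQL, here is a small sketch that renders each entry as a PostgreSQL `CREATE INDEX IF NOT EXISTS` statement. The index names and columns come from the list above; the table each index belongs to is not stated in the list and is inferred here from the index name, so treat the table names as assumptions. `INDEXES` and `index_ddl` are hypothetical names used only for this example.

```python
# Hypothetical helper: renders the indexes listed above as CREATE INDEX
# statements. It only produces SQL text; it does not connect to a database.

# (index name, table name, indexed columns).
# ASSUMPTION: table names are inferred from the index names, not taken
# from the PR description.
INDEXES = [
    ("idx_data_file_table_snapshot", "ducklake_data_file",
     ["table_id", "begin_snapshot", "end_snapshot"]),
    ("idx_delete_file_table_snapshot", "ducklake_delete_file",
     ["table_id", "begin_snapshot", "end_snapshot"]),
    ("idx_column_table", "ducklake_column", ["table_id", "end_snapshot"]),
    ("idx_file_column_stats_table", "ducklake_file_column_stats",
     ["table_id", "column_id"]),
    ("idx_partition_info_table", "ducklake_partition_info", ["table_id"]),
    ("idx_partition_column_table", "ducklake_partition_column", ["table_id"]),
    ("idx_file_partition_value_table", "ducklake_file_partition_value",
     ["table_id"]),
]


def index_ddl(name, table, columns):
    """Return one CREATE INDEX statement; IF NOT EXISTS makes reruns safe."""
    return f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({', '.join(columns)});"


if __name__ == "__main__":
    for name, table, columns in INDEXES:
        print(index_ddl(name, table, columns))
```

Every index leads with `table_id`, which matches the stated goal of speeding up table-scoped lookups: a B-tree index can serve queries that filter on its leading column even when the trailing columns are not constrained.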

Together, the primary keys and indexes eliminate "could not serialize access
due to concurrent update" errors and improve query performance for
table-scoped operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>