Fix false transaction conflicts on commit retry #3

Open
fuziontech wants to merge 3 commits into main from fix/stale-transaction-changes-on-retry

@fuziontech
Member

Summary

  • Root cause: GetTransactionChanges() is computed once before the retry loop in FlushChanges(), but CommitChanges (via GetNewTableInfo) and WriteSnapshotChanges (via AddTableChanges) mutate it in-place by adding committed (remapped) table IDs. When the first commit attempt fails (duplicate snapshot_id race), these stale committed IDs persist into the retry.

  • False conflict mechanism: On retry, CheckForConflicts compares the stale IDs against the other transaction's committed changes. Both transactions derived their committed IDs from the same next_catalog_id counter, so the IDs collide — producing a false conflict like "insert into table while another transaction altered it" even when operating on entirely different tables.

  • Example scenario:

    • Tx A & Tx B both start from snapshot with next_catalog_id=25770
    • Tx A (first attempt): assigns committed table ID 25771 → WriteSnapshotChanges mutates tables_inserted_into += {25771} → Commit fails (Tx B committed first)
    • Tx B committed with altered_tables: {25771} (from COMMENT ON TABLE)
    • Tx A (retry): CheckForConflicts sees tables_inserted_into: {25771} (stale) vs altered_tables: {25771} (Tx B) → false conflict
  • Fix: Move GetTransactionChanges() inside the retry loop so it is recomputed fresh each iteration
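
The failure mode and the fix described above can be sketched with a small Python model. This is not the DuckLake C++ code: the `Changes` class, the `flush_changes` function, the `recompute_each_attempt` flag, and the way the commit race is simulated are hypothetical stand-ins that only borrow names from the summary (`tables_inserted_into`, `altered_tables`, `next_catalog_id`). The ID allocation rule (counter plus one) and the counter value after Tx B's commit are assumptions chosen to mirror the example numbers.

```python
from dataclasses import dataclass, field


class TransactionConflict(Exception):
    """Raised when the modelled conflict check finds overlapping table IDs."""


@dataclass
class Changes:
    # Hypothetical stand-in for the transaction change set.
    tables_inserted_into: set = field(default_factory=set)
    altered_tables: set = field(default_factory=set)


def get_transaction_changes():
    # Tx A only creates and fills a new table, so a freshly computed
    # change set contains no committed (remapped) table IDs yet.
    return Changes()


def check_for_conflicts(mine, theirs):
    overlap = mine.tables_inserted_into & theirs.altered_tables
    if overlap:
        raise TransactionConflict(
            f"insert into table {sorted(overlap)} while another transaction altered it"
        )


def flush_changes(recompute_each_attempt, max_retries=3):
    """Model of Tx A's commit loop; returns the change set that was committed."""
    tx_b_changes = Changes(altered_tables={25771})  # Tx B: COMMENT ON TABLE
    committed_by_others = []   # commits visible to Tx A since its snapshot
    next_catalog_id = 25770    # counter in the snapshot both transactions read

    changes = get_transaction_changes()  # buggy placement: computed once
    for attempt in range(max_retries):
        if recompute_each_attempt:
            changes = get_transaction_changes()  # fixed placement: fresh per attempt

        for other in committed_by_others:
            check_for_conflicts(changes, other)

        # Assumed allocation rule, mirroring the example (25770 -> 25771).
        committed_id = next_catalog_id + 1
        changes.tables_inserted_into.add(committed_id)  # in-place mutation

        if attempt == 0:
            # Simulated race: Tx B commits first, so this attempt fails.
            committed_by_others.append(tx_b_changes)
            next_catalog_id = 25772  # illustrative value after Tx B's commit
            continue
        return changes
    raise RuntimeError("retries exhausted")
```

With `recompute_each_attempt=False` the second attempt raises `TransactionConflict`, because the stale `25771` left over from the failed attempt meets Tx B's `altered_tables`. With `recompute_each_attempt=True` the retry starts from an empty change set and commits without a conflict.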

Test plan

  • Added transaction_retry_stale_changes.test covering:
    • Concurrent INSERT + ALTER on different tables (both orderings)
    • Concurrent INSERT + COMMENT ON on different tables
    • All scenarios verify data correctness after both commits succeed

🤖 Generated with Claude Code

fuziontech and others added 3 commits March 26, 2026 15:55
…ion_changes

GetTransactionChanges() was computed once before the retry loop, but
CommitChanges (via GetNewTableInfo) and WriteSnapshotChanges (via
AddTableChanges) mutate it in-place by adding committed (remapped)
table IDs. When the first commit attempt fails (e.g., duplicate
snapshot_id race), these stale committed IDs persist into the retry.

On retry, CheckForConflicts compares the stale IDs against the
successfully-committed transaction's changes. Because both transactions
derived their IDs from the same next_catalog_id counter, the IDs
collide and produce a false conflict (e.g., "insert into table while
another transaction altered it") even when the transactions operated
on entirely different tables.

Fix: move GetTransactionChanges() inside the retry loop so it is
recomputed fresh each iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous tests used pre-existing tables with fixed IDs. Since
existing table IDs don't change across retries, they could never
collide with another transaction's committed IDs — the tests would
pass even without the fix.

The new tests use concurrent CTAS + ALTER operations. Both transactions
start from the same snapshot (same next_catalog_id), so they derive
identical committed IDs for their new tables. When the first attempt
fails, WriteSnapshotChanges has already added the stale committed ID
to tables_inserted_into. On retry, this collides with the other
transaction's altered_tables entry (same ID), triggering the false
conflict that this PR fixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
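
The difference between the two test designs can be illustrated with a short Python sketch. It is only a model: `committed_ids_for_new_tables` is a hypothetical helper, the allocation rule (counter plus one per new table) is an assumption that mirrors the numbers in the PR description, and the IDs `101` and `102` for pre-existing tables are made up for illustration.

```python
def committed_ids_for_new_tables(next_catalog_id, n_new_tables):
    """Assumed rule: each new table takes the next ID after the snapshot counter."""
    return {next_catalog_id + 1 + i for i in range(n_new_tables)}


snapshot_next_catalog_id = 25770

# Old test design: both transactions touch pre-existing tables.
# Their IDs are fixed, so a stale entry can only ever name the same table.
old_stale_inserted = {101}   # retrier inserted into existing table 101 (made-up ID)
old_other_altered = {102}    # first committer altered existing table 102 (made-up ID)
old_collision = old_stale_inserted & old_other_altered

# New test design: both transactions create a table from the same snapshot,
# so they derive the same committed ID independently.
new_stale_inserted = committed_ids_for_new_tables(snapshot_next_catalog_id, 1)
new_other_altered = committed_ids_for_new_tables(snapshot_next_catalog_id, 1)
new_collision = new_stale_inserted & new_other_altered
```

In this model `old_collision` is empty, which is why the earlier tests pass with or without the fix, while `new_collision` contains the shared ID that produces the false conflict.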

The COMMENT ON must be on the first committer so its altered_tables
entry collides with the retrying transaction's stale tables_inserted_into.
Previously it was on con2 (the retrier), meaning con1's altered_tables
was empty and no false conflict could trigger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
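
Why the placement of COMMENT ON matters can be shown with a minimal Python sketch. The function `false_conflict_triggers` is a hypothetical model of the single comparison described above (the retrier's stale `tables_inserted_into` against the first committer's `altered_tables`); it is not the real conflict check, and the ID `25771` is taken from the example in the PR description.

```python
def false_conflict_triggers(retrier_stale_inserted, first_committer_altered):
    """Model: a conflict is reported when the two ID sets overlap."""
    return bool(retrier_stale_inserted & first_committer_altered)


stale_inserted = {25771}  # left behind by the retrier's failed first attempt

# Earlier test layout: COMMENT ON ran on con2 (the retrier),
# so con1 (the first committer) had an empty altered_tables set.
earlier_layout = false_conflict_triggers(stale_inserted, set())

# Corrected layout: COMMENT ON runs on the first committer.
corrected_layout = false_conflict_triggers(stale_inserted, {25771})
```

Only the corrected layout gives the check something to collide with, so only that layout exercises the bug.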