Commit a6a7ae8

jiaoew1991 and claude authored
refactor(anvil): single-mutex QueueState replaces seven atomics (#88)
We've spent the last several PRs (#84, #86, #87) chasing variants of the same race: independent in-memory atomics modified by writers in some order, read by other writers/readers in some order, with ad-hoc Acquire/Release pairings. Each PR fixed one (writer-pair, reader-pair) combination; the next PR turned up another. There are O(N²) such pairs to get right, and we kept finding new ones in production.

This commit collapses the seven atomics + separate commit-log mutex into a single `std::sync::Mutex<QueueState>` (not `tokio::sync::Mutex`: we never await under the lock).

What changes:

- `QueueCounters` now wraps a single `state: Mutex<QueueState>` instead of 7 individual `AtomicU64`s and a separate `BTreeMap` mutex.
- `QueueState` holds `push_seq_committed`, `push_seq_alloc`, `claim_seq`, the four lifetime totals, and the `commit_log` for in-flight reservations. Two helper methods (`pending_count`, `claimed_count`) encapsulate the derived values.
- Every writer (push, nack, claim, ack, ack_and_forward, ack_and_scatter) acquires the lock once for the in-memory mutation step, drops it, then runs `db.write` unlocked. Concurrent writers' batch commits still execute in parallel.
- `PushReservationGuard::Drop` acquires the same mutex to flip its entry to done and walk the contiguous-committed prefix of `commit_log`. RAII closes every exit path.
- `check_queue_completion` and `get_meta` take a single locked snapshot. **Torn snapshots are now impossible by construction.**
- `ack_and_scatter` previously bypassed the reservation pattern and bumped `push_seq` directly, which exposed it to the publish-commit race. It now goes through `reserve_push_range` like every other writer.

What this fixes by construction:

- Publish-commit race (#84): unchanged by this commit; the watermark + commit-log invariant survives.
- Reservation leak (#84): RAII guard kept; same Drop semantics.
- Drained ignores in-flight (#86): `pending_count()` reads from the same `QueueState` snapshot used everywhere else.
- Counter snapshot torn read (#86): impossible with a single lock.
- Claim/check write order (#87): impossible with a single lock.
- Phantom claim over-bump (#87 followup): handled in `claim_messages` with a second locked critical section that undoes the over-bump when `actual_count < reserved`.

Critical sections are sub-µs (struct field updates / brief BTreeMap work). `db.write` runs unlocked. Throughput envelope per queue is now bounded by the storage layer (~10K ops/s), not by the lock (~20M ops/s ceiling).

Storage suite 24/24 green locally. DST suite running.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
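For readers without the diff open, the reservation/watermark pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the field names come from the message, but the `commit_log` value layout (`start -> (end, done)`), the `pending_count` formula, the `snapshot` and `push` helpers, and every signature are assumptions, and the four lifetime totals are omitted. How the real code treats a failed `db.write` is not stated in the message; the sketch simply releases the reservation on every exit path.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical layout: all counters live behind one lock.
#[derive(Debug, Default)]
struct QueueState {
    push_seq_committed: u64, // watermark: every seq below it is committed
    push_seq_alloc: u64,     // next seq to hand out to a writer
    claim_seq: u64,          // next seq to hand out to a consumer
    commit_log: BTreeMap<u64, (u64, bool)>, // assumed: start -> (end, done)
}

impl QueueState {
    /// Assumed definition: committed but not yet claimed.
    fn pending_count(&self) -> u64 {
        self.push_seq_committed - self.claim_seq
    }
}

/// RAII handle for a reserved range `[start, end)`.
struct PushReservationGuard {
    state: Arc<Mutex<QueueState>>,
    start: u64,
}

/// One short critical section: allocate the range and log it as in flight.
fn reserve_push_range(state: &Arc<Mutex<QueueState>>, n: u64) -> PushReservationGuard {
    let mut s = state.lock().unwrap();
    let start = s.push_seq_alloc;
    s.push_seq_alloc += n;
    s.commit_log.insert(start, (start + n, false));
    PushReservationGuard { state: Arc::clone(state), start }
}

impl Drop for PushReservationGuard {
    fn drop(&mut self) {
        let mut s = self.state.lock().unwrap();
        if let Some(entry) = s.commit_log.get_mut(&self.start) {
            entry.1 = true; // flip this reservation to done
        }
        // Advance the watermark over the contiguous done prefix only.
        loop {
            let wm = s.push_seq_committed;
            let head = s.commit_log.get(&wm).copied();
            match head {
                Some((end, true)) => {
                    s.commit_log.remove(&wm);
                    s.push_seq_committed = end;
                }
                _ => break, // gap or still in flight: stop here
            }
        }
    }
}

/// Writer shape: lock briefly to reserve, then run the storage write unlocked.
fn push<F>(state: &Arc<Mutex<QueueState>>, n: u64, db_write: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    let _guard = reserve_push_range(state, n); // lock taken and released inside
    db_write() // no lock held here; the guard drops on success and on error
}

/// Single locked snapshot: (committed, allocated, pending) read together.
fn snapshot(state: &Mutex<QueueState>) -> (u64, u64, u64) {
    let s = state.lock().unwrap();
    (s.push_seq_committed, s.push_seq_alloc, s.pending_count())
}
```

If two reservations finish out of order, the watermark in this sketch stays put until the earlier one is dropped, then jumps over both ranges at once. That is the contiguous-prefix walk the message attributes to `PushReservationGuard::Drop`.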
1 parent 93e0a4b commit a6a7ae8

1 file changed

Lines changed: 218 additions & 192 deletions
