Commit a6a7ae8
refactor(anvil): single-mutex QueueState replaces seven atomics (#88)
We've spent the last several PRs (#84, #86, #87) chasing variants of the
same race: independent in-memory atomics modified by writers in some
order, read by other writers/readers in some order, with ad-hoc
Acquire/Release pairings. Each PR fixed one (writer-pair, reader-pair)
combination; the next PR turned up another. With N independent atomics
there are O(N²) such pairs to get right, and we kept finding new ones in
production.
This commit collapses the seven atomics and the separate commit-log mutex
into a single `std::sync::Mutex<QueueState>`. We use the blocking std
mutex rather than `tokio::sync::Mutex` because we never hold the lock
across an `.await`.
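A minimal sketch of the single-lock layout, for readers who want to see the shape. The three sequence counters and `commit_log` come from this message; the names of the four lifetime totals, the formulas behind `pending_count`/`claimed_count`, and the `db_write` closure (a stand-in for `db.write`) are hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Sketch of the state guarded by the one mutex.
/// Lifetime-total names and derived-count formulas are assumptions.
#[derive(Debug, Default)]
pub struct QueueState {
    pub push_seq_committed: u64, // watermark: every seq below this is durable
    pub push_seq_alloc: u64,     // next seq handed to a push reservation
    pub claim_seq: u64,          // next seq a claimer will take
    // Four lifetime totals (hypothetical names).
    pub total_pushed: u64,
    pub total_claimed: u64,
    pub total_acked: u64,
    pub total_nacked: u64,
    /// In-flight push reservations: start seq -> (len, done).
    pub commit_log: BTreeMap<u64, (u64, bool)>,
}

impl QueueState {
    /// Committed but not yet claimed (assumed formula).
    pub fn pending_count(&self) -> u64 {
        self.push_seq_committed - self.claim_seq
    }
    /// Claimed but not yet acked or nacked (assumed formula).
    pub fn claimed_count(&self) -> u64 {
        self.total_claimed - self.total_acked - self.total_nacked
    }
}

pub struct QueueCounters {
    state: Mutex<QueueState>,
}

impl QueueCounters {
    pub fn new() -> Self {
        QueueCounters { state: Mutex::new(QueueState::default()) }
    }

    /// Both derived values come from one lock acquisition, so they
    /// describe the same instant and cannot be torn.
    pub fn snapshot(&self) -> (u64, u64) {
        let s = self.state.lock().unwrap();
        (s.pending_count(), s.claimed_count())
    }

    /// Writer shape: mutate under the lock, release it, then do I/O.
    /// `db_write` is a hypothetical stand-in for the real `db.write`.
    pub fn ack(&self, n: u64, db_write: impl FnOnce()) {
        {
            let mut s = self.state.lock().unwrap();
            s.total_acked += n;
        } // MutexGuard dropped here; the lock is released.
        db_write(); // runs unlocked, so other writers' commits proceed in parallel
    }
}
```

The inner block scope is what guarantees the guard is gone before the storage call; a `db_write` that tried to take the lock would succeed rather than deadlock.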
What changes:
- `QueueCounters` now wraps a single `state: Mutex<QueueState>` instead
of seven individual `AtomicU64`s and a separate `BTreeMap` mutex.
- `QueueState` holds `push_seq_committed`, `push_seq_alloc`, `claim_seq`,
the four lifetime totals, and the `commit_log` for in-flight
reservations. Two helper methods (`pending_count`, `claimed_count`)
encapsulate the derived values.
- Every writer (push, nack, claim, ack, ack_and_forward, ack_and_scatter)
acquires the lock once for the in-memory mutation step, drops it,
then runs `db.write` unlocked. Concurrent writers' batch commits
still execute in parallel.
- `PushReservationGuard::Drop` acquires the same mutex to flip its
entry to done and walk the contiguous-committed prefix of
`commit_log`. Because this lives in `Drop`, every exit path (success,
error, early return, unwinding panic) marks the reservation done.
- `check_queue_completion` and `get_meta` take a single locked
snapshot. **Torn snapshots are now impossible by construction.**
- `ack_and_scatter` (which previously bypassed the reservation pattern
and bumped `push_seq` directly, exposing it to the publish-commit
race) now goes through `reserve_push_range` like every other writer.
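The reservation/watermark pattern can be sketched as follows. This is a minimal illustration assuming `commit_log` maps a reservation's start seq to `(len, done)`. The `committed()` accessor is hypothetical, and this sketch marks a range done on *any* drop; the message does not say how the real code treats a range whose write failed.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Minimal stand-in for QueueState: only the fields this path touches.
#[derive(Default)]
pub struct QueueState {
    pub push_seq_committed: u64,
    pub push_seq_alloc: u64,
    /// start seq -> (len, done)
    pub commit_log: BTreeMap<u64, (u64, bool)>,
}

pub struct QueueCounters {
    state: Mutex<QueueState>,
}

/// RAII guard: the caller holds this (not the lock) across `db.write`.
pub struct PushReservationGuard<'a> {
    state: &'a Mutex<QueueState>,
    pub start: u64,
    pub len: u64,
}

impl QueueCounters {
    pub fn new() -> Self {
        QueueCounters { state: Mutex::new(QueueState::default()) }
    }

    /// Allocate `n` seqs and record them as in flight, in one critical section.
    pub fn reserve_push_range(&self, n: u64) -> PushReservationGuard<'_> {
        let mut s = self.state.lock().unwrap();
        let start = s.push_seq_alloc;
        s.push_seq_alloc += n;
        s.commit_log.insert(start, (n, false));
        PushReservationGuard { state: &self.state, start, len: n }
    }

    /// Hypothetical accessor for the published watermark.
    pub fn committed(&self) -> u64 {
        self.state.lock().unwrap().push_seq_committed
    }
}

impl Drop for PushReservationGuard<'_> {
    fn drop(&mut self) {
        // Recover the data even if another thread panicked while holding the lock.
        let mut s = self.state.lock().unwrap_or_else(|e| e.into_inner());
        if let Some(entry) = s.commit_log.get_mut(&self.start) {
            entry.1 = true;
        }
        // Advance the watermark over the contiguous prefix of finished ranges.
        loop {
            let next = s.push_seq_committed;
            let ready = matches!(s.commit_log.get(&next), Some(&(_, true)));
            if !ready {
                break;
            }
            let (len, _) = s.commit_log.remove(&next).unwrap();
            s.push_seq_committed += len;
        }
    }
}
```

A later range that finishes first stays parked in `commit_log`; the watermark only moves once every earlier range is done, so readers never see a seq whose predecessors are still uncommitted.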
How each known race fares under the new design:
- Publish-commit race (#84): unchanged by this commit; the watermark
+ commit-log invariant survives.
- Reservation leak (#84): RAII guard kept; same Drop semantics.
- Drained ignores in-flight (#86): `pending_count()` reads from
the same `QueueState` snapshot used everywhere else.
- Counter snapshot torn read (#86): impossible — single lock.
- Claim/check write order (#87): impossible — single lock.
- Phantom claim over-bump (#87 follow-up): handled in `claim_messages`
with a second locked critical section that undoes the over-bump
when `actual_count < reserved`.
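The two-critical-section claim path might look roughly like this. It is a hypothetical sketch: the message says only that a second locked section undoes the over-bump. Here it is assumed that `claim_seq` (a position) stays advanced past the scanned range and that only a count (`total_claimed`, a hypothetical name) is corrected. `fetch` stands in for the unlocked storage read.

```rust
use std::sync::Mutex;

/// Minimal hypothetical state for the claim path.
#[derive(Default)]
pub struct QueueState {
    pub push_seq_committed: u64,
    pub claim_seq: u64,
    pub total_claimed: u64,
}

pub struct QueueCounters {
    state: Mutex<QueueState>,
}

impl QueueCounters {
    pub fn new(committed: u64) -> Self {
        QueueCounters {
            state: Mutex::new(QueueState { push_seq_committed: committed, ..Default::default() }),
        }
    }

    /// `fetch(start, end)` reads the reserved seq range without the lock
    /// and returns how many messages it actually found.
    pub fn claim_messages(&self, max: u64, fetch: impl FnOnce(u64, u64) -> u64) -> u64 {
        // Critical section 1: reserve a range bounded by the committed watermark.
        let (start, reserved) = {
            let mut s = self.state.lock().unwrap();
            let reserved = max.min(s.push_seq_committed - s.claim_seq);
            let start = s.claim_seq;
            s.claim_seq += reserved;
            s.total_claimed += reserved;
            (start, reserved)
        };
        // Storage read, unlocked; clamp in case it reports more than reserved.
        let actual_count = fetch(start, start + reserved).min(reserved);
        // Critical section 2: undo the over-bump when storage found fewer.
        if actual_count < reserved {
            let mut s = self.state.lock().unwrap();
            s.total_claimed -= reserved - actual_count;
        }
        actual_count
    }

    pub fn total_claimed(&self) -> u64 {
        self.state.lock().unwrap().total_claimed
    }
}
```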
Critical sections are sub-µs (struct field updates / brief BTreeMap
work). `db.write` runs unlocked. Throughput envelope per queue is now
bounded by the storage layer (~10K ops/s), not by the lock (~20M
ops/s ceiling).
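As a back-of-the-envelope check on those figures, treat each ceiling as the reciprocal of a per-operation time. This ignores contention and batching, so it is only an order-of-magnitude reading:

```math
t_{\text{lock}} \approx \frac{1}{2\times 10^{7}\ \text{ops/s}} = 50\ \text{ns},\qquad
t_{\text{db}} \approx \frac{1}{10^{4}\ \text{ops/s}} = 100\ \mu\text{s},\qquad
\frac{t_{\text{db}}}{t_{\text{lock}}} = \frac{2\times 10^{7}}{10^{4}} = 2000
```

So the storage write is roughly three orders of magnitude slower than a lock hold, which is why the lock is not expected to be the bottleneck.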
Storage suite 24/24 green locally. DST suite running.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 93e0a4b commit a6a7ae8
1 file changed
Lines changed: 218 additions & 192 deletions