[storage/qmdb/current] fix bitmap batch parent chain growth with rwlock#3627
Conversation
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
commonware-mcp | 77af6e3 | Apr 21 2026, 09:32 PM |
98c3588 to
9f1e720
Compare
Deploying monorepo with
|
| Latest commit: |
77af6e3
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://f424fefc.monorepo-eu0.pages.dev |
| Branch Preview URL: | https://locked-bitmap.monorepo-eu0.pages.dev |
There was a problem hiding this comment.
Pull request overview
This PR refactors the Current QMDB committed bitmap representation to avoid unbounded layer-chain growth and expensive flatten-time cloning by switching the committed bitmap to a shared, in-place-mutable Arc<SharedBitmap<N>> (internally RwLock<BitMap<N>>), while keeping speculative overlays as batch layers.
Changes:
- Introduces
SharedBitmap<N>and updatesBitmapBatch<N>::Baseto referenceArc<SharedBitmap<N>>. - Refactors
Dbto storestatus: Arc<SharedBitmap<N>>and to apply overlays/prune/rewind by mutating the committed bitmap under a write lock. - Updates tests by removing
flatten()-focused cases and adding a regression test for extending an applied batch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| storage/src/qmdb/current/sync/mod.rs | Initializes Db::status as Arc<SharedBitmap> during sync construction. |
| storage/src/qmdb/current/mod.rs | Initializes Db::status as Arc<SharedBitmap> and updates tests to remove flatten() reliance and cover post-apply extension. |
| storage/src/qmdb/current/db.rs | Refactors Db to use shared bitmap and apply/prune/rewind by in-place mutation under RwLock. |
| storage/src/qmdb/current/batch.rs | Adds SharedBitmap, updates batch/base bitmap plumbing, and removes apply_overlay/flatten. |
Comments suppressed due to low confidence (2)
storage/src/qmdb/current/db.rs:235
self.status.read()is a blockingRwLockReadGuard, but this guard is kept alive for the entireRangeProof::new_with_ops(...).await. Per utils/src/sync/mod.rs:19, blocking lock guards must not be held across.await. Prefer passingself.status.as_ref()(SharedBitmap implementsBitmapReadable) so each bitmap read briefly acquires/releases the lock, or use an async lock if you require a consistent snapshot across awaits.
let storage = self.grafted_storage();
let ops_root = self.any.log.root();
let guard = self.status.read();
RangeProof::new_with_ops(
hasher,
&*guard,
&storage,
&self.any.log,
start_loc,
max_ops,
ops_root,
)
.await
storage/src/qmdb/current/db.rs:534
- A blocking
RwLockReadGuardis held acrossbuild_grafted_tree(...).awaithere, which violates the repo’s async locking guidance (utils/src/sync/mod.rs:19: do not hold blocking guards across.await). Consider passingself.status.as_ref()directly (it implementsBitmapReadable) instead of a guard, or switch to an async lock if a consistent snapshot must span awaits.
let hasher = StandardHasher::<H>::new();
let guard = self.status.read();
let grafted_tree = build_grafted_tree::<F, H, N>(
&hasher,
&*guard,
&pinned_nodes,
&self.any.log.merkle,
self.thread_pool.as_ref(),
)
.await?;
35455c3 to
95a5731
Compare
|
bugbot run |
There was a problem hiding this comment.
Pull request overview
Refactors Current QMDB’s committed activity bitmap storage from an immutable layered BitmapBatch chain to a single Arc<SharedBitmap<N>> backed by a RwLock, so apply_batch/prune/rewind mutate the committed bitmap in place and avoid unbounded layer-chain growth and expensive flatten-time clones.
Changes:
- Introduce
SharedBitmap<N>(RwLock<BitMap<N>>) and make the committedDb::statusanArc<SharedBitmap<N>>. - Update
Db::apply_batch,prune, andrewindto mutate the shared bitmap under a write lock (and removeDb::flatten). - Refresh tests to cover live-batch behavior across prune and extending an already-applied batch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| storage/src/qmdb/current/sync/mod.rs | Initialize Db with Arc<SharedBitmap<N>> for committed bitmap state. |
| storage/src/qmdb/current/mod.rs | Initialize Db with SharedBitmap; update tests to remove flatten assumptions and add regressions for the new shared-bitmap semantics. |
| storage/src/qmdb/current/db.rs | Replace layered bitmap mutation/flattening with in-place overlay application and lock-scoped bitmap pruning/rewinding; update proof/root paths to accept BitmapReadable. |
| storage/src/qmdb/current/batch.rs | Add SharedBitmap, update BitmapBatch::Base to reference it, and adjust overlay building to avoid repeated pruned_chunks() reads in hot loops. |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 95a5731. Configure here.
323b742 to
8de71e4
Compare
8de71e4 to
2595919
Compare
There was a problem hiding this comment.
Pull request overview
Refactors Current QMDB’s committed activity bitmap from an immutable layered BitmapBatch chain into a single shared SharedBitmap (Arc<RwLock<BitMap<N>>>) so apply_batch, prune, and rewind can mutate the committed bitmap in-place and avoid unbounded layer-chain growth and costly fallback cloning.
Changes:
- Introduces
SharedBitmap<N>and updatesDb.statustoArc<SharedBitmap<N>>, with proof paths reading viaBitmapReadable. - Reworks
Db::apply_batchto collect overlay Arcs, drop the batch, then apply overlays under one write lock; updatesprune/rewindsimilarly. - Removes
flattenusage/tests and adds new regression tests around prune + live batches and extending an applied batch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| storage/src/qmdb/current/sync/mod.rs | Constructs Db with Arc<SharedBitmap<N>> during sync initialization. |
| storage/src/qmdb/current/mod.rs | Updates DB initialization and replaces flatten-focused tests with new shared-bitmap regression tests. |
| storage/src/qmdb/current/db.rs | Updates Db.status type, proof plumbing to use BitmapReadable, and in-place bitmap mutation in apply_batch/prune/rewind. |
| storage/src/qmdb/current/batch.rs | Adds SharedBitmap, updates BitmapBatch::Base to reference it, removes apply_overlay/flatten, and documents stale-read caveat. |
2595919 to
e68d29b
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors Current QMDB’s committed activity bitmap from an immutable, layer-stacking BitmapBatch base to a shared, in-place-mutated Arc<SharedBitmap<N>> guarded by a blocking RwLock, eliminating unbounded parent-chain growth and avoiding expensive bitmap cloning during flattening.
Changes:
- Introduces
SharedBitmap<N>(wrapper aroundRwLock<BitMap<N>>) and updates the committed bitmap to beArc<SharedBitmap<N>>. - Updates
Db::apply_batch,Db::prune, andDb::rewindto mutate the committed bitmap under a write lock (and removesDb::flatten/layer-collapsing behavior). - Adjusts proof/grafting helpers to accept
&impl BitmapReadable<N>and updates tests to cover the new shared-bitmap behavior and regressions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| storage/src/qmdb/current/sync/mod.rs | Updates DB construction to store the committed bitmap as Arc<SharedBitmap<N>>. |
| storage/src/qmdb/current/mod.rs | Updates initialization similarly and replaces flatten-focused tests with shared-bitmap/prune/extend regression tests. |
| storage/src/qmdb/current/db.rs | Refactors DB state and bitmap application/prune/rewind flows to mutate the committed bitmap under a write lock; makes grafted-tree rebuild generic over BitmapReadable. |
| storage/src/qmdb/current/batch.rs | Adds SharedBitmap, updates BitmapBatch base to reference it, removes overlay-application/flattening APIs, and documents the stale-read caveat. |
4ced04c to
962450a
Compare
Before merging this, I think we need such a benchmark? |
16f087c to
32fc597
Compare
Benchmark added (also created a standalone PR for it here: #3633) |
992c028 to
094c8ee
Compare
|
^ PR to add some tests and tweak a few things but on the whole LGTM |
26982f6 to
4d62cd5
Compare
7cfd235 to
751d5ab
Compare
Shares the committed bitmap behind Arc<SharedBitmap<N>> so live batches can hold a reference while apply_batch mutates in place under the write lock. Removes Db::flatten, drops BitmapBatch's Layer chain in favor of apply-in-place, and adds valid_targets staleness checks.
751d5ab to
77af6e3
Compare
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #3627 +/- ##
=======================================
Coverage 95.86% 95.86%
=======================================
Files 441 441
Lines 171196 171277 +81
Branches 4001 4000 -1
=======================================
+ Hits 164109 164187 +78
- Misses 5827 5832 +5
+ Partials 1260 1258 -2
... and 7 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Summary
Refactor the Current QMDB's committed bitmap from a layered
BitmapBatch<N>(anArc<BitMap>with a chain ofLayers on top) toArc<SharedBitmap<N>>whereSharedBitmapwrapsRwLock<BitMap<N>>.apply_batch,prune, andrewindnow mutate the committed bitmap in place under the write lock instead of stacking aLayer.Motivation
The old design had a memory/perf cliff:
apply_batchpushed a newLayeronDb::status; the chain only collapsed viaDb::flatten().flattenrequired unique ownership of the terminalArc<BitMap>. Whenever a liveMerkleizedBatchshared the base Arc (the common case — callers typically hold the batch they just applied),flattenfell back to a full(*arc).clone()of the bitmap. For bitmaps of tens/hundreds of MB, that's a silent, expensive memcpy.Mutating in place under a
RwLockbounds memory to the bitmap's actual live size and removes the flatten hazard entirely.Performance
Performance on existing benchmarks is unchanged. (There was no benchmark that results in the parent-chain-growth issue this change is addressing.). Performance improvement on a new "chained_growth" benchmark is as follows:
current::unordered::fixed::mmb chunk=32current::ordered::fixed::mmb chunk=32current::unordered::fixed::mmb chunk=256current::ordered::fixed::mmb chunk=256Design
SharedBitmap<N>(new)Thin wrapper around
RwLock<BitMap<N>>(parking_lot viacommonware_utils::sync::RwLock).read()returns aRwLockReadGuard<BitMap<N>>.write()ispub(super)— onlyDb::apply_batch,Db::prune,Db::rewindcan mutate.BitmapReadable<N>for proof-path reads.BitmapBatch<N>simplifiedRemoved
apply_overlayandflatten.Dbstatus: Arc<SharedBitmap<N>>(wasBitmapBatch<N>).apply_batchcollects overlayArcs from the batch chain, drops the batch, then applies overlays under a single write guard.pruneandrewindtake the write guard, mutate, drop before any.await.self.status.as_ref()(SharedBitmap implementsBitmapReadable).Db::flattenremoved.build_grafted_treeis now generic over&impl BitmapReadable<N>.Behavior differences:
In the old design, each
Arc<BitMap>was an immutable snapshot, so reading through a staleMerkleizedBatchreturned a hypothetical state — not matching the DB, but internally consistent. With theRwLockdesign, there's one live bitmap that evolves in place, so reads through a stale batch mix its overlays with post-divergence committed chunks, producing incoherent data. (In either case, the results are not useful.)Another behavior difference is: if a batch was built before
Db::pruneadvanced the committed bitmap's pruning boundary, and the caller later reads through that batch requesting a chunk at an index that's now pruned, the read panics insidePrunable::get_chunk's bounds assertion. Old design returned the frozen pre-prune snapshot (becauseArc::make_mutcloned the bitmap at prune time, leaving external batches with an unpruned copy). New design has nothing to clone, so the pruned index is a hard error.In practice this only triggers on misuse: library-internal paths (
build_chunk_overlay, proof generation,apply_batch's overlay write) already guard against reading pruned indices. But callers who reach intoBitmapBatch's chain directly after holding a batch across a prune get a panic rather than silent stale data. Thetest_current_live_batch_safe_across_pruneregression test confirms the common case (live batch across prune, apply after) stays panic-free.