Skip to content

[storage/index] simplify+optimize Cursor#3760

Open
roberto-bayardo wants to merge 6 commits into
mainfrom
unsafe-index-2
Open

[storage/index] simplify+optimize Cursor#3760
roberto-bayardo wants to merge 6 commits into
mainfrom
unsafe-index-2

Conversation

@roberto-bayardo
Copy link
Copy Markdown
Collaborator

@roberto-bayardo roberto-bayardo commented May 12, 2026

Summary

Updates storage::index cursor mutation to operate in place, avoiding the previous detach/reattach flow while preserving cursor semantics. Inserted collisions are now (typically) stored and returned in newest-first order, though contract continues to provide no ordering guarantee among colliding keys.

Changes

  • Reworked shared cursor state around prev / current pointers.
  • Reduced unsafe surface to one centralized mutable pointer dereference, with pointer creation restricted to &mut Record.
  • Fixed insert_and_prune so vacant entries are not created when the incoming value is already prunable.
  • Made remove drain owned collision chains directly while keeping metrics accurate.
  • Updated ordered, unordered, and partitioned index tests for newest-first value order.
  • Added regression coverage for prunable vacant inserts and cursor delete/insert edge cases.
  • Added insert_and_prune benchmark

Results

Benchmark Before After Change
index::insert_and_prune/items=10000 191.5 µs 77.8 µs -59.3%
index::insert_and_prune/items=50000 1.163 ms 479.1 µs -58.8%
Benchmark Time Change vs main
qmdb::apply_batch/direct 746 µs -14.0%
qmdb::apply_batch/uncomm_ancestor 2.39 ms -5.3%
qmdb::apply_batch/comm_ancestor 1.49 ms -6.5%
qmdb::apply_batch/comm_uncomm_chain 2.94 ms -15.7%
qmdb::apply_batch/multi_uncomm 3.71 ms -9.1%

@roberto-bayardo
Copy link
Copy Markdown
Collaborator Author

bugbot run

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 12, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
commonware-mcp 2105050 May 15 2026, 01:54 PM

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

Benchmark results

Tip

PASSED: No benchmark exceeded the regression threshold.

Benchmark comparison table
Benchmark Baseline (main) Current Delta Threshold Status
qmdb::merkleize/variant=any::unordered::fixed::mmr keys=10000 ch=false sync=false 1.528 ms 1.654 ms +8.25% 10.00% ✅ PASS
qmdb::merkleize/variant=current::ordered::fixed::mmb chunk=256 keys=10000 ch=true sync=false 2.856 ms 2.916 ms +2.09% 10.00% ✅ PASS

Baseline commit(s): ced07a033227

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 12, 2026

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2105050
Status: ✅  Deploy successful!
Preview URL: https://e88680b1.monorepo-eu0.pages.dev
Branch Preview URL: https://unsafe-index-2.monorepo-eu0.pages.dev

View logs

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the shared index Cursor implementation to traverse and mutate the per-key linked list in place (using a simplified 3-state enum and pointer bookkeeping), removing the prior multi-phase “detach/reattach on Drop” approach. It also adjusts insertion/removal logic in the ordered/unordered index implementations to align with the new cursor behavior, and updates/adds tests (including a regression case for insert_and_prune on a vacant/pruned insert).

Changes:

  • Replaced the previous 7-variant cursor phase machine with an in-place cursor using State + NonNull<Record<V>> pointers.
  • Simplified remove() for ordered/unordered indexes by directly removing the map entry and iterating the record chain to update metrics.
  • Updated index tests to match the new iteration/insertion ordering, and added a regression test ensuring prunable insert_and_prune does not create an entry.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
storage/src/index/unordered.rs Updates insert/remove mechanics to work with the simplified in-place cursor and maintain metrics.
storage/src/index/storage.rs Replaces the cursor implementation with a pointer-based in-place traversal/mutation model and simplifies IndexEntry.
storage/src/index/partitioned/ordered.rs Updates ordered-partitioned index tests to reflect the new per-key value iteration ordering.
storage/src/index/ordered.rs Mirrors unordered index insert/remove changes for the BTree-backed implementation and updates related tests.
storage/src/index/mod.rs Updates cursor docs, provides a default Unordered::prune impl, and adjusts/adds tests for the new semantics.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit e749a19. Configure here.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread storage/src/index/storage.rs
Comment thread storage/src/index/storage.rs Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@roberto-bayardo roberto-bayardo moved this to In Progress in Tracker May 12, 2026
@roberto-bayardo roberto-bayardo marked this pull request as ready for review May 12, 2026 15:56
@roberto-bayardo roberto-bayardo moved this from In Progress to Ready for Review in Tracker May 12, 2026
@roberto-bayardo roberto-bayardo force-pushed the unsafe-index-2 branch 2 times, most recently from 29459f5 to 9c9d4ae Compare May 12, 2026 16:30
@roberto-bayardo roberto-bayardo changed the title [storage/index] simplify Cursor [storage/index] simplify+optimize Cursor May 12, 2026
roberto-bayardo and others added 5 commits May 13, 2026 15:50
Replace the 7-variant Phase state machine with a simpler 3-state enum
and raw pointer traversal. The new Cursor operates on the linked list
in-place instead of detaching visited nodes into a "past" accumulator
and reattaching on Drop.

Also fixes a minor bug in previous impl and adds a regression test.
Exercises the full Cursor lifecycle (next, find, delete, insert, drop)
on every key, revealing the ~59% speedup from the Cursor simplification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@patrick-ogrady
Copy link
Copy Markdown
Contributor

patrick-ogrady commented May 15, 2026

Found an issue:

  This specific retained-suffix ordering bug came from this PR’s switch to insert_and_prune. Main
  already had related gap-fill logic, but this PR made older gap entries append after newer retained
  entries. The fix replays the gap in reverse with front insertion, preserving log order for both
  “all repeated writes in gap” and “older write in gap, newer write retained” cases.

+    /// Regression: after restart, the snapshot can contain the newer write for
+    /// a repeated key. If rewind restores an older write for that same key, the
+    /// older write must be checked first, matching the pre-restart snapshot.

Patch:

diff --git a/storage/src/qmdb/immutable/mod.rs b/storage/src/qmdb/immutable/mod.rs
index 165ff02f8..2e789c5d7 100644
--- a/storage/src/qmdb/immutable/mod.rs
+++ b/storage/src/qmdb/immutable/mod.rs
@@ -589,10 +589,12 @@ where
         if rewind_floor < old_floor {
             let reader = self.journal.journal.reader().await;
             let gap_end = core::cmp::min(*old_floor, rewind_size);
-            for loc in *rewind_floor..gap_end {
+            for loc in (*rewind_floor..gap_end).rev() {
                 if let Operation::Set(key, _) = reader.read(loc).await? {
-                    self.snapshot
-                        .insert_and_prune(&key, Location::new(loc), |_| false);
+                    let loc = Location::new(loc);
+                    if !self.snapshot.get(&key).any(|existing| *existing == loc) {
+                        self.snapshot.insert(&key, loc);
+                    }
                 }
             }
         }
@@ -2928,9 +2930,9 @@ pub(super) mod test {
         db.destroy().await.unwrap();
     }
 
-    /// Regression: rewind-after-reopen with a repeated key in the floor gap.
-    /// The gap-fill must maintain the same ordering as the live `apply_batch`
-    /// path so `get()` returns the same value.
+    /// Regression: rewind-after-reopen with a repeated key split across the
+    /// floor gap and retained suffix. The gap-fill must preserve log order
+    /// relative to entries that survived the higher-floor reopen.
     pub(crate) async fn test_immutable_rewind_after_reopen_repeated_key_gap<F: Family, V, C>(
         context: deterministic::Context,
         open_db: impl Fn(
@@ -2946,29 +2948,37 @@ pub(super) mod test {
         let key = Sha256::fill(7u8);
         let v1 = Sha256::fill(17u8);
         let v2 = Sha256::fill(18u8);
-        let k3 = Sha256::fill(8u8);
-        let v3 = Sha256::fill(19u8);
+        let marker = Sha256::fill(8u8);
+        let marker_value = Sha256::fill(19u8);
+        let k3 = Sha256::fill(9u8);
+        let v3 = Sha256::fill(20u8);
 
         // Commit A: Set(key, v1) with floor=0.
         commit_sets(&mut db, [(key, v1)], None).await;
 
-        // Commit B: Set(key, v2) with floor=0. get() returns v1 (earliest).
-        commit_sets(&mut db, [(key, v2)], None).await;
-        let second_size = db.bounds().await.end;
+        // Commit B: advances the target floor to key's first write.
+        commit_sets_with_floor(&mut db, [(marker, marker_value)], None, Location::new(1)).await;
+
+        // Commit C: Set(key, v2) at a location that will remain in the
+        // reopened snapshot. The rewind target still includes the older write.
+        commit_sets_with_floor(&mut db, [(key, v2)], None, Location::new(1)).await;
+        let target_size = db.bounds().await.end;
         assert_eq!(db.get(&key).await.unwrap(), Some(v1));
 
-        // Commit C: raises floor above both earlier writes.
-        commit_sets_with_floor(&mut db, [(k3, v3)], None, second_size).await;
+        // Commit D: raises the latest floor to key's second write, so reopen
+        // retains v2 but excludes v1.
+        commit_sets_with_floor(&mut db, [(k3, v3)], None, Location::new(5)).await;
         db.sync().await.unwrap();
 
-        // Reopen: snapshot rebuilt from floor=second_size, key excluded.
+        // Reopen: snapshot rebuilt from floor=5, so key resolves to v2.
         drop(db);
         let mut db = open_db(context.child("second")).await;
-        assert!(db.get(&key).await.unwrap().is_none());
+        assert_eq!(db.get(&key).await.unwrap(), Some(v2));
         assert_eq!(db.get(&k3).await.unwrap(), Some(v3));
 
-        // Rewind to commit B: gap fill re-inserts both Set(key,...) entries.
-        db.rewind(second_size).await.unwrap();
+        // Rewind to commit C: v2 is retained and v1 is restored from the gap.
+        // Reads must match the live target snapshot, where the earlier write wins.
+        db.rewind(target_size).await.unwrap();
         assert_eq!(db.get(&key).await.unwrap(), Some(v1));
 
         db.destroy().await.unwrap();

@roberto-bayardo
Copy link
Copy Markdown
Collaborator Author

roberto-bayardo commented May 15, 2026

Yeah this is the issue the QED bot turned up, looks like my fix wasn't a full fix though. (Admittedly I half hearted it since there really aren't supposed to be duplicate keys in the first place in an "immutable" db.). Any fix relying on insert() is going to be error prone since technically it doesn't guarantee the resulting order among collisions, but this patch is good enough for now (modulo one line noted below).

Found an issue:

  This specific retained-suffix ordering bug came from this PR’s switch to insert_and_prune. Main
  already had related gap-fill logic, but this PR made older gap entries append after newer retained
  entries. The fix replays the gap in reverse with front insertion, preserving log order for both
  “all repeated writes in gap” and “older write in gap, newer write retained” cases.

+    /// Regression: after restart, the snapshot can contain the newer write for
+    /// a repeated key. If rewind restores an older write for that same key, the
+    /// older write must be checked first, matching the pre-restart snapshot.

Patch:

diff --git a/storage/src/qmdb/immutable/mod.rs b/storage/src/qmdb/immutable/mod.rs
index 165ff02f8..2e789c5d7 100644
--- a/storage/src/qmdb/immutable/mod.rs
+++ b/storage/src/qmdb/immutable/mod.rs
@@ -589,10 +589,12 @@ where
         if rewind_floor < old_floor {
             let reader = self.journal.journal.reader().await;
             let gap_end = core::cmp::min(*old_floor, rewind_size);
-            for loc in *rewind_floor..gap_end {
+            for loc in (*rewind_floor..gap_end).rev() {
                 if let Operation::Set(key, _) = reader.read(loc).await? {
-                    self.snapshot
-                        .insert_and_prune(&key, Location::new(loc), |_| false);
+                    let loc = Location::new(loc);
+                    if !self.snapshot.get(&key).any(|existing| *existing == loc) {

^^ This line isn't needed (duplicate locations aren't allowed). but otherwise I've applied this patch, and added a distinct regression test rather than patch the original one.

+                        self.snapshot.insert(&key, loc);
+                    }

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.77%. Comparing base (2cc9416) to head (2105050).
⚠️ Report is 10 commits behind head on main.

@@            Coverage Diff             @@
##             main    #3760      +/-   ##
==========================================
- Coverage   95.82%   95.77%   -0.05%     
==========================================
  Files         471      472       +1     
  Lines      187680   189352    +1672     
  Branches     4415     4569     +154     
==========================================
+ Hits       179851   181360    +1509     
- Misses       6404     6475      +71     
- Partials     1425     1517      +92     
Files with missing lines Coverage Δ
storage/src/index/mod.rs 99.76% <100.00%> (+<0.01%) ⬆️
storage/src/index/ordered.rs 100.00% <100.00%> (ø)
storage/src/index/partitioned/ordered.rs 100.00% <100.00%> (ø)
storage/src/index/storage.rs 100.00% <100.00%> (+1.05%) ⬆️
storage/src/index/unordered.rs 100.00% <100.00%> (ø)
storage/src/qmdb/immutable/fixed.rs 95.92% <100.00%> (+0.06%) ⬆️
storage/src/qmdb/immutable/mod.rs 97.61% <100.00%> (+0.06%) ⬆️
storage/src/qmdb/immutable/variable.rs 95.70% <100.00%> (+0.07%) ⬆️

... and 57 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cc9416...2105050. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready for Review

Development

Successfully merging this pull request may close these issues.

3 participants