[RangeIndex] Add bf-tree recover-then-drop crash repros#1889
Open
tiagonapoli wants to merge 1 commit into
Open
[RangeIndex] Add bf-tree recover-then-drop crash repros#1889tiagonapoli wants to merge 1 commit into
tiagonapoli wants to merge 1 commit into
Conversation
Contributor
There was a problem hiding this comment.
⚠️ Not ready to approve
The new explicit tests leak temp directories on the passing control path and should ensure native resources are disposed even when assertions fail.
Pull request overview
Adds explicit, crash-by-design reproduction coverage for a native bf-tree 0.5.0 recover-then-drop abort affecting RangeIndex, to make the failure deterministic and easier to track while a fix is developed. The PR title and description accurately match the implementation and clearly document the failure mode and why the tests are [Explicit].
Changes:
- Add
[Explicit]RESP-level repro that checkpoints (SAVE), recovers, then triggers the abort on tree drop during server dispose. - Add
[Explicit]pure-native repro usingBfTreeServicesnapshot → recover → drop. - Add
[Explicit]below-threshold control test showing single-level trees recover/drop cleanly.
File summaries
| File | Description |
|---|---|
| test/standalone/Garnet.test.rangeindex/RespRangeIndexTests.cs | Adds three [Explicit] crash repro/control tests for bf-tree CPR snapshot recovery followed by drop. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 2
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
Comment on lines
+1204
to
+1211
| [Test] | ||
| [Explicit("Control: a below-threshold (single-level) recovered tree drops cleanly (no crash).")] | ||
| public void BfTreeRecoverFromCprSnapshotThenDrop_BelowThreshold_NoCrash() | ||
| { | ||
| var dir = Path.Combine(Path.GetTempPath(), $"bftree_cprrepro_{Guid.NewGuid():N}"); | ||
| Directory.CreateDirectory(dir); | ||
|
|
||
| // Seed: only 50 tiny records -> stays single-level, then CprSnapshot. |
Comment on lines
+1177
to
+1189
| // Seed: disk-backed tree with 1000 tiny 4-byte records -> multi-level, then CprSnapshot. | ||
| var seed = new BfTreeService(StorageBackendType.Disk, | ||
| Path.Combine(dir, "seed.data.bftree"), Path.Combine(dir, "seed.scratch.cpr"), | ||
| cbSizeByte: 65536, cbMinRecordSize: 8); | ||
| for (var i = 0; i < 1000; i++) | ||
| { | ||
| var bytes = Encoding.ASCII.GetBytes(i.ToString("D4")); | ||
| ClassicAssert.AreEqual(BfTreeInsertResult.Success, seed.Insert(bytes, bytes), $"insert {i} should succeed"); | ||
| } | ||
| seed.CprSnapshot(); | ||
| var snapshot = Path.Combine(dir, "snap.bftree"); | ||
| File.Copy(Path.Combine(dir, "seed.scratch.cpr"), snapshot, overwrite: false); | ||
| seed.Dispose(); |
0d49916 to
2fff782
Compare
Add three [Explicit] tests to RespRangeIndexTests reproducing the bf-tree 0.5.0 crash where dropping a CprSnapshot-recovered, MULTI-LEVEL tree aborts in bftree_drop with 'assertion failed: next_level.is_null()' at mini_page_op.rs:429 (panic on a bf-tree background thread, so --blame-crash cannot catch it): - RICheckpointRecoverThenDrop_Crashes: RESP-level via RI.CREATE DISK + RI.SET + SAVE + restart-recover; the recovered tree is dropped on server dispose. - BfTreeRecoverFromCprSnapshotThenDrop_Crashes: pure-native, calls a shared SeedRecoverDrop(records) helper with 378 records. - BfTreeRecoverFromCprSnapshotThenDrop_BelowThreshold_NoCrash: control with 377 records (one below the threshold) that drops cleanly. The crash is driven by the tree becoming multi-level (total data volume vs the small 64 KiB cache), not by record size. With tiny 4-byte key=value records, CACHESIZE 65536, MINRECORD 8, the threshold is a sharp 378 records (377 stays single-level), deterministic and identical on net8/net10 and Windows/Linux. Every Insert is asserted to succeed. All [Explicit] (crash the host by design; excluded from normal CI runs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ccd2ce1 to
fe41bc6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds three
[Explicit]tests toRespRangeIndexTeststhat reproduce a crash in the nativebf-tree0.5.0 library: dropping a CprSnapshot-recovered, multi-level tree aborts the process inbftree_dropwithassertion failed: next_level.is_null()atmini_page_op.rs:429.The panic fires on a bf-tree internal background thread (a native
abort()), so it cannot be caught by--blame-crash/createdump. This is the root cause behind the intermittent crashes seen in the RangeIndex cluster-migration suite (the migration destination tree is a CprSnapshot-recovered tree).Tests
RICheckpointRecoverThenDrop_Crashes— RESP-level:RI.CREATE DISK+RI.SET+SAVE+ restart-recover; the recovered tree is dropped on server dispose.BfTreeRecoverFromCprSnapshotThenDrop_Crashes— pure-native, self-contained:BfTreeServicecreate -> snapshot -> recover -> drop.BfTreeRecoverFromCprSnapshotThenDrop_BelowThreshold_NoCrash— control: same sequence with few records (single-level) drops cleanly.Notes
Insertis asserted to succeed.[Explicit]so they are excluded from normal CI runs (they crash the host by design).