Skip to content

[storage/journal/fixed] fix crash recovery for clear_to_size/init_at_size#3809

Open
roberto-bayardo wants to merge 2 commits into
mainfrom
fix-init-at-size-crash-recovery
Open

[storage/journal/fixed] fix crash recovery for clear_to_size/init_at_size#3809
roberto-bayardo wants to merge 2 commits into
mainfrom
fix-init-at-size-crash-recovery

Conversation

@roberto-bayardo
Copy link
Copy Markdown
Collaborator

Summary

  • Fix crash recovery for init_at_size / clear_to_size: a crash after blob clearing but before tail blob creation previously recovered to 0..0, losing the target size
  • Introduce a write-ahead CLEAR_TARGET_KEY metadata entry so recovery completes the interrupted clear to the intended target instead of falling back to blob state
  • Add tests for crash at each stage: after intent sync, after blob removal, after tail creation, and with mid-section targets

Test plan

  • Existing crash recovery tests updated to assert correct target-preserving behavior
  • test_fixed_journal_clear_to_size_crash_after_intent_before_blobs — crash after intent, old blobs still present
  • test_fixed_journal_clear_to_size_crash_after_mid_section_intent_keeps_old_blob — mid-section target with re-open idempotency check

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 16, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
commonware-mcp c749134 May 21 2026, 03:03 PM

@roberto-bayardo roberto-bayardo added the bug Something isn't working label May 16, 2026
@roberto-bayardo roberto-bayardo moved this to Ready for Review in Tracker May 16, 2026
@roberto-bayardo roberto-bayardo requested a review from Copilot May 16, 2026 17:52
@roberto-bayardo roberto-bayardo force-pushed the fix-init-at-size-crash-recovery branch from 33f76f8 to 75bc79b Compare May 16, 2026 17:52
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 16, 2026

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: c749134
Status: ✅  Deploy successful!
Preview URL: https://e7ae18d7.monorepo-eu0.pages.dev
Branch Preview URL: https://fix-init-at-size-crash-recov.monorepo-eu0.pages.dev

View logs

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 16, 2026

Benchmark results

Tip

PASSED: No benchmark exceeded the regression threshold.

Benchmark comparison table
Benchmark Baseline (main) Current Delta Threshold Status
qmdb::merkleize/variant=any::unordered::fixed::mmr keys=10000 ch=false sync=false 1.623 ms 1.543 ms -4.91% 10.00% ✅ PASS
qmdb::merkleize/variant=current::ordered::fixed::mmb chunk=256 keys=10000 ch=true sync=false 3.075 ms 2.998 ms -2.49% 10.00% ✅ PASS

Baseline commit(s): edc5cced171c

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes crash recovery for init_at_size / clear_to_size in the fixed contiguous journal by introducing a write-ahead metadata “clear target” so recovery can complete an interrupted clear/reset to the intended size instead of falling back to blob-derived bounds.

Changes:

  • Add a write-ahead CLEAR_TARGET_KEY metadata entry and recovery path that completes an interrupted clear/reset on init().
  • Refactor metadata handling with shared helpers (parse_metadata_u64, stage_pruning_boundary_metadata, complete_clear_to_size) to ensure consistent ordering and crash-safety.
  • Extend crash-recovery tests to cover additional failure points and mid-section targets.

.unwrap();
metadata.put(PRUNING_BOUNDARY_KEY, 7u64.to_be_bytes().to_vec());
metadata.put(CLEAR_TARGET_KEY, 2u64.to_be_bytes().to_vec());
metadata.sync().await.unwrap();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@roberto-bayardo roberto-bayardo force-pushed the fix-init-at-size-crash-recovery branch from 75bc79b to 7f0c9a9 Compare May 16, 2026 17:59
@patrick-ogrady
Copy link
Copy Markdown
Contributor

I think its ok to treat this as 0..0?

@roberto-bayardo
Copy link
Copy Markdown
Collaborator Author

roberto-bayardo commented May 16, 2026

I think its ok to treat this as 0..0?

is it always? Feels error prone to say "clear_to_size(42)" and then on restart you suddenly have a 0 size journal?

rewind() for example will never leave the journal in a state where it's prior to the attemped rewind point.

@roberto-bayardo
Copy link
Copy Markdown
Collaborator Author

I think its ok to treat this as 0..0?

is it always? Feels error prone to say "clear_to_size(42)" and then on restart you suddenly have a 0 size journal?

rewind() for example will never leave the journal in a state where it's prior to the attemped rewind point.

I suppose as long as we document this carefully I could live with it.

@roberto-bayardo roberto-bayardo marked this pull request as ready for review May 21, 2026 13:52
@roberto-bayardo roberto-bayardo force-pushed the fix-init-at-size-crash-recovery branch from 6851713 to c749134 Compare May 21, 2026 15:02
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 98.79518% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.77%. Comparing base (edc5cce) to head (c749134).

Files with missing lines Patch % Lines
storage/src/journal/contiguous/fixed.rs 98.79% 2 Missing ⚠️
@@           Coverage Diff            @@
##             main    #3809    +/-   ##
========================================
  Coverage   95.77%   95.77%            
========================================
  Files         486      486            
  Lines      200338   200433    +95     
  Branches     4858     4855     -3     
========================================
+ Hits       191872   191973   +101     
+ Misses       6834     6831     -3     
+ Partials     1632     1629     -3     
Files with missing lines Coverage Δ
storage/src/journal/contiguous/fixed.rs 97.45% <98.79%> (+0.35%) ⬆️

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edc5cce...c749134. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@patrick-ogrady patrick-ogrady added this to the v2026.6.0 milestone May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: Ready for Review

Development

Successfully merging this pull request may close these issues.

3 participants