Skip to content

[runtime] append page-size shrink crash safety fix#3837

Merged
patrick-ogrady merged 14 commits into
mainfrom
codex/append-resize-crc-fallback
May 21, 2026
Merged

[runtime] append page-size shrink crash safety fix#3837
patrick-ogrady merged 14 commits into
mainfrom
codex/append-resize-crc-fallback

Conversation

@roberto-bayardo
Copy link
Copy Markdown
Collaborator

@roberto-bayardo roberto-bayardo commented May 20, 2026

Summary

Fix crash safety for runtime::utils::buffer::paged::Append::resize() when shrinking a committed page to a shorter partial page, without changing the on-disk CRC format.

The paged append layer stores two (len, crc) slots per page. Because len is not covered by the CRC, shrink rewrites must be careful not to let a torn footer write fabricate a larger authoritative length or destroy the only recoverable footer. This PR makes partial-page shrink use a staged footer update so recovery observes either the old committed page or the new shorter page.

Detailed Summary

  • Add a validated page+checksum read helper so shrink decisions use the checksum record recovered from disk, not potentially stale cached partial_page_state.
  • Protect both same-page partial shrink and full-page-to-partial shrink.
  • Stage the shorter checksum before publishing its length, then invalidate the old longer slot only after the shorter slot is durable.
  • Invalidate the old slot CRC-first so a torn invalidation cannot create a valid larger length.
  • Preserve recovery fallback behavior if a crash happens during any shrink footer phase.
  • Add a no-op fast path for resize(current_size).
  • Clarify paged-buffer documentation around partial-page shrink and ignored bytes past len.

This keeps the storage format unchanged. It hardens the torn-write states created by this implementation, but arbitrary length-field corruption remains a format-level limitation because len is still not included in the CRC.

Tests

  • Add regression coverage for interrupted shrink footer writes during:
    • new checksum staging
    • length publication
    • old-slot invalidation
  • Add coverage for shrink after recovery falls back to the non-authoritative checksum slot.
  • Add coverage for full-page-to-partial shrink, including interrupted footer staging.
  • Add coverage for resize() to the current size as a no-op.

Validated with:

cargo fmt -p commonware-runtime
git diff --check -- runtime/src/utils/buffer/paged/append.rs runtime/src/utils/buffer/paged/mod.rs
just test -p commonware-runtime utils::buffer::paged::append

Towards: https://github.com/commonwarexyz/monorepo/issues/1432

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 20, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
commonware-mcp ca7b55f May 21 2026, 07:43 AM

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 20, 2026

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: ca7b55f
Status: ✅  Deploy successful!
Preview URL: https://01eb29ad.monorepo-eu0.pages.dev
Branch Preview URL: https://codex-append-resize-crc-fall.monorepo-eu0.pages.dev

View logs

@roberto-bayardo roberto-bayardo requested a review from Copilot May 20, 2026 00:33
@roberto-bayardo
Copy link
Copy Markdown
Collaborator Author

bugbot run

@roberto-bayardo roberto-bayardo changed the title [storage+runtime] append page-size shrink crash safety fix [runtime] append page-size shrink crash safety fix May 20, 2026
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit e9edb5b. Configure here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

Benchmark results

Caution

2 benchmark(s) exceeded the regression threshold.

gandalf
Benchmark comparison table
Benchmark Baseline (main) Current Delta Threshold Status
qmdb::merkleize/variant=any::unordered::fixed::mmr keys=10000 ch=false sync=false 1.579 ms 2.568 ms +62.64% 10.00% ❌ FAIL regression
qmdb::merkleize/variant=current::ordered::fixed::mmb chunk=256 keys=10000 ch=true sync=false 3.060 ms 4.051 ms +32.37% 10.00% ❌ FAIL regression

Baseline commit(s): c0f7a042efdb

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes crash-safety for runtime’s paged append buffer when performing a same-page shrink of the final partial page by ensuring a shorter CRC can become authoritative without risking loss of the repaired prefix after an interrupted rewrite.

Changes:

  • Add a two-phase same-page shrink path in Append::resize() that stages the new shorter CRC in the alternate slot and then invalidates the old longer CRC.
  • Update module/docs to reflect same-page shrink semantics and recoverability guarantees.
  • Add regression tests for reopening at the shorter size and for surviving an injected interrupted CRC rewrite.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
runtime/src/utils/buffer/paged/mod.rs Updates paged-buffer documentation to describe same-page shrink CRC behavior and recoverability semantics.
runtime/src/utils/buffer/paged/append.rs Implements same-page shrink CRC rewrite logic and adds regression tests (including a fault-injecting blob wrapper).

Comment thread runtime/src/utils/buffer/paged/append.rs Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch 3 times, most recently from b0889d3 to 358c7c3 Compare May 20, 2026 04:04
@roberto-bayardo roberto-bayardo marked this pull request as ready for review May 20, 2026 04:05
@roberto-bayardo roberto-bayardo moved this to Ready for Review in Tracker May 20, 2026
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch from 358c7c3 to fd03283 Compare May 20, 2026 04:31
@patrick-ogrady patrick-ogrady added this to the v2026.4.1 milestone May 20, 2026
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch from 789729a to 6e156b7 Compare May 20, 2026 14:04
@roberto-bayardo roberto-bayardo requested a review from Copilot May 20, 2026 14:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread runtime/src/utils/buffer/paged/mod.rs Outdated
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch from 82a86ec to 8768b5d Compare May 20, 2026 14:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

runtime/src/utils/buffer/paged/append.rs:985

  • new_physical_size is computed via (full_pages + 1) * physical_page_size / full_pages * physical_page_size without overflow checks. For very large target_size, this can wrap u64 and cause truncation/resize to an unintended (and much smaller) physical size. Please compute this with checked_add/checked_mul and return Error::OffsetOverflow on overflow.
        let new_physical_size = if partial_bytes > 0 {
            // We need full_pages + 1 physical pages to hold the partial data.
            // The partial page will be padded to full physical page size.
            (full_pages + 1) * physical_page_size
        } else {

Comment thread runtime/src/utils/buffer/paged/append.rs
Comment thread runtime/src/utils/buffer/paged/append.rs Outdated
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch from c465da5 to 8768b5d Compare May 20, 2026 14:46
@roberto-bayardo roberto-bayardo requested a review from Copilot May 20, 2026 15:04
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch from eec4d34 to 16a676d Compare May 20, 2026 15:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

runtime/src/utils/buffer/paged/append.rs:988

  • new_physical_size is computed with unchecked +/* on u64 ((full_pages + 1) * physical_page_size / full_pages * physical_page_size). If resize() is called with a very large target_size, these operations can overflow and wrap, causing the underlying blob to be resized to an unintended length. Please use checked_add/checked_mul and return Error::OffsetOverflow on overflow (similar to other offset calculations in this module).
        let new_physical_size = if partial_bytes > 0 {
            // We need full_pages + 1 physical pages to hold the partial data.
            // The partial page will be padded to full physical page size.
            (full_pages + 1) * physical_page_size
        } else {

// Update blob state and buffer based on the desired logical size. The page data is
// read with CRC validation, then durably rewritten below with a shorter CRC.
blob_guard.current_page = full_pages;
buf_guard.offset = full_pages * logical_page_size;
Comment thread runtime/src/utils/buffer/paged/append.rs Outdated
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch 3 times, most recently from d0ba636 to b881c8a Compare May 20, 2026 20:46
@roberto-bayardo roberto-bayardo force-pushed the codex/append-resize-crc-fallback branch from b2e6e54 to 714471e Compare May 21, 2026 04:14
@patrick-ogrady patrick-ogrady merged commit 34f1f8e into main May 21, 2026
175 of 176 checks passed
@patrick-ogrady patrick-ogrady deleted the codex/append-resize-crc-fallback branch May 21, 2026 07:58
@github-project-automation github-project-automation Bot moved this from Ready for Review to Done in Tracker May 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 93.58752% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.77%. Comparing base (7d9df6f) to head (ca7b55f).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
runtime/src/utils/buffer/paged/append.rs 94.44% 27 Missing and 1 partial ⚠️
storage/src/journal/segmented/variable.rs 84.48% 8 Missing and 1 partial ⚠️
@@            Coverage Diff             @@
##             main    #3837      +/-   ##
==========================================
+ Coverage   95.73%   95.77%   +0.03%     
==========================================
  Files         480      486       +6     
  Lines      197098   200130    +3032     
  Branches     4808     4856      +48     
==========================================
+ Hits       188696   191669    +2973     
- Misses       6803     6831      +28     
- Partials     1599     1630      +31     
Files with missing lines Coverage Δ
runtime/src/utils/buffer/paged/mod.rs 97.11% <100.00%> (+0.13%) ⬆️
storage/src/journal/segmented/variable.rs 93.93% <84.48%> (+0.02%) ⬆️
runtime/src/utils/buffer/paged/append.rs 96.85% <94.44%> (-0.69%) ⬇️

... and 40 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d9df6f...ca7b55f. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants