Skip to content

[storage] Write large appends directly to blob#4015

Open
roberto-bayardo wants to merge 7 commits into
mainfrom
roberto/skip-buffer-on-large-append
Open

[storage] Write large appends directly to blob#4015
roberto-bayardo wants to merge 7 commits into
mainfrom
roberto/skip-buffer-on-large-append

Conversation

@roberto-bayardo

@roberto-bayardo roberto-bayardo commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Write large appends directly to the blob

An append larger than the write buffer previously ballooned the buffer to the append's full size: the bytes were copied into the buffer, flushed, and copied again into the page cache. Now Append::append_owned writes whole pages of an owned buffer directly to the blob as zero-copy slices (the current partial page is topped up and flushed through the regular protected-CRC path first, and the sub-page tail stays buffered), so the write buffer never grows meaningfully beyond capacity for any caller. The fixed and variable journals pass owned IoBufs down, making batched appends (e.g. qmdb commits) zero-copy end to end. Page-cache population, on-disk format, and durability semantics are unchanged and pinned by tests (including byte-identical physical output vs the buffered path).

The final commit hardens the path: no shared state is mutated before the last suspension point (a dropped future leaves no trace), concurrent mutable operations fail loudly instead of silently interleaving (mirroring resize), the buffered tail is copied so it cannot pin the caller's batch allocation, and the page cache is populated in write-buffer-sized chunks so readers are never stalled behind a multi-MB copy.

Benchmarks vs main: merkle::flush −5–8% at n=100k (n=10k control flat: the path only engages above the write buffer); qmdb::apply_batch −6 to −12% on the journal-I/O-heavy ancestor cases (uncomm_ancestor, comm_uncomm_chain, multi_uncomm), with the in-memory direct and comm_ancestor cases flat; prove/read/small-append/variable-journal benches all flat.

@roberto-bayardo roberto-bayardo requested a review from Copilot June 10, 2026 21:01
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 10, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
commonware-mcp 5bb9d71 Jun 11 2026, 04:08 PM

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

Benchmark results

Tip

PASSED: No benchmark exceeded the regression threshold.

Benchmark comparison table
Benchmark Baseline (main) Current Delta Threshold Status
qmdb::merkleize/variant=any::unordered::fixed::mmr keys=10000 ch=false sync=false 1.384 ms 1.407 ms +1.66% 10.00% ✅ PASS
qmdb::merkleize/variant=current::ordered::fixed::mmb chunk=256 keys=10000 ch=true sync=false 2.490 ms 2.492 ms +0.06% 10.00% ✅ PASS

Baseline commit(s): bce243dab2a3

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the storage/runtime append pipeline to avoid write-buffer ballooning on large appends by introducing an owned-buffer fast path that writes full pages directly to the underlying blob (while preserving page-cache behavior, CRC layout, and durability semantics), and then plumbing owned IoBuf usage through the fixed/variable journals with new coverage for the direct path.

Changes:

  • Add Append::append_owned(IoBuf) and route large borrowed append(&[u8]) calls through it to avoid buffering/copy amplification.
  • Update segmented + contiguous fixed/variable journals to pass owned IoBuf (including interior slices) down to blob appends.
  • Add deterministic tests covering large/direct append behavior, offset correctness, replay, and physical byte-for-byte equivalence vs buffered writes.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
storage/src/journal/segmented/variable.rs Switch variable segmented journal raw appends to IoBuf + add large-item direct-path test coverage.
storage/src/journal/segmented/fixed.rs Switch fixed segmented journal raw appends to IoBuf and use append_owned for large buffers.
storage/src/journal/contiguous/variable.rs Convert batch-encoded buffers to IoBuf and pass section slices as owned views to the underlying append path; add large batch test.
storage/src/journal/contiguous/fixed.rs Convert fixed-item batch buffers to IoBuf and pass owned slices into section appends; add large batch test.
runtime/src/utils/buffer/tip.rs Add Buffer::replace helper to publish a new tip buffer without copying.
runtime/src/utils/buffer/paged/append.rs Implement the direct-to-blob large-append path (append_owned), refactor full-page physical encoding, update resize grow path, and add direct-path tests.

Comment thread runtime/src/utils/buffer/paged/append.rs Outdated
@roberto-bayardo roberto-bayardo marked this pull request as ready for review June 10, 2026 21:16
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 10, 2026

Copy link
Copy Markdown

Deploying monorepo with  Cloudflare Pages  Cloudflare Pages

Latest commit: 5bb9d71
Status: ✅  Deploy successful!
Preview URL: https://86a73221.monorepo-eu0.pages.dev
Branch Preview URL: https://roberto-skip-buffer-on-large.monorepo-eu0.pages.dev

View logs

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f637600. Configure here.

Comment thread storage/src/journal/segmented/fixed.rs
@roberto-bayardo roberto-bayardo force-pushed the roberto/skip-buffer-on-large-append branch 2 times, most recently from 187ee36 to 104a285 Compare June 11, 2026 14:40
@roberto-bayardo roberto-bayardo moved this to Ready for Review in Tracker Jun 11, 2026
roberto-bayardo and others added 7 commits June 11, 2026 08:55
Plain append() now copies a too-large input once into an owned buffer and
writes its whole pages directly to the blob, so the write buffer never grows
meaningfully beyond capacity. Directly written pages populate the page cache
exactly as a buffered flush would. The resize grow path reuses the owned
zero buffer instead of borrowing it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review fixes for the direct-path append:

- Mutate no shared state before the final suspension point: the page-cache
  insert now happens after the blob write lock is acquired, so a future
  dropped while awaiting that lock leaves no trace (previously it left
  cache entries for data that never landed, which try_read_sync would
  later serve as wrong bytes once the logical size grew past them).
- Replace the silent race fallback with an assertion: concurrent mutable
  operations are unsupported while a direct-path append is in flight
  (mirroring resize), and silently splicing a concurrent append's bytes
  into the middle of this call's range was worse than failing loudly.
- Copy the sub-page suffix when seeding the tip so a section whose final
  append took the direct path does not pin the caller's entire batch
  allocation until its next append (or forever, for sealed sections).
- Populate the page cache in write-buffer-sized chunks so the sync cache
  lock is not held across the entire bulk copy.

New tests: byte-identical physical output of the direct vs buffered paths,
variable-journal direct-path coverage (large single items with both top-up
branches, and a multi-section batch exceeding the write buffer), and a
regression check that the buffered suffix does not alias the input.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cloned Append handles support concurrent reads, but mutable operations
must be serialized by the caller; the direct path's boundary assert is an
invariant check on that contract, not a per-method behavior to warn about.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@roberto-bayardo roberto-bayardo force-pushed the roberto/skip-buffer-on-large-append branch from 104a285 to 5bb9d71 Compare June 11, 2026 16:02
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 99.41292% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.23%. Comparing base (bce243d) to head (5bb9d71).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
storage/src/journal/segmented/fixed.rs 87.50% 1 Missing and 1 partial ⚠️
runtime/src/utils/buffer/paged/append.rs 99.68% 0 Missing and 1 partial ⚠️
@@            Coverage Diff             @@
##             main    #4015      +/-   ##
==========================================
+ Coverage   95.21%   95.23%   +0.01%     
==========================================
  Files         530      530              
  Lines      219533   219998     +465     
  Branches     5341     5348       +7     
==========================================
+ Hits       209035   209507     +472     
+ Misses       8691     8687       -4     
+ Partials     1807     1804       -3     
Files with missing lines Coverage Δ
runtime/src/utils/buffer/tip.rs 96.83% <100.00%> (+0.07%) ⬆️
storage/src/journal/contiguous/fixed.rs 98.31% <100.00%> (+0.02%) ⬆️
storage/src/journal/contiguous/variable.rs 98.90% <100.00%> (+0.01%) ⬆️
storage/src/journal/segmented/variable.rs 93.91% <100.00%> (+0.19%) ⬆️
storage/src/qmdb/current/mod.rs 98.98% <100.00%> (+<0.01%) ⬆️
runtime/src/utils/buffer/paged/append.rs 98.17% <99.68%> (+0.17%) ⬆️
storage/src/journal/segmented/fixed.rs 97.71% <87.50%> (+0.11%) ⬆️

... and 10 files with indirect coverage changes


Continue to review full report in Codecov by Harness.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bce243d...5bb9d71. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready for Review

Development

Successfully merging this pull request may close these issues.

2 participants