[storage] Write large appends directly to blob#4015
Conversation
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
commonware-mcp | 5bb9d71 | Jun 11 2026, 04:08 PM |
Benchmark resultsTip ✅ PASSED: No benchmark exceeded the regression threshold. Benchmark comparison table
Baseline commit(s): |
There was a problem hiding this comment.
Pull request overview
This PR optimizes the storage/runtime append pipeline to avoid write-buffer ballooning on large appends by introducing an owned-buffer fast path that writes full pages directly to the underlying blob (while preserving page-cache behavior, CRC layout, and durability semantics), and then plumbing owned IoBuf usage through the fixed/variable journals with new coverage for the direct path.
Changes:
- Add
Append::append_owned(IoBuf)and route large borrowedappend(&[u8])calls through it to avoid buffering/copy amplification. - Update segmented + contiguous fixed/variable journals to pass owned
IoBuf(including interior slices) down to blob appends. - Add deterministic tests covering large/direct append behavior, offset correctness, replay, and physical byte-for-byte equivalence vs buffered writes.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| storage/src/journal/segmented/variable.rs | Switch variable segmented journal raw appends to IoBuf + add large-item direct-path test coverage. |
| storage/src/journal/segmented/fixed.rs | Switch fixed segmented journal raw appends to IoBuf and use append_owned for large buffers. |
| storage/src/journal/contiguous/variable.rs | Convert batch-encoded buffers to IoBuf and pass section slices as owned views to the underlying append path; add large batch test. |
| storage/src/journal/contiguous/fixed.rs | Convert fixed-item batch buffers to IoBuf and pass owned slices into section appends; add large batch test. |
| runtime/src/utils/buffer/tip.rs | Add Buffer::replace helper to publish a new tip buffer without copying. |
| runtime/src/utils/buffer/paged/append.rs | Implement the direct-to-blob large-append path (append_owned), refactor full-page physical encoding, update resize grow path, and add direct-path tests. |
Deploying monorepo with
|
| Latest commit: |
5bb9d71
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://86a73221.monorepo-eu0.pages.dev |
| Branch Preview URL: | https://roberto-skip-buffer-on-large.monorepo-eu0.pages.dev |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit f637600. Configure here.
187ee36 to
104a285
Compare
Plain append() now copies a too-large input once into an owned buffer and writes its whole pages directly to the blob, so the write buffer never grows meaningfully beyond capacity. Directly written pages populate the page cache exactly as a buffered flush would. The resize grow path reuses the owned zero buffer instead of borrowing it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review fixes for the direct-path append: - Mutate no shared state before the final suspension point: the page-cache insert now happens after the blob write lock is acquired, so a future dropped while awaiting that lock leaves no trace (previously it left cache entries for data that never landed, which try_read_sync would later serve as wrong bytes once the logical size grew past them). - Replace the silent race fallback with an assertion: concurrent mutable operations are unsupported while a direct-path append is in flight (mirroring resize), and silently splicing a concurrent append's bytes into the middle of this call's range was worse than failing loudly. - Copy the sub-page suffix when seeding the tip so a section whose final append took the direct path does not pin the caller's entire batch allocation until its next append (or forever, for sealed sections). - Populate the page cache in write-buffer-sized chunks so the sync cache lock is not held across the entire bulk copy. New tests: byte-identical physical output of the direct vs buffered paths, variable-journal direct-path coverage (large single items with both top-up branches, and a multi-section batch exceeding the write buffer), and a regression check that the buffered suffix does not alias the input. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cloned Append handles support concurrent reads, but mutable operations must be serialized by the caller; the direct path's boundary assert is an invariant check on that contract, not a per-method behavior to warn about. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
104a285 to
5bb9d71
Compare
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #4015 +/- ##
==========================================
+ Coverage 95.21% 95.23% +0.01%
==========================================
Files 530 530
Lines 219533 219998 +465
Branches 5341 5348 +7
==========================================
+ Hits 209035 209507 +472
+ Misses 8691 8687 -4
+ Partials 1807 1804 -3
... and 10 files with indirect coverage changes Continue to review full report in Codecov by Harness.
🚀 New features to boost your workflow:
|

Write large appends directly to the blob
An append larger than the write buffer previously ballooned the buffer to the append's full size: the bytes were copied into the buffer, flushed, and copied again into the page cache. Now
Append::append_ownedwrites whole pages of an owned buffer directly to the blob as zero-copy slices (the current partial page is topped up and flushed through the regular protected-CRC path first, and the sub-page tail stays buffered), so the write buffer never grows meaningfully beyond capacity for any caller. The fixed and variable journals pass ownedIoBufs down, making batched appends (e.g. qmdb commits) zero-copy end to end. Page-cache population, on-disk format, and durability semantics are unchanged and pinned by tests (including byte-identical physical output vs the buffered path).The final commit hardens the path: no shared state is mutated before the last suspension point (a dropped future leaves no trace), concurrent mutable operations fail loudly instead of silently interleaving (mirroring
resize), the buffered tail is copied so it cannot pin the caller's batch allocation, and the page cache is populated in write-buffer-sized chunks so readers are never stalled behind a multi-MB copy.Benchmarks vs main: merkle::flush −5–8% at n=100k (n=10k control flat: the path only engages above the write buffer); qmdb::apply_batch −6 to −12% on the journal-I/O-heavy ancestor cases (uncomm_ancestor, comm_uncomm_chain, multi_uncomm), with the in-memory direct and comm_ancestor cases flat; prove/read/small-append/variable-journal benches all flat.