
feat(fibre): bound upload memory budget by bytes #7155

Draft
walldiss wants to merge 3 commits into chore/fibre-upload-isolation from chore/fibre-concurrent-blobs

Conversation


@walldiss walldiss commented Apr 21, 2026

Closes: https://linear.app/celestia/issue/PROTOCO-1557/fibre-blob-level-memory-admission-for-concurrent-uploads

Summary

Cap in-flight upload memory by bytes, not by a coarse blob count. ClientConfig.UploadMemoryBudget (default 512 MiB) is drawn down by blob.UploadSize() at each Upload() and released on exit. Concurrent uploads share the budget: small blobs pack tightly, a max-size blob takes its actual share, and an oversized blob fails fast rather than deadlocking.

Why bytes, not blobs

Memory is what we're bounding. Blob sizes vary from ~1 MiB to 128 MiB, so a "number of blobs" cap is a coarse proxy: sized for max-size blobs it under-admits small ones (counting them like big ones wastes concurrency), and sized for small blobs it over-admits large ones (blowing past the intended memory headroom). Byte accounting collapses this into one honest knob: "how much RAM am I willing to spend on in-flight upload buffers?"

Why this is separate from #7154

#7154 removed the old UploadConcurrency semaphore because its unit (one RPC) didn't match any real resource — wrong abstraction for failure isolation. The memory-admission concern it implicitly addressed is a different problem with a different right answer, and deserves its own PR and defaults discussion.

Change

  • New ClientConfig.UploadMemoryBudget (int64 bytes, default 512 MiB) with Validate() clamping.
  • Client.uploadBudget is a golang.org/x/sync/semaphore.Weighted (already an indirect dep; now direct).
  • Upload() reserves blob.UploadSize() at entry via Acquire(ctx, n) and Release(n) on exit. An oversized blob returns a clear error; ctx cancellation during wait returns ctx.Err().
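The admission path above can be sketched as follows. This is a stdlib-only stand-in (the PR itself uses golang.org/x/sync/semaphore.Weighted, whose Acquire is additionally ctx-cancellable); Client, Upload, and the field names here are illustrative, not the PR's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// weighted is a minimal stand-in for semaphore.Weighted: a byte-counted
// semaphore that blocks Acquire until enough budget is free.
type weighted struct {
	mu   sync.Mutex
	cond *sync.Cond
	free int64
}

func newWeighted(n int64) *weighted {
	w := &weighted{free: n}
	w.cond = sync.NewCond(&w.mu)
	return w
}

func (w *weighted) Acquire(n int64) {
	w.mu.Lock()
	for w.free < n {
		w.cond.Wait() // wait until concurrent uploads release budget
	}
	w.free -= n
	w.mu.Unlock()
}

func (w *weighted) Release(n int64) {
	w.mu.Lock()
	w.free += n
	w.mu.Unlock()
	w.cond.Broadcast()
}

type Client struct {
	budget       int64 // ClientConfig.UploadMemoryBudget
	uploadBudget *weighted
}

func (c *Client) Upload(uploadSize int64) error {
	// Fail fast: a reservation larger than the whole budget can never
	// be satisfied, so surface a config error instead of deadlocking.
	if uploadSize > c.budget {
		return errors.New("blob upload size exceeds UploadMemoryBudget")
	}
	c.uploadBudget.Acquire(uploadSize)
	defer c.uploadBudget.Release(uploadSize)
	return nil // real code fans the blob out to peers here
}

func main() {
	c := &Client{budget: 512 << 20, uploadBudget: newWeighted(512 << 20)}
	fmt.Println(c.Upload(128<<20) == nil) // true: a max-size blob fits
	fmt.Println(c.Upload(600<<20) != nil) // true: oversized fails fast
}
```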

Sizing guidance

The default (512 MiB) accommodates four concurrent max-size (128 MiB) blobs on a validator-grade node. Tune against:

UploadMemoryBudget = max_memory_for_upload_buffers
                   ≳ 1 × max_blob_size     (so uploads aren't serialised)
                   ≲ safe_fraction_of_ram  (leave room for everything else)

An oversized blob (UploadSize > UploadMemoryBudget) returns an error at Upload() entry — a config that would otherwise deadlock is surfaced immediately.
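The sizing guidance can be expressed as a small clamp. This is a hypothetical helper, not the PR's Validate() implementation; maxBlobSize and the safe-RAM parameter are assumptions:

```go
package main

import "fmt"

const (
	maxBlobSize   = 128 << 20 // assumed maximum blob size (128 MiB)
	defaultBudget = 512 << 20 // PR default for UploadMemoryBudget
)

// clampBudget mirrors the guidance above: at least one max-size blob
// (so uploads aren't serialised behind an unsatisfiable reservation),
// at most the caller's safe fraction of RAM.
func clampBudget(requested, safeRAM int64) int64 {
	if requested <= 0 {
		requested = defaultBudget
	}
	if requested < maxBlobSize {
		requested = maxBlobSize
	}
	if requested > safeRAM {
		requested = safeRAM
	}
	return requested
}

func main() {
	fmt.Println(clampBudget(0, 4<<30))      // 536870912: falls back to default
	fmt.Println(clampBudget(64<<20, 4<<30)) // 134217728: raised to one max blob
	fmt.Println(clampBudget(8<<30, 4<<30))  // 4294967296: capped at safe RAM
}
```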

Depends on

#7154 — builds on the ClientConfig / Client surface introduced there. Base branch is chore/fibre-upload-isolation.

Test plan

  • Builds and full fibre test suite passes
  • Bench: burst of small blobs should pack and finish; burst of large blobs should serialise; oversized blob returns error immediately

🤖 Generated with Claude Code

@walldiss walldiss force-pushed the chore/fibre-concurrent-blobs branch from 584fb23 to bb66365 Compare April 21, 2026 14:39
@walldiss walldiss changed the title feat(fibre): bound concurrent blob uploads by memory budget feat(fibre): bound upload memory budget by bytes Apr 21, 2026
walldiss and others added 2 commits April 21, 2026 16:45
…layered timeouts

Replace the global RPC semaphore with a per-peer circuit breaker and
non-blocking fan-out. A dead validator holds only its own lane for one
DialTimeout; subsequent blob uploads skip it at zero cost via the
breaker. Throughput becomes insensitive to up to 1/3 peers down,
matching Fibre's BFT liveness bound.

- non-blocking fan-out: goroutines spawn up front; the circuit breaker
  check happens inside each goroutine, so a slow peer cannot delay
  other peers' goroutines from starting
- per-peer circuit breaker (CircuitFailureThreshold / CircuitCooldown):
  dead peer's cost is paid once (at first observation) and amortized
  across all subsequent blob uploads; closed-state Allow hits a
  lock-free atomic fast path
- layered timeouts: DialTimeout (3s) + RPCTimeout (15s) replace the
  single undifferentiated timeout so a black-holed peer is shed at
  dial time and healthy-but-slow peers get a generous RPC budget
- best-effort post-quorum delivery: Upload returns at 2/3 but
  background goroutines keep delivering to remaining peers so
  downloaders have more validators to read from
- circuit breaker state transitions emit a log line for operator
  visibility

BREAKING CHANGE: ClientConfig.UploadConcurrency is removed. The upload
path no longer exposes an RPC-count knob; concurrency is bounded by
the validator set size per blob plus the caller's own Upload-rate
discipline. Memory admission and peer-registry pruning are tracked
as follow-ups.

Closes: https://linear.app/celestia/issue/PROTOCO-1556/fibre-isolate-upload-failures-per-peer-to-preserve-23-quorum

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@walldiss walldiss force-pushed the chore/fibre-upload-isolation branch from 54a7c4e to c184fa4 Compare April 21, 2026 14:45
Introduce ClientConfig.UploadMemoryBudget (default 512 MiB) keyed on
bytes rather than blob count. Each Upload() reserves blob.UploadSize()
from a weighted semaphore at entry and releases on exit. Concurrent
uploads share the budget, so small blobs pack efficiently and large
blobs take their actual share instead of being accounted as a fixed
"slot."

Byte-level accounting matches the actual resource being bounded
(upload buffers) instead of using blob count as a coarse proxy. An
oversized blob — one whose UploadSize exceeds the total budget —
fails fast rather than deadlocking on a reservation that can never
be satisfied.

This is a companion to #7154, which removed the old UploadConcurrency
semaphore because it was the wrong abstraction for failure isolation.
This PR adds the right abstraction for memory admission.

- new ClientConfig.UploadMemoryBudget (int64 bytes, default 512 MiB)
- Client.uploadBudget is a golang.org/x/sync/semaphore.Weighted
- Upload() acquires blob.UploadSize() with ctx-cancellable Acquire;
  Releases on exit; rejects oversized blobs with a clear error

Closes: https://linear.app/celestia/issue/PROTOCO-1557/fibre-blob-level-memory-admission-for-concurrent-uploads

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@walldiss walldiss force-pushed the chore/fibre-concurrent-blobs branch from bb66365 to b00a8c1 Compare April 21, 2026 14:47
@walldiss walldiss force-pushed the chore/fibre-upload-isolation branch from c184fa4 to d833b63 Compare April 21, 2026 15:00
