
feat(fibre): bucketed row pool #7159

Open
Wondertan wants to merge 1 commit into main from feat/slab-pool-allocator-alt


Wondertan commented Apr 22, 2026

Third and final iteration of the encoding memory layout for Fibre.

The 2nd iteration (#7091) rested on the intuition that freeing the rows for each validator as soon as Upload finished with them would keep peak memory lower than holding the full blob's rows through the tail-latency drain. Since Upload returns after 2/3 of validators have acked, carrying all 128 MiB through the remaining tail felt obviously wasteful.

In practice, this was a premature optimization that cost more in complexity than it repaid in memory. Per-validator releases are adversarial to the allocator: the rows that go free at any moment are a random subset of the blob, so the freed memory lands as fragmented holes rather than reusable slots. During implementation, fragmentation emerged as the core problem, and several rounds of optimization were layered on top to compensate. They were only putting makeup on a pig.

In a sync review, @walldiss flagged the complexity of the slab allocator as an issue that reduces trust in it. We agreed to eliminate per-validator releases to reduce complexity. It was a great call, and it ended up confirming that the optimization itself was flawed.

The 3rd iteration drops per-validator release entirely in favor of whole-batch pooling. It is significantly simpler, and more importantly, it behaves better under load: steady-state memory tracks the count of concurrent in-flight encodes rather than worker count × blob size. For example, 10 workers × 128 MiB blobs no longer pin ~10 GiB of work buffers; memory settles at whatever number of in-flight encodes the network bottleneck allows.
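To make the steady-state claim concrete, here is a minimal sketch of whole-batch pooling. It is not the code in this PR: `batchPool`, `simulate`, and the single fixed batch shape are hypothetical names and simplifications, and the "in-flight" limit is modeled sequentially rather than with goroutines. The point is only that fresh allocations stop once the pool holds as many batches as are ever in flight at once.

```go
package main

import "fmt"

// batchPool is a hypothetical whole-batch pool: a batch is handed out and
// returned as one unit, so a returned batch always fits the next request.
type batchPool struct {
	free      [][]byte // idle batches ready for reuse
	batchSize int      // bytes per batch (all batches share one shape here)
	allocs    int      // number of fresh allocations performed
}

// get reuses an idle batch if one exists, otherwise allocates a new one.
func (p *batchPool) get() []byte {
	if n := len(p.free); n > 0 {
		b := p.free[n-1]
		p.free = p.free[:n-1]
		return b
	}
	p.allocs++
	return make([]byte, p.batchSize)
}

// put returns a whole batch to the pool.
func (p *batchPool) put(b []byte) { p.free = append(p.free, b) }

// simulate runs `encodes` encodes with at most `inFlight` batches held at
// the same time and reports how many fresh allocations the pool made.
func simulate(encodes, inFlight, batchSize int) int {
	p := &batchPool{batchSize: batchSize}
	for done := 0; done < encodes; {
		held := make([][]byte, 0, inFlight)
		for i := 0; i < inFlight && done < encodes; i++ {
			held = append(held, p.get())
			done++
		}
		for _, b := range held {
			p.put(b)
		}
	}
	return p.allocs
}

func main() {
	// 100 encodes, never more than 3 in flight: the pool allocates 3 batches.
	fmt.Println(simulate(100, 3, 1<<10))
}
```

In this toy model the allocation count depends on the in-flight limit, not on the total number of encodes or on how many workers might exist.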

This steady state never emerged in the 2nd iteration because fragmented reuse forced a fresh allocation for nearly every encode: the allocator couldn't recycle a random scatter of freed rows into a contiguous batch-shaped request. Memory grew until every worker effectively held its own reservation.
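The fragmentation argument can be illustrated with a toy model. This sketch is an assumption-laden illustration, not the 2nd iteration's allocator: a slab is modeled as a slice of free flags, an every-other-row pattern stands in for the "random scatter" of per-validator releases, and `longestFreeRun` and `countFree` are hypothetical helpers.

```go
package main

import "fmt"

// longestFreeRun returns the length of the longest run of consecutive free
// rows, i.e. the largest contiguous request the slab could satisfy.
func longestFreeRun(free []bool) int {
	best, cur := 0, 0
	for _, f := range free {
		if f {
			cur++
			if cur > best {
				best = cur
			}
		} else {
			cur = 0
		}
	}
	return best
}

// countFree returns the total number of free rows.
func countFree(free []bool) int {
	n := 0
	for _, f := range free {
		if f {
			n++
		}
	}
	return n
}

func main() {
	// Scattered release: every other row of a 16-row slab is freed.
	scattered := make([]bool, 16)
	for i := 0; i < 16; i += 2 {
		scattered[i] = true
	}
	// Whole-batch release: rows 0..7 are freed together.
	batch := make([]bool, 16)
	for i := 0; i < 8; i++ {
		batch[i] = true
	}
	fmt.Println(countFree(scattered), longestFreeRun(scattered))
	fmt.Println(countFree(batch), longestFreeRun(batch))
}
```

Both slabs have the same amount of free memory, but only the whole-batch release leaves a run long enough to serve another 8-row request without a fresh allocation.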



Wondertan self-assigned this Apr 22, 2026
Wondertan requested a review from a team as a code owner April 22, 2026 00:28
Wondertan requested review from ninabarbakadze and removed request for a team April 22, 2026 00:28
Introduces fibre/internal/row, a bucketed allocator of fixed-shape row
batches used by the blob encode path and the rsema1d codec's work
buffers. Replaces the per-encode sync.Pool with explicit retention
(aged eviction, idle-grace drop) and mmap-backed regions above 1 MiB,
keeping steady-state RSS proportional to concurrent in-flight encodes
rather than worst-case per-worker reservation.

Allocations run without holding the pool lock so a fresh mmap doesn't
stall concurrent Gets/Puts behind a multi-ms syscall.
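A minimal sketch of the two ideas above (size-class buckets, and allocating
outside the lock) might look like the following. It is an illustration under
stated assumptions, not the fibre/internal/row implementation: the names
`pool`, `newPool`, and `bucketFor` are hypothetical, size classes are assumed
to be powers of two, `make` stands in for the mmap-backed regions, and aged
eviction and idle-grace drop are omitted.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// bucketFor rounds size up to the next power of two. The power-of-two size
// classes are an assumption for this sketch.
func bucketFor(size int) int {
	if size <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(size-1))
}

// pool keeps idle buffers grouped by size class.
type pool struct {
	mu      sync.Mutex
	buckets map[int][][]byte
}

func newPool() *pool { return &pool{buckets: make(map[int][][]byte)} }

// Get returns a buffer of length size. The lock is held only while popping
// from a bucket; a fresh allocation happens after the lock is released, so a
// slow allocation does not block concurrent Get and Put calls.
func (p *pool) Get(size int) []byte {
	class := bucketFor(size)
	p.mu.Lock()
	if free := p.buckets[class]; len(free) > 0 {
		buf := free[len(free)-1]
		p.buckets[class] = free[:len(free)-1]
		p.mu.Unlock()
		return buf[:size] // reused buffers may hold stale contents
	}
	p.mu.Unlock()
	// Slow path, outside the lock. A real pool might mmap here instead.
	return make([]byte, size, class)
}

// Put returns a buffer obtained from Get to the bucket matching its capacity.
func (p *pool) Put(buf []byte) {
	class := cap(buf)
	buf = buf[:class]
	p.mu.Lock()
	p.buckets[class] = append(p.buckets[class], buf)
	p.mu.Unlock()
}

func main() {
	p := newPool()
	a := p.Get(1000)
	fmt.Println(len(a), cap(a))
	p.Put(a)
	b := p.Get(600) // same size class, so the buffer is reused
	fmt.Println(len(b), cap(b), &a[0] == &b[0])
}
```

One consequence of allocating outside the lock is that two concurrent misses
on the same bucket both allocate. The sketch accepts that in exchange for
never holding the mutex across a potentially slow call.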

row.Assembler layers a K+N row view on top of the pool: original rows
alias input data zero-copy where possible, parity+head+tail come from
a single pooled batch released as one unit.
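The row-view idea can be sketched as follows. This is a simplified guess at
the shape, not row.Assembler itself: the `assemble` function, its signature,
and the batch layout (one padded tail row followed by N parity rows) are
assumptions for illustration, and head handling is left out. Full original
rows are sub-slices of the input, so no bytes are copied for them.

```go
package main

import (
	"errors"
	"fmt"
)

// assemble builds a view of k original rows followed by n parity rows.
// Full rows alias data directly. A trailing partial row is copied into a
// zero-padded tail row at the start of batch, and the parity rows are the
// next n rows of the same batch. batch needs at least (n+1)*rowSize bytes.
func assemble(data []byte, k, n, rowSize int, batch []byte) ([][]byte, error) {
	if len(data) > k*rowSize || len(data) <= (k-1)*rowSize {
		return nil, errors.New("data must fill k rows, with only the last one partial")
	}
	if len(batch) < (n+1)*rowSize {
		return nil, errors.New("batch too small")
	}
	rows := make([][]byte, 0, k+n)
	for i := 0; i < k; i++ {
		start, end := i*rowSize, (i+1)*rowSize
		if end <= len(data) {
			// Zero-copy: the row is a capped sub-slice of the input.
			rows = append(rows, data[start:end:end])
			continue
		}
		// Partial last row: copy into the pooled tail row and zero the rest,
		// since a pooled batch may hold stale bytes.
		tail := batch[0:rowSize:rowSize]
		copied := copy(tail, data[start:])
		for j := copied; j < rowSize; j++ {
			tail[j] = 0
		}
		rows = append(rows, tail)
	}
	for j := 0; j < n; j++ {
		start, end := (1+j)*rowSize, (2+j)*rowSize
		rows = append(rows, batch[start:end:end])
	}
	return rows, nil
}

func main() {
	data := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	batch := make([]byte, 12)
	rows, err := assemble(data, 3, 2, 4, batch)
	fmt.Println(len(rows), err, rows[2])
}
```

Because every non-aliased row lives in one batch, releasing that batch to the
pool releases the tail and parity rows together as one unit.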

ProtocolParams.CodecWorkRows() exposes leopard-GF16's work-row count
so callers size the pool without pool code needing to know codec
internals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wondertan force-pushed the feat/slab-pool-allocator-alt branch from 6fbf0b1 to e5cb881 April 22, 2026 01:01

devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 9 additional findings in Devin Review.


Comment thread fibre/client_upload.go