rsema1d: worker pools instead of adhoc routines #7212

@Wondertan

Description

Currently rsema1d spawns goroutines ad hoc inside every fibre operation
(Upload / Download / Verify); Upload alone fans out four times:
paddedRowTree, deriveCoefficients, computeRLCVectorized, and
paddedRLCTree. This fan-out keeps latency low for individual calls.
Under concurrent load, however (thousands of verifications on a fibre
server, or many simultaneous uploads), p95 tail latency inflates through
goroutine churn, cache misses, and scheduler pressure.

The simplest first step is to preallocate the goroutines per operation
rather than spawning them on each call. That won't eliminate tail latency
at scale, but it's the right starting point.
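As a rough sketch of this first step (the `pool` type and helper names here are hypothetical, not rsema1d API), each operation could preallocate its workers once at construction and feed them tasks over a channel, so a call like Upload's four-way fan-out reuses goroutines instead of spawning them:

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical fixed-size worker pool: its goroutines are
// started once, at construction, and reused across calls instead of
// being spawned ad hoc inside each operation.
type pool struct {
	tasks chan func()
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// fanOut runs the given subtasks on the preallocated workers and blocks
// until all complete, mirroring the fan-out inside Upload.
func (p *pool) fanOut(subtasks ...func()) {
	var wg sync.WaitGroup
	wg.Add(len(subtasks))
	for _, t := range subtasks {
		t := t
		p.tasks <- func() {
			defer wg.Done()
			t()
		}
	}
	wg.Wait()
}

func main() {
	uploadPool := newPool(4) // one preallocated pool per operation
	results := make([]string, 2)
	uploadPool.fanOut(
		func() { results[0] = "paddedRowTree" },
		func() { results[1] = "deriveCoefficients" },
	)
	fmt.Println(results) // → [paddedRowTree deriveCoefficients]
}
```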

The next step is to consolidate the per-operation pools into a single
shared pool of ~GOMAXPROCS hashing goroutines covering every operation.
Even that won't fully resolve tail latency.
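The consolidation mostly changes the wiring: one process-wide pool sized to GOMAXPROCS that every operation funnels its hashing work through, so concurrent calls contend for CPUs rather than oversubscribing them. A minimal sketch, with all names (`sharedHashers`, `runOnShared`) assumed for illustration:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sharedHashers is a single process-wide task queue served by
// GOMAXPROCS worker goroutines, shared by Upload, Download and Verify.
var sharedHashers = startHashers(runtime.GOMAXPROCS(0))

// startHashers launches n long-lived workers draining one queue.
func startHashers(n int) chan<- func() {
	tasks := make(chan func())
	for i := 0; i < n; i++ {
		go func() {
			for task := range tasks {
				task()
			}
		}()
	}
	return tasks
}

// runOnShared submits fn to the shared pool and waits for it; every
// operation's hashing work goes through this single entry point.
func runOnShared(fn func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	sharedHashers <- func() {
		defer wg.Done()
		fn()
	}
	wg.Wait()
}

func main() {
	sum := 0
	runOnShared(func() { sum = 1 + 2 })
	fmt.Println(sum) // → 3
}
```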

The final step is hash-request prioritization: serialize work and prefer
deriveCoefficients over computeRLCVectorized (and so on) when both
have queued requests.
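One way to express that preference in Go (queue names and the two-queue split are assumptions, not rsema1d API) is the classic two-phase `select`: each worker first does a non-blocking check of the high-priority queue, and only blocks on both queues when it is empty:

```go
package main

import "fmt"

// Two hypothetical request queues: deriveQ for deriveCoefficients work,
// rlcQ for computeRLCVectorized work. A worker always drains deriveQ
// before taking from rlcQ, so queued derivations are preferred.
func worker(deriveQ, rlcQ <-chan func(), stop <-chan struct{}) {
	for {
		// Non-blocking check: run high-priority work if any is queued.
		select {
		case task := <-deriveQ:
			task()
			continue
		default:
		}
		// Otherwise block on either queue (or stop); after each task the
		// outer loop's non-blocking check re-prefers deriveQ.
		select {
		case task := <-deriveQ:
			task()
		case task := <-rlcQ:
			task()
		case <-stop:
			return
		}
	}
}

func main() {
	deriveQ := make(chan func(), 2)
	rlcQ := make(chan func(), 1)
	stop := make(chan struct{})
	events := make(chan string, 3)

	// Enqueue low-priority work first; the worker still runs the
	// queued derivations ahead of it.
	rlcQ <- func() { events <- "rlc" }
	deriveQ <- func() { events <- "derive" }
	deriveQ <- func() { events <- "derive" }

	go worker(deriveQ, rlcQ, stop)
	got := []string{<-events, <-events, <-events}
	close(stop)
	fmt.Println(got) // → [derive derive rlc]
}
```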
