rsema1d: worker pools instead of adhoc routines #7212

@Wondertan

Description

Currently rsema1d spawns goroutines ad hoc inside every fibre operation
(Upload / Download / Verify); Upload alone fans out four times:
paddedRowTree, deriveCoefficients, computeRLCVectorized, and
paddedRLCTree. This fan-out keeps latency low for individual calls.
Under concurrent load, however (thousands of verifications on a fibre
server, or many simultaneous uploads), p95 tail latency inflates through
goroutine churn, cache misses, and scheduler pressure.

The simplest first step is to preallocate the goroutines per operation
rather than spawning them on each call. That won't eliminate tail latency
at scale, but it's the right starting point.
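As a rough sketch of this first step (the `pool` type and helper names here are hypothetical, not rsema1d API), each operation could preallocate its workers once at construction and feed them tasks over a channel, so a call like Upload's four-way fan-out reuses goroutines instead of spawning them:

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical fixed-size worker pool: its goroutines are
// started once, at construction, and reused across calls instead of
// being spawned ad hoc inside each operation.
type pool struct {
	tasks chan func()
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		go func() {
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// fanOut runs the given subtasks on the preallocated workers and blocks
// until all complete, mirroring the fan-out inside Upload.
func (p *pool) fanOut(subtasks ...func()) {
	var wg sync.WaitGroup
	wg.Add(len(subtasks))
	for _, t := range subtasks {
		t := t
		p.tasks <- func() {
			defer wg.Done()
			t()
		}
	}
	wg.Wait()
}

func main() {
	uploadPool := newPool(4) // one preallocated pool per operation
	results := make([]string, 2)
	uploadPool.fanOut(
		func() { results[0] = "paddedRowTree" },
		func() { results[1] = "deriveCoefficients" },
	)
	fmt.Println(results) // → [paddedRowTree deriveCoefficients]
}
```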

The next step is to consolidate the per-operation pools into a single
shared pool of ~GOMAXPROCS hashing goroutines covering every operation.
Even that won't fully resolve tail latency.
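The consolidation mostly changes the wiring: one process-wide pool sized to GOMAXPROCS that every operation funnels its hashing work through, so concurrent calls contend for CPUs rather than oversubscribing them. A minimal sketch, with all names (`sharedHashers`, `runOnShared`) assumed for illustration:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sharedHashers is a single process-wide task queue served by
// GOMAXPROCS worker goroutines, shared by Upload, Download and Verify.
var sharedHashers = startHashers(runtime.GOMAXPROCS(0))

// startHashers launches n long-lived workers draining one queue.
func startHashers(n int) chan<- func() {
	tasks := make(chan func())
	for i := 0; i < n; i++ {
		go func() {
			for task := range tasks {
				task()
			}
		}()
	}
	return tasks
}

// runOnShared submits fn to the shared pool and waits for it; every
// operation's hashing work goes through this single entry point.
func runOnShared(fn func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	sharedHashers <- func() {
		defer wg.Done()
		fn()
	}
	wg.Wait()
}

func main() {
	sum := 0
	runOnShared(func() { sum = 1 + 2 })
	fmt.Println(sum) // → 3
}
```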

The final step is hash-request prioritization: serialize work and prefer
deriveCoefficients over computeRLCVectorized (and so on) when both
have queued requests.
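One way to express that preference in Go (queue names and the two-queue split are assumptions, not rsema1d API) is the classic two-phase `select`: each worker first does a non-blocking check of the high-priority queue, and only blocks on both queues when it is empty:

```go
package main

import "fmt"

// Two hypothetical request queues: deriveQ for deriveCoefficients work,
// rlcQ for computeRLCVectorized work. A worker always drains deriveQ
// before taking from rlcQ, so queued derivations are preferred.
func worker(deriveQ, rlcQ <-chan func(), stop <-chan struct{}) {
	for {
		// Non-blocking check: run high-priority work if any is queued.
		select {
		case task := <-deriveQ:
			task()
			continue
		default:
		}
		// Otherwise block on either queue (or stop); after each task the
		// outer loop's non-blocking check re-prefers deriveQ.
		select {
		case task := <-deriveQ:
			task()
		case task := <-rlcQ:
			task()
		case <-stop:
			return
		}
	}
}

func main() {
	deriveQ := make(chan func(), 2)
	rlcQ := make(chan func(), 1)
	stop := make(chan struct{})
	events := make(chan string, 3)

	// Enqueue low-priority work first; the worker still runs the
	// queued derivations ahead of it.
	rlcQ <- func() { events <- "rlc" }
	deriveQ <- func() { events <- "derive" }
	deriveQ <- func() { events <- "derive" }

	go worker(deriveQ, rlcQ, stop)
	got := []string{<-events, <-events, <-events}
	close(stop)
	fmt.Println(got) // → [derive derive rlc]
}
```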
