
Concurrent server-side sampling requests are serialized end-to-end #4006

@demoray

Description


Enhancement

When a FastMCP server tool issues multiple ctx.sample(...) calls concurrently (e.g. via asyncio.gather), the client only ever processes one at a time: the underlying MCP BaseSession._receive_loop awaits each incoming request handler inline before reading the next message off the stream, so sampling responses come back strictly serially even though the server-side awaits are concurrent.
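To make the serialization concrete, here is a simplified sketch of the two receive-loop strategies. The handler and message names are illustrative stand-ins, not the actual `BaseSession` internals:

```python
import anyio

# Simplified sketch of the two receive-loop strategies. The handler and
# message names here are illustrative stand-ins, not the actual
# mcp.shared.session.BaseSession internals.

async def handle_request(msg: int) -> None:
    # Stand-in for a sampling request handler with real wall-clock latency.
    await anyio.sleep(0.5)
    print(f"handled request {msg}")

async def serial_receive_loop(messages: list[int]) -> None:
    # Current behaviour: each handler is awaited inline, so the next message
    # is not read until the previous response has been produced.
    for msg in messages:
        await handle_request(msg)

async def concurrent_receive_loop(messages: list[int]) -> None:
    # Desired behaviour: handlers run in a task group, so the loop keeps
    # reading messages while earlier requests are still in flight.
    async with anyio.create_task_group() as tg:
        for msg in messages:
            tg.start_soon(handle_request, msg)

anyio.run(serial_receive_loop, [1, 2, 3, 4, 5])      # ~2.5s total
anyio.run(concurrent_receive_loop, [1, 2, 3, 4, 5])  # ~0.5s total
```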

Concrete use case: I have a tool that fans out dozens of independent ctx.sample calls per invocation against an LLM deployment with provisioned capacity sitting idle. The work is embarrassingly parallel and I have token budget to burn, but wall-clock time scales linearly with the fan-out because every sample blocks the next one. A minimal reproducer (5 concurrent ctx.sample calls with a 0.5s server-side sleep in the sampling handler) shows peak in-flight = 1 and elapsed ≈ N × per-call latency instead of ≈ per-call latency.
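A reproducer along those lines looks roughly like this. It is a sketch that assumes the in-memory fastmcp `Client` and its `sampling_handler` keyword; the exact handler signature may differ across fastmcp versions:

```python
import asyncio
import time

from fastmcp import FastMCP, Client, Context

# Sketch of the reproducer described above. Assumes the in-memory fastmcp
# Client and its sampling_handler keyword; exact handler signatures may
# differ across fastmcp versions.

mcp = FastMCP("sampling-fanout-repro")
N = 5

in_flight = 0
peak_in_flight = 0

@mcp.tool()
async def fan_out(ctx: Context) -> str:
    start = time.monotonic()
    # Fan out N independent sampling requests concurrently.
    await asyncio.gather(*(ctx.sample(f"prompt {i}") for i in range(N)))
    return f"elapsed={time.monotonic() - start:.2f}s"

async def sampling_handler(messages, params, context) -> str:
    # Stand-in for a real LLM call: ~0.5s of latency per request, while
    # tracking how many requests are in flight at once.
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    try:
        await asyncio.sleep(0.5)
        return "ok"
    finally:
        in_flight -= 1

async def main() -> None:
    async with Client(mcp, sampling_handler=sampling_handler) as client:
        result = await client.call_tool("fan_out", {})
        print(result)
        print("peak_in_flight =", peak_in_flight)
        # Observed today: peak_in_flight == 1, elapsed ≈ N * 0.5s.
        # Expected with concurrent handling: peak ≈ N, elapsed ≈ 0.5s.

asyncio.run(main())
```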

I'd like FastMCP to support concurrent handling of sampling requests (and ideally other server-to-client requests) so that asyncio.gather over ctx.sample actually fans out to the LLM, with whatever bounding/back-pressure knob the maintainers think is appropriate.
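For instance, a bounded task-group dispatch could look roughly like the sketch below. This is purely illustrative; `max_concurrent_requests` is a hypothetical knob, not an existing FastMCP or MCP SDK setting:

```python
import anyio

# Purely illustrative sketch of bounded concurrent dispatch; the
# max_concurrent_requests knob and function names are hypothetical,
# not part of the existing FastMCP or MCP SDK API.

async def handle(msg: int) -> None:
    await anyio.sleep(0.5)  # stand-in for running one sampling handler
    print(f"finished request {msg}")

async def bounded_receive_loop(messages: list[int], max_concurrent_requests: int = 8) -> None:
    limiter = anyio.CapacityLimiter(max_concurrent_requests)

    async def run_one(msg: int) -> None:
        async with limiter:  # back-pressure: at most N handlers in flight
            await handle(msg)

    async with anyio.create_task_group() as tg:
        for msg in messages:
            tg.start_soon(run_one, msg)

anyio.run(bounded_receive_loop, list(range(20)))
```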

Labels

client: Related to the FastMCP client SDK or client-side functionality.
enhancement: Improvement to existing functionality. For issues and smaller PR improvements.
