Enhancement
When a FastMCP server tool issues multiple `ctx.sample(...)` calls concurrently (e.g. via `asyncio.gather`), the client only ever processes one at a time: the underlying MCP `BaseSession._receive_loop` awaits each incoming request handler inline before reading the next message off the stream, so sampling responses come back strictly serially even though the server-side awaits are concurrent.
Concrete use case: I have a tool that fans out dozens of independent `ctx.sample` calls per invocation against an LLM deployment with provisioned capacity sitting idle. The work is embarrassingly parallel and I have token budget to burn, but wall-clock time scales linearly with the fan-out because every sample blocks the next one. A minimal reproducer (5 concurrent `ctx.sample` calls with a 0.5 s server-side sleep in the sampling handler) shows peak in-flight = 1 and elapsed ≈ N × per-call latency instead of ≈ per-call latency.
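The serialization mechanism can be demonstrated without FastMCP at all. The sketch below simulates the two behaviors with plain asyncio: a loop that awaits each handler inline (what `BaseSession._receive_loop` does today, as I understand it) versus one that dispatches handlers as tasks. `handler`, `serial_receive_loop`, and `concurrent_receive_loop` are illustrative names, and the 0.05 s sleep stands in for the 0.5 s sampling-handler sleep in my reproducer:

```python
import asyncio

async def handler(state):
    # Simulated sampling handler with fixed latency (0.05 s stands in
    # for the reproducer's 0.5 s server-side sleep).
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.05)
    state["in_flight"] -= 1

async def serial_receive_loop(n):
    # Mimics awaiting each incoming request handler inline: the next
    # message is not read until the current handler returns.
    state = {"in_flight": 0, "peak": 0}
    for _ in range(n):
        await handler(state)
    return state["peak"]

async def concurrent_receive_loop(n):
    # Desired behavior: handlers run as concurrent tasks, so the loop
    # can keep reading messages while earlier handlers are in flight.
    state = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(handler(state) for _ in range(n)))
    return state["peak"]

print(asyncio.run(serial_receive_loop(5)))      # peak in-flight: 1
print(asyncio.run(concurrent_receive_loop(5)))  # peak in-flight: 5
```

With 5 handlers the serial loop reports peak in-flight = 1 (and elapsed ≈ 5 × latency), while the task-based loop reports peak in-flight = 5 (elapsed ≈ 1 × latency), matching the numbers from the reproducer above.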
I'd like FastMCP to support concurrent handling of sampling (and ideally other client-side) requests so that `asyncio.gather` over `ctx.sample` actually fans out to the LLM, with whatever bounding/back-pressure knob the maintainers think is appropriate.
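For the bounding knob, one shape that would work for my use case is a semaphore-bounded dispatch. This is purely a hypothetical sketch, not existing FastMCP API: `dispatch_bounded` and its `limit` parameter are names I made up to illustrate the back-pressure idea.

```python
import asyncio

async def dispatch_bounded(handlers, limit=8):
    # Hypothetical sketch: schedule every incoming request handler as a
    # task, with a semaphore capping how many run at once. "limit" is an
    # illustrative knob name, not an existing FastMCP option.
    sem = asyncio.Semaphore(limit)

    async def run(h):
        async with sem:
            return await h()

    return await asyncio.gather(*(run(h) for h in handlers))

# Demo: 20 fake sampling calls, at most 8 in flight at a time.
peak = {"now": 0, "max": 0}

async def fake_sample():
    peak["now"] += 1
    peak["max"] = max(peak["max"], peak["now"])
    await asyncio.sleep(0.02)
    peak["now"] -= 1
    return "ok"

results = asyncio.run(dispatch_bounded([fake_sample] * 20, limit=8))
print(len(results), peak["max"])  # 20 8
```

Any equivalent mechanism (task group with a cap, configurable executor, etc.) would serve equally well; the point is only that fan-out happens at all, with some bound the client controls.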