
[Bug] Sampling loop crashes immediately when LLM returns text instead of calling final_response #3847

@strawgate

Description


run.py:685 raises RuntimeError immediately when structured output is expected (result_type is set) but the LLM returns a text response instead of calling the final_response tool. There is no retry.

This contrasts with validation errors on final_response (lines 657-679), which append an error message to history and retry — giving the LLM a chance to correct itself.
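For comparison, the validation-error path works roughly like this simplified sketch (`Answer`, `parse_with_retry`, and the stub model are illustrative stand-ins, not the actual run.py code): a failed validation is appended to the history and the model gets another attempt.

```python
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    value: int

def parse_with_retry(get_args, max_retries=3):
    """Validate tool arguments; on failure, feed the error back and retry."""
    history = []
    for _ in range(max_retries + 1):
        args = get_args(history)
        try:
            return Answer(**args)
        except ValidationError as exc:
            # Append the validation error so the model can correct itself
            history.append({"role": "user", "content": str(exc)})
    raise RuntimeError("final_response arguments never validated")

# Stub "model": bad arguments first, valid arguments on the retry
outputs = iter([{"value": "not-an-int"}, {"value": 7}])
result = parse_with_retry(lambda history: next(outputs))
```

This is the recovery behavior the bug report asks to extend to the text-response case.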

Current behavior

sample_impl (run.py:614)
  → LLM returns text (no tool call)
  → step.is_tool_use = False
  → line 685: raise RuntimeError("Expected structured output of type X, 
      but the LLM returned a text response instead of calling the final_response tool.")

No retry. Immediate crash. The user sees a RuntimeError with no recovery.

When this happens

  • Gemini occasionally ignores tool_choice="required" and returns text anyway
  • More common with preview models (gemini-3-flash-preview, gemini-3.1-pro-preview)
  • Can happen on the 2nd+ iteration of the tool loop when the LLM has gathered information from tools and tries to "answer" with text instead of calling final_response
  • Note: issue #3011 (Sampling with Tool Calls resets tool_choice) fixed tool_choice resetting to None after the first iteration, which reduced but did not eliminate this problem — some models still return text even with tool_choice="required"

Expected behavior

The sampling loop should retry with an explicit nudge message (similar to how validation errors are handled), capped at a small number of retries (e.g. 2-3) before raising.

Suggested approach:

# At line 682-689, instead of immediate RuntimeError:
if not step.is_tool_use:
    if result_type is not None and result_type is not str:
        text_response_retries += 1
        if text_response_retries > max_text_retries:  # e.g. 3
            raise RuntimeError(...)
        # Append a nudge to history and retry
        step.history.append(SamplingMessage(
            role="user",
            content=TextContent(
                type="text",
                text="You must call the `final_response` tool to provide your answer. "
                     "Do not respond with text — use the tool.",
            ),
        ))
        current_messages = step.history
        continue
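The proposed loop can be sketched as self-contained code (`Step`, `fake_llm`, `sample_with_nudges`, and `max_text_retries` are illustrative stand-ins, not the actual run.py internals):

```python
from dataclasses import dataclass

@dataclass
class Step:
    is_tool_use: bool  # True when the model called a tool (e.g. final_response)

NUDGE = (
    "You must call the `final_response` tool to provide your answer. "
    "Do not respond with text; use the tool."
)

def sample_with_nudges(llm, max_text_retries=3):
    """Retry with a nudge when the model returns text instead of a tool call."""
    history = []
    text_response_retries = 0
    while True:
        step = llm(history)
        if step.is_tool_use:
            return step  # model called final_response; done
        text_response_retries += 1
        if text_response_retries > max_text_retries:
            raise RuntimeError(
                f"LLM still returned text after {max_text_retries} retries"
            )
        # Nudge the model and loop instead of crashing on the first miss
        history.append({"role": "user", "content": NUDGE})

# Stub model: returns text twice, then calls the tool on the third attempt
calls = []
def fake_llm(history):
    calls.append(len(history))
    return Step(is_tool_use=len(calls) >= 3)

step = sample_with_nudges(fake_llm)
```

Capping the retries keeps the current RuntimeError as a backstop, so a model that never calls the tool still fails fast instead of looping forever.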

Reproduction

import asyncio
from fastmcp import Client, FastMCP, Context
from mcp.types import CreateMessageResult, TextContent
from pydantic import BaseModel

class MyResult(BaseModel):
    answer: str

# Handler that returns text instead of a tool call
def bad_handler(messages, params, context):
    return CreateMessageResult(
        role="assistant",
        model="test",
        content=TextContent(type="text", text="Here is my answer"),
    )

mcp = FastMCP(sampling_handler=bad_handler)

@mcp.tool
async def test_tool(ctx: Context) -> str:
    result = await ctx.sample(messages="Say hello", result_type=MyResult)
    return result.result.answer

async def main():
    async with Client(mcp) as client:
        await client.call_tool("test_tool", {})  # RuntimeError — no retry

asyncio.run(main())

Version

fastmcp 3.2.3

Metadata

Labels

bug — Something isn't working. Reports of errors, unexpected behavior, or broken functionality.
server — Related to FastMCP server implementation or server-side functionality.
