run.py:685 raises RuntimeError immediately when structured output is expected (result_type is set) but the LLM returns a text response instead of calling the final_response tool. There is no retry.
This contrasts with validation errors on final_response (lines 657-679), which append an error message to history and retry — giving the LLM a chance to correct itself.
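For reference, the validation-error path appears to work roughly like the sketch below (inferred from the behavior described above, not copied from run.py; `step`, `tool_call.arguments`, and the exact message text are assumptions):

```python
# Rough sketch of the existing validation-retry behavior around run.py:657-679
# (inferred, not verbatim): when final_response arguments fail validation,
# the error is appended to history and the loop retries.
try:
    parsed = result_type.model_validate(tool_call.arguments)
except ValidationError as exc:
    step.history.append(
        SamplingMessage(
            role="user",
            content=TextContent(type="text", text=f"Validation failed: {exc}. Call final_response again."),
        )
    )
    continue  # retry: the model gets a chance to correct itself
```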
Current behavior
sample_impl (run.py:614)
→ LLM returns text (no tool call)
→ step.is_tool_use = False
→ line 685: raise RuntimeError("Expected structured output of type X, but the LLM returned a text response instead of calling the final_response tool.")
No retry. Immediate crash. The user sees a RuntimeError with no recovery.
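Based on the trace and error message, the check around run.py:682-689 presumably looks something like the following (a sketch, not the actual source; the f-string formatting is an assumption):

```python
# Sketch of the current behavior (inferred): a text response while result_type
# is set is treated as fatal, with no attempt to steer the model back to the tool.
if not step.is_tool_use:
    if result_type is not None and result_type is not str:
        raise RuntimeError(
            f"Expected structured output of type {result_type.__name__}, "
            "but the LLM returned a text response instead of calling the final_response tool."
        )
```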
When this happens
- Gemini occasionally ignores tool_choice="required" and returns text anyway
- It is more common with preview models (gemini-3-flash-preview, gemini-3.1-pro-preview)
- It can happen on the 2nd+ iteration of the tool loop, when the LLM has already gathered information from tools and tries to "answer" with text instead of calling final_response
- Note: #3011 (Sampling with Tool Calls resets tool_choice) fixed tool_choice resetting to None after the first iteration, which reduced but did not eliminate this issue — some models still return text even with tool_choice="required"
Expected behavior
The sampling loop should retry with an explicit nudge message (similar to how validation errors are handled), capped at a small number of retries (e.g. 2-3) before raising.
Suggested approach:
```python
# At line 682-689, instead of immediate RuntimeError:
if not step.is_tool_use:
    if result_type is not None and result_type is not str:
        text_response_retries += 1
        if text_response_retries > max_text_retries:  # e.g. 3
            raise RuntimeError(...)
        # Append a nudge to history and retry
        step.history.append(SamplingMessage(
            role="user",
            content=TextContent(
                type="text",
                text="You must call the `final_response` tool to provide your answer. "
                     "Do not respond with text — use the tool.",
            ),
        ))
        current_messages = step.history
        continue
```
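The two counters above would need to be initialized before the tool loop starts; the names and the cap value below are illustrative, not from the source. A small fixed cap keeps the failure mode bounded while still mirroring the existing validation-error handling.

```python
# Before entering the tool-call loop (names and cap are illustrative):
max_text_retries = 3       # could also be exposed as a parameter on ctx.sample()
text_response_retries = 0
```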
Reproduction
```python
import asyncio

from fastmcp import Client, FastMCP, Context
from mcp.types import CreateMessageResult, TextContent
from pydantic import BaseModel


class MyResult(BaseModel):
    answer: str


# Handler that returns text instead of a tool call
def bad_handler(messages, params, context):
    return CreateMessageResult(
        role="assistant",
        model="test",
        content=TextContent(type="text", text="Here is my answer"),
    )


mcp = FastMCP(sampling_handler=bad_handler)


@mcp.tool
async def test_tool(ctx: Context) -> str:
    result = await ctx.sample(messages="Say hello", result_type=MyResult)
    return result.result.answer


async def main():
    async with Client(mcp) as client:
        await client.call_tool("test_tool", {})  # RuntimeError — no retry


asyncio.run(main())
```
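One way to verify a fix is to count handler invocations (hypothetical instrumentation on the repro above): today bad_handler is called exactly once before the RuntimeError; with the nudge-and-retry behavior it should be called max_text_retries + 1 times before the error is finally raised, since this handler never complies.

```python
# Hypothetical instrumentation: count how many times the sampling handler runs
# before the loop gives up.
call_count = 0

def bad_handler(messages, params, context):
    global call_count
    call_count += 1  # currently 1; with retries, max_text_retries + 1
    return CreateMessageResult(
        role="assistant",
        model="test",
        content=TextContent(type="text", text="Here is my answer"),
    )
```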
Version
fastmcp 3.2.3