
[Bug] Sampling loop crashes immediately when LLM returns text instead of calling final_response #3847

@strawgate

Description


run.py:685 raises RuntimeError immediately when structured output is expected (result_type is set) but the LLM returns a text response instead of calling the final_response tool. There is no retry.

This contrasts with validation errors on final_response (lines 657-679), which append an error message to history and retry — giving the LLM a chance to correct itself.
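For comparison, the validation-error path works roughly like this simplified sketch (`Answer`, `parse_with_retry`, and the stub model are illustrative stand-ins, not the actual run.py code): a failed validation is appended to the history and the model gets another attempt.

```python
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    value: int

def parse_with_retry(get_args, max_retries=3):
    """Validate tool arguments; on failure, feed the error back and retry."""
    history = []
    for _ in range(max_retries + 1):
        args = get_args(history)
        try:
            return Answer(**args)
        except ValidationError as exc:
            # Append the validation error so the model can correct itself
            history.append({"role": "user", "content": str(exc)})
    raise RuntimeError("final_response arguments never validated")

# Stub "model": bad arguments first, valid arguments on the retry
outputs = iter([{"value": "not-an-int"}, {"value": 7}])
result = parse_with_retry(lambda history: next(outputs))
```

This is the recovery behavior the bug report asks to extend to the text-response case.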

Current behavior

sample_impl (run.py:614)
  → LLM returns text (no tool call)
  → step.is_tool_use = False
  → line 685: raise RuntimeError("Expected structured output of type X, 
      but the LLM returned a text response instead of calling the final_response tool.")

No retry. Immediate crash. The user sees a RuntimeError with no recovery.

When this happens

  • Gemini occasionally ignores tool_choice="required" and returns text anyway
  • More common with preview models (gemini-3-flash-preview, gemini-3.1-pro-preview)
  • Can happen on the 2nd+ iteration of the tool loop when the LLM has gathered information from tools and tries to "answer" with text instead of calling final_response
  • Note: issue #3011 (Sampling with Tool Calls resets tool_choice) fixed tool_choice resetting to None after the first iteration, which reduced but did not eliminate this problem — some models still return text even with tool_choice="required"

Expected behavior

The sampling loop should retry with an explicit nudge message (similar to how validation errors are handled), capped at a small number of retries (e.g. 2-3) before raising.

Suggested approach:

# At line 682-689, instead of immediate RuntimeError:
if not step.is_tool_use:
    if result_type is not None and result_type is not str:
        text_response_retries += 1
        if text_response_retries > max_text_retries:  # e.g. 3
            raise RuntimeError(...)
        # Append a nudge to history and retry
        step.history.append(SamplingMessage(
            role="user",
            content=TextContent(
                type="text",
                text="You must call the `final_response` tool to provide your answer. "
                     "Do not respond with text — use the tool.",
            ),
        ))
        current_messages = step.history
        continue
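The proposed loop can be sketched as self-contained code (`Step`, `fake_llm`, `sample_with_nudges`, and `max_text_retries` are illustrative stand-ins, not the actual run.py internals):

```python
from dataclasses import dataclass

@dataclass
class Step:
    is_tool_use: bool  # True when the model called a tool (e.g. final_response)

NUDGE = (
    "You must call the `final_response` tool to provide your answer. "
    "Do not respond with text; use the tool."
)

def sample_with_nudges(llm, max_text_retries=3):
    """Retry with a nudge when the model returns text instead of a tool call."""
    history = []
    text_response_retries = 0
    while True:
        step = llm(history)
        if step.is_tool_use:
            return step  # model called final_response; done
        text_response_retries += 1
        if text_response_retries > max_text_retries:
            raise RuntimeError(
                f"LLM still returned text after {max_text_retries} retries"
            )
        # Nudge the model and loop instead of crashing on the first miss
        history.append({"role": "user", "content": NUDGE})

# Stub model: returns text twice, then calls the tool on the third attempt
calls = []
def fake_llm(history):
    calls.append(len(history))
    return Step(is_tool_use=len(calls) >= 3)

step = sample_with_nudges(fake_llm)
```

Capping the retries keeps the current RuntimeError as a backstop, so a model that never calls the tool still fails fast instead of looping forever.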

Reproduction

import asyncio
from fastmcp import Client, FastMCP, Context
from mcp.types import CreateMessageResult, TextContent
from pydantic import BaseModel

class MyResult(BaseModel):
    answer: str

# Handler that returns text instead of a tool call
def bad_handler(messages, params, context):
    return CreateMessageResult(
        role="assistant",
        model="test",
        content=TextContent(type="text", text="Here is my answer"),
    )

mcp = FastMCP(sampling_handler=bad_handler)

@mcp.tool
async def test_tool(ctx: Context) -> str:
    result = await ctx.sample(messages="Say hello", result_type=MyResult)
    return result.result.answer

async def main():
    async with Client(mcp) as client:
        await client.call_tool("test_tool", {})  # RuntimeError — no retry

asyncio.run(main())

Version

fastmcp 3.2.3

Metadata

Labels

bug — Something isn't working. Reports of errors, unexpected behavior, or broken functionality.
server — Related to FastMCP server implementation or server-side functionality.
