[Bug] Sampling loop: no cap on consecutive final_response validation retries #3848

@strawgate

Description

When result_type is set and the LLM calls final_response with invalid data, the sampling loop retries by appending a validation error to history (lines 657-679 in run.py). However, the only limit on these retries is the shared max_iterations = 100 constant that also governs the agentic tool loop.

This means a model that consistently produces invalid JSON for a structured output schema will burn through up to 100 LLM calls before failing — each one costing tokens and time.

Current behavior

# run.py:614
for _iteration in range(max_iterations):  # max_iterations = 100
    step = await sample_step_impl(...)
    
    for tool_call in step.tool_calls:
        if tool_call.name == "final_response":
            try:
                validated_result = type_adapter.validate_python(input_data)
                return SamplingResult(...)  # success
            except ValidationError as e:
                # Append error to history and... just loop again
                step.history.append(...)  # "Validation error: ... Please try again"
    
    current_messages = step.history
    # No separate counter for consecutive validation failures
    # Loop continues up to 100 iterations

Expected behavior

There should be a separate, smaller cap for consecutive final_response validation failures (e.g. 3-5). If the LLM fails validation N times in a row, it's unlikely to self-correct and the loop should raise early rather than burning through 95+ more iterations.

This is distinct from the overall max_iterations which appropriately limits the total agentic tool loop (where the LLM may legitimately call many tools before finishing).

Suggested approach:

max_validation_retries = 3  # or make configurable
consecutive_validation_failures = 0

for _iteration in range(max_iterations):
    ...
    if tool_call.name == "final_response":
        try:
            validated_result = type_adapter.validate_python(input_data)
            return SamplingResult(...)
        except ValidationError as e:
            consecutive_validation_failures += 1
            if consecutive_validation_failures > max_validation_retries:
                raise RuntimeError(
                    f"Structured output validation failed {consecutive_validation_failures} "
                    f"times consecutively for type {result_type.__name__}: {e}"
                ) from e
            # append error to history and retry...
    
    # Reset counter when the LLM does something other than fail validation
    # (e.g. calls a different tool successfully)
    if tool_call.name != "final_response":
        consecutive_validation_failures = 0
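
For illustration, here is a self-contained, runnable sketch of the same cap logic, decoupled from fastmcp's internals. Answer, fake_model, and run_loop are invented for this example (fake_model stands in for a model that never produces schema-valid output); only the counter pattern itself is the actual suggestion.

from pydantic import BaseModel, TypeAdapter, ValidationError

class Answer(BaseModel):
    value: int

def fake_model(history: list) -> dict:
    # Stands in for an LLM that keeps returning schema-invalid data
    return {"value": "not an int"}

def run_loop(max_iterations: int = 100, max_validation_retries: int = 3) -> Answer:
    adapter = TypeAdapter(Answer)
    history: list = []
    consecutive_validation_failures = 0

    for _iteration in range(max_iterations):
        candidate = fake_model(history)
        try:
            return adapter.validate_python(candidate)
        except ValidationError as e:
            consecutive_validation_failures += 1
            if consecutive_validation_failures > max_validation_retries:
                raise RuntimeError(
                    f"Structured output validation failed "
                    f"{consecutive_validation_failures} times consecutively "
                    f"for type {Answer.__name__}: {e}"
                ) from e
            history.append(f"Validation error: {e}. Please try again.")

    raise RuntimeError(f"Sampling exceeded maximum iterations ({max_iterations})")

run_loop()  # raises after 4 calls instead of looping for all 100 iterations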

Impact

Without this cap:

  • A complex Pydantic schema that the model can't satisfy will silently burn 100 API calls
  • Each retry includes the full conversation history, so token usage grows roughly quadratically with the number of retries (see the estimate after this list)
  • The eventual error after 100 iterations is "Sampling exceeded maximum iterations (100)", which doesn't indicate that validation was the root cause
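
A rough, back-of-the-envelope estimate of that growth (the token counts below are assumed for illustration, not measured from fastmcp):

# Assumed sizes: ~2,000 tokens of initial history, ~150 tokens per appended
# validation error. Each retry re-sends the entire history so far.
history_tokens = 2_000
error_tokens = 150
max_iterations = 100

total_prompt_tokens = sum(
    history_tokens + i * error_tokens for i in range(max_iterations)
)
print(f"{total_prompt_tokens:,}")  # 942,500 prompt tokens across 100 failed calls

With a cap of 3 consecutive validation retries, the same failure mode would stop after 4 calls and roughly 8,900 prompt tokens.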

Version

fastmcp 3.2.3
