Description
When result_type is set and the LLM calls final_response with invalid data, the sampling loop retries by appending a validation error to history (lines 657-679 in run.py). However, the only limit on these retries is the shared max_iterations = 100 constant that also governs the agentic tool loop.
This means a model that consistently produces invalid JSON for a structured output schema will burn through up to 100 LLM calls before failing — each one costing tokens and time.
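For concreteness, here is a minimal sketch of the kind of validation failure the loop catches, using a plain pydantic TypeAdapter (the Report model is a made-up example, not anything from fastmcp):

from pydantic import BaseModel, TypeAdapter, ValidationError

class Report(BaseModel):
    title: str
    score: int

adapter = TypeAdapter(Report)

try:
    # An LLM that keeps emitting a non-numeric score fails validation every time
    adapter.validate_python({"title": "Q3 summary", "score": "high"})
except ValidationError as e:
    print(e)  # this error text is what the loop appends to history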
Current behavior
# run.py:614
for _iteration in range(max_iterations):  # max_iterations = 100
    step = await sample_step_impl(...)
    for tool_call in step.tool_calls:
        if tool_call.name == "final_response":
            try:
                validated_result = type_adapter.validate_python(input_data)
                return SamplingResult(...)  # success
            except ValidationError as e:
                # Append error to history and... just loop again
                step.history.append(...)  # "Validation error: ... Please try again"
                current_messages = step.history
                # No separate counter for consecutive validation failures
                # Loop continues up to 100 iterations
Expected behavior
There should be a separate, smaller cap for consecutive final_response validation failures (e.g. 3-5). If the LLM fails validation N times in a row, it's unlikely to self-correct and the loop should raise early rather than burning through 95+ more iterations.
This is distinct from the overall max_iterations which appropriately limits the total agentic tool loop (where the LLM may legitimately call many tools before finishing).
Suggested approach:
max_validation_retries = 3  # or make configurable
consecutive_validation_failures = 0

for _iteration in range(max_iterations):
    ...
    if tool_call.name == "final_response":
        try:
            validated_result = type_adapter.validate_python(input_data)
            return SamplingResult(...)
        except ValidationError as e:
            consecutive_validation_failures += 1
            if consecutive_validation_failures > max_validation_retries:
                raise RuntimeError(
                    f"Structured output validation failed {consecutive_validation_failures} "
                    f"times consecutively for type {result_type.__name__}: {e}"
                ) from e
            # append error to history and retry...

    # Reset counter when the LLM does something other than fail validation
    # (e.g. calls a different tool successfully)
    if tool_call.name != "final_response":
        consecutive_validation_failures = 0
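As a sanity check, the capped behavior can be simulated outside fastmcp. Everything below (run_with_cap, sample_once) is a hypothetical stand-in for the real loop body, not fastmcp API:

from pydantic import TypeAdapter, ValidationError

MAX_VALIDATION_RETRIES = 3

def run_with_cap(type_adapter, sample_once):
    # Stand-in for the sampling loop: each call to sample_once() plays the
    # role of one final_response tool call returning raw data to validate.
    consecutive_failures = 0
    for _iteration in range(100):  # stands in for max_iterations
        raw = sample_once()
        try:
            return type_adapter.validate_python(raw)
        except ValidationError as e:
            consecutive_failures += 1
            if consecutive_failures > MAX_VALIDATION_RETRIES:
                raise RuntimeError(
                    f"Structured output validation failed "
                    f"{consecutive_failures} times consecutively"
                ) from e
    raise RuntimeError("Sampling exceeded maximum iterations (100)")

# A model that always returns the wrong shape now fails after 4 calls, not 100
try:
    run_with_cap(TypeAdapter(int), lambda: "not-an-int")
except RuntimeError as e:
    print(e)  # Structured output validation failed 4 times consecutively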
Impact
Without this cap:
- A complex Pydantic schema that the model can't satisfy will silently burn 100 API calls
- Each retry re-sends the full, growing conversation history, so cumulative token usage grows quadratically with the number of retries (illustrated after this list)
- The eventual error after 100 iterations is "Sampling exceeded maximum iterations (100)", which doesn't indicate that validation was the root cause
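Rough illustration of the quadratic growth; H and E below are made-up numbers, not measurements:

# Assume a base history of H tokens and roughly E tokens added per retry
# (the validation error message plus the model's next attempt).
H, E = 2_000, 500

# Retry k re-sends the base history plus k appended error/attempt pairs,
# so total input tokens over N retries is N*H + E*N*(N-1)/2.
for n in (5, 25, 100):
    total = sum(H + k * E for k in range(n))
    print(f"{n:>3} retries -> {total:,} input tokens")

#   5 retries -> 15,000 input tokens
#  25 retries -> 200,000 input tokens
# 100 retries -> 2,675,000 input tokens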
Version
fastmcp 3.2.3