Status: RFI complete, implementation planning ready. GitHub issue: #2309. Linear: RMN-257.
ZeroClaw currently parses text/tool calls and token usage across providers, but it does not carry a
normalized stop reason into ChatResponse, and there is no deterministic continuation loop for
max_tokens truncation. This RFI defines a provider mapping model, a continuation FSM, partial
tool-call recovery policy, and observability/testing requirements.
- `ChatResponse` in `src/providers/traits.rs` has no stop-reason field.
- Provider adapters parse text/tool-calls/usage, but stop reason fields are mostly discarded.
- `src/agent/loop_.rs` finalizes the response if no parsed tool calls are present.
- The existing parser in `src/agent/loop_/parsing.rs` already handles many malformed/truncated tool-call formats safely (no panic), but this is parsing recovery, not continuation policy.
- When a provider truncates output because of a max-token cap, the loop lacks a dedicated continuation path. Result: partial responses can be returned silently.
```rust
enum NormalizedStopReason {
    EndTurn,
    ToolCall,
    MaxTokens,
    ContextWindowExceeded,
    SafetyBlocked,
    Cancelled,
    Unknown(String),
}
```

Add stop-reason payload to provider response contract:
```rust
pub struct ChatResponse {
    pub text: Option<String>,
    pub tool_calls: Vec<ToolCall>,
    pub usage: Option<TokenUsage>,
    pub reasoning_content: Option<String>,
    pub quota_metadata: Option<QuotaMetadata>,
    pub stop_reason: Option<NormalizedStopReason>,
    pub raw_stop_reason: Option<String>,
}
```

`raw_stop_reason` preserves provider-native values for diagnostics and future mapping updates.
This table defines implementation targets for active provider families in ZeroClaw.
| Provider family | Native field | Native values | Normalized |
|---|---|---|---|
| OpenAI / OpenRouter / OpenAI-compatible chat | `finish_reason` | `stop` | `EndTurn` |
| OpenAI / OpenRouter / OpenAI-compatible chat | `finish_reason` | `tool_calls`, `function_call` | `ToolCall` |
| OpenAI / OpenRouter / OpenAI-compatible chat | `finish_reason` | `length` | `MaxTokens` |
| OpenAI / OpenRouter / OpenAI-compatible chat | `finish_reason` | `content_filter` | `SafetyBlocked` |
| Anthropic messages | `stop_reason` | `end_turn`, `stop_sequence` | `EndTurn` |
| Anthropic messages | `stop_reason` | `tool_use` | `ToolCall` |
| Anthropic messages | `stop_reason` | `max_tokens` | `MaxTokens` |
| Anthropic messages | `stop_reason` | `model_context_window_exceeded` | `ContextWindowExceeded` |
| Gemini generateContent | `finishReason` | `STOP` | `EndTurn` |
| Gemini generateContent | `finishReason` | `MAX_TOKENS` | `MaxTokens` |
| Gemini generateContent | `finishReason` | `SAFETY`, `RECITATION` | `SafetyBlocked` |
| Bedrock Converse | `stopReason` | `end_turn` | `EndTurn` |
| Bedrock Converse | `stopReason` | `tool_use` | `ToolCall` |
| Bedrock Converse | `stopReason` | `max_tokens` | `MaxTokens` |
| Bedrock Converse | `stopReason` | `guardrail_intervened` | `SafetyBlocked` |
Notes:
- Unknown values map to `Unknown(raw)` and must be logged once per provider/model combination.
- Mapping must be unit-tested against fixture payloads for each provider adapter.
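The mapping table can be expressed as a single pure function, which makes fixture-based testing straightforward. The following is a minimal sketch, not the adapter code itself: `ProviderFamily` and `normalize_stop_reason` are hypothetical names introduced here for illustration, and the enum is repeated only so the block compiles on its own.

```rust
/// Normalized stop reasons, mirroring the enum proposed in this RFI.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NormalizedStopReason {
    EndTurn,
    ToolCall,
    MaxTokens,
    ContextWindowExceeded,
    SafetyBlocked,
    Cancelled,
    Unknown(String),
}

/// Provider families from the mapping table (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderFamily {
    OpenAiCompatible,
    Anthropic,
    Gemini,
    Bedrock,
}

/// Maps a provider-native stop value to the normalized enum.
/// Any value not listed in the table falls through to `Unknown(raw)`.
pub fn normalize_stop_reason(family: ProviderFamily, raw: &str) -> NormalizedStopReason {
    use NormalizedStopReason::*;
    use ProviderFamily::*;
    match (family, raw) {
        (OpenAiCompatible, "stop") => EndTurn,
        (OpenAiCompatible, "tool_calls" | "function_call") => ToolCall,
        (OpenAiCompatible, "length") => MaxTokens,
        (OpenAiCompatible, "content_filter") => SafetyBlocked,
        (Anthropic, "end_turn" | "stop_sequence") => EndTurn,
        (Anthropic, "tool_use") => ToolCall,
        (Anthropic, "max_tokens") => MaxTokens,
        (Anthropic, "model_context_window_exceeded") => ContextWindowExceeded,
        (Gemini, "STOP") => EndTurn,
        (Gemini, "MAX_TOKENS") => MaxTokens,
        (Gemini, "SAFETY" | "RECITATION") => SafetyBlocked,
        (Bedrock, "end_turn") => EndTurn,
        (Bedrock, "tool_use") => ToolCall,
        (Bedrock, "max_tokens") => MaxTokens,
        (Bedrock, "guardrail_intervened") => SafetyBlocked,
        (_, other) => Unknown(other.to_string()),
    }
}
```

Keying the match on the provider family as well as the raw string keeps a value such as `stop` from being accepted for a provider that never emits it; such a value would surface as `Unknown("stop")` instead.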
- Continue only when stop reason indicates output truncation.
- Bound retries and total output growth.
- Preserve tool-call correctness (never execute partial JSON).
```mermaid
stateDiagram-v2
    [*] --> Request
    Request --> EvaluateStop: provider_response
    EvaluateStop --> Complete: EndTurn
    EvaluateStop --> ExecuteTools: ToolCall
    EvaluateStop --> ContinuePending: MaxTokens
    EvaluateStop --> Abort: SafetyBlocked/ContextWindowExceeded/UnknownFatal
    ContinuePending --> RequestContinuation: under_limits
    RequestContinuation --> EvaluateStop: provider_response
    ContinuePending --> AbortPartial: retry_limit_or_budget_exceeded
    AbortPartial --> Complete: return_partial_with_notice
    ExecuteTools --> Request: tool_results_appended
```
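The transitions in the state diagram can be encoded as a pure function, which lets the FSM tests enumerate every edge. This is a sketch under assumptions: `State`, `Event`, and `next_state` are hypothetical names, every `Unknown` value is treated as fatal, and `Cancelled` is routed to `Abort` even though the diagram does not list it.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NormalizedStopReason {
    EndTurn,
    ToolCall,
    MaxTokens,
    ContextWindowExceeded,
    SafetyBlocked,
    Cancelled,
    Unknown(String),
}

/// States taken from the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Request,
    EvaluateStop,
    ExecuteTools,
    ContinuePending,
    RequestContinuation,
    AbortPartial,
    Abort,
    Complete,
}

/// Edge labels from the diagram (hypothetical event type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    ProviderResponse,
    StopObserved(NormalizedStopReason),
    UnderLimits,
    LimitExceeded,
    ToolResultsAppended,
    PartialReturned,
}

/// Returns the next state, or `None` when the diagram has no such edge.
pub fn next_state(state: State, event: &Event) -> Option<State> {
    use NormalizedStopReason as R;
    match (state, event) {
        (State::Request, Event::ProviderResponse) => Some(State::EvaluateStop),
        (State::RequestContinuation, Event::ProviderResponse) => Some(State::EvaluateStop),
        (State::EvaluateStop, Event::StopObserved(reason)) => Some(match reason {
            R::EndTurn => State::Complete,
            R::ToolCall => State::ExecuteTools,
            R::MaxTokens => State::ContinuePending,
            R::SafetyBlocked | R::ContextWindowExceeded => State::Abort,
            // Assumption: Cancelled and all Unknown values are fatal here.
            R::Cancelled | R::Unknown(_) => State::Abort,
        }),
        (State::ContinuePending, Event::UnderLimits) => Some(State::RequestContinuation),
        (State::ContinuePending, Event::LimitExceeded) => Some(State::AbortPartial),
        (State::AbortPartial, Event::PartialReturned) => Some(State::Complete),
        (State::ExecuteTools, Event::ToolResultsAppended) => Some(State::Request),
        _ => None,
    }
}
```

Returning `Option<State>` rather than panicking on an unlisted edge means a test can assert that terminal states such as `Complete` accept no further events.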
- `max_continuations_per_turn = 3`
- `max_total_completion_tokens_per_turn = 4 * initial_max_tokens` (configurable)
- `max_total_output_chars_per_turn = 120_000` (safety cap)
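One way to enforce these limits is a single check run before each continuation request. The sketch below assumes hypothetical `ContinuationLimits`, `TurnUsage`, and `check_limits` names; only the three limit values come from this document. It also assumes a limit counts as reached when usage is equal to it.

```rust
/// The three hard limits, with the defaults listed in this RFI.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContinuationLimits {
    pub max_continuations_per_turn: u32,
    pub max_total_completion_tokens_per_turn: u64,
    pub max_total_output_chars_per_turn: usize,
}

impl ContinuationLimits {
    /// Derives the token budget as 4 * initial_max_tokens.
    pub fn from_initial_max_tokens(initial_max_tokens: u64) -> Self {
        Self {
            max_continuations_per_turn: 3,
            max_total_completion_tokens_per_turn: initial_max_tokens.saturating_mul(4),
            max_total_output_chars_per_turn: 120_000,
        }
    }
}

/// Running totals for the current turn (hypothetical bookkeeping type).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct TurnUsage {
    pub continuations: u32,
    pub completion_tokens: u64,
    pub output_chars: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LimitCheck {
    UnderLimits,
    RetryLimit,
    BudgetExhausted,
}

/// Decides whether another continuation may be requested.
/// Assumption: a limit is treated as reached when usage equals it.
pub fn check_limits(limits: &ContinuationLimits, usage: &TurnUsage) -> LimitCheck {
    if usage.continuations >= limits.max_continuations_per_turn {
        LimitCheck::RetryLimit
    } else if usage.completion_tokens >= limits.max_total_completion_tokens_per_turn
        || usage.output_chars >= limits.max_total_output_chars_per_turn
    {
        LimitCheck::BudgetExhausted
    } else {
        LimitCheck::UnderLimits
    }
}
```

With `initial_max_tokens = 1024`, the token budget becomes 4096, so a turn that has already produced 4096 completion tokens after one continuation would stop with `BudgetExhausted` rather than retry.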
- Never execute tool calls when parsed payload is incomplete/ambiguous.
- If `MaxTokens` and parser detects malformed/partial tool-call body:
  - request deterministic re-emission of the tool call payload only.
  - keep attempt budget separate (`max_tool_repair_attempts = 1`).
- If repair fails, degrade safely:
  - return a partial response with explicit truncation notice.
  - emit structured event for operator diagnosis.
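The policy above reduces to a small decision function. The following is a sketch only: `ToolPayload`, `ToolAction`, and `decide_tool_action` are hypothetical names, and the handling of a malformed payload without truncation (degrade rather than repair) is an assumption, since this RFI only specifies the `MaxTokens` case.

```rust
/// Result of parsing a tool-call body (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolPayload {
    Complete,
    Malformed,
}

/// What the loop should do with the parsed tool call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolAction {
    /// Payload parsed fully: safe to execute.
    Execute,
    /// Ask the model to re-emit the tool-call payload only.
    RequestRepair,
    /// Return partial text with a truncation notice and emit an event.
    DegradeWithNotice,
}

/// Applies the partial tool-call policy.
/// `truncated` is true when the normalized stop reason is `MaxTokens`.
pub fn decide_tool_action(
    truncated: bool,
    payload: ToolPayload,
    repairs_used: u32,
    max_tool_repair_attempts: u32,
) -> ToolAction {
    match payload {
        ToolPayload::Complete => ToolAction::Execute,
        // Repair is only attempted for truncation, within its own budget.
        ToolPayload::Malformed if truncated && repairs_used < max_tool_repair_attempts => {
            ToolAction::RequestRepair
        }
        // Assumption: every other malformed case degrades safely.
        ToolPayload::Malformed => ToolAction::DegradeWithNotice,
    }
}
```

No branch reaches `Execute` for a malformed payload, which is the invariant the policy is meant to guarantee.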
Use a strict system-side continuation hint:
```text
Previous response was truncated by token limit.
Continue exactly from where you left off.
If you intended a tool call, emit one complete tool call payload only.
Do not repeat already-sent text.
```
Emit structured events per turn:
- `stop_reason_observed`
  - provider, model, normalized reason, raw reason, turn id, iteration.
- `continuation_attempt`
  - attempt index, cumulative output tokens/chars, budget remaining.
- `continuation_terminated`
  - terminal reason (`completed`, `retry_limit`, `budget_exhausted`, `safety_blocked`).
- `tool_payload_repair`
  - parse issue type, repair attempted, repair success/failure.
Metrics:
- counter: continuations triggered by provider/model.
- counter: truncation exits without continuation (guardrail/budget cases).
- histogram: continuation attempts per turn.
- histogram: end-to-end turn latency for continued turns.
- Parse and map native stop reason fields in each adapter.
- Populate `stop_reason` and `raw_stop_reason` in `ChatResponse`.
- Add fixture-based unit tests for mapping.
- Introduce `ContinuationController` in `src/agent/loop_.rs`.
- Route `MaxTokens` through continuation FSM before finalization.
- Merge continuation text chunks into one coherent assistant response.
- Keep existing tool parsing and loop-detection guards intact.
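Merging continuation chunks needs some guard against the model repeating the tail of its previous output. One possible heuristic, shown here as a sketch and not as the planned implementation, drops the longest prefix of the new chunk that exactly matches the end of the accumulated text. `merge_continuation` and its `min_overlap` / `max_overlap` parameters are hypothetical; a minimum overlap is assumed so that a coincidental one- or two-character match is not removed.

```rust
/// Appends `next` to `acc`, skipping the longest prefix of `next` that
/// exactly repeats the end of `acc`. Overlaps shorter than `min_overlap`
/// bytes are ignored; at most `max_overlap` bytes are compared.
pub fn merge_continuation(acc: &mut String, next: &str, min_overlap: usize, max_overlap: usize) {
    let limit = max_overlap.min(acc.len()).min(next.len());
    let overlap = (min_overlap.max(1)..=limit)
        .rev() // try the longest candidate first
        .find(|&len| {
            // Only slice on UTF-8 character boundaries.
            next.is_char_boundary(len) && acc.ends_with(&next[..len])
        })
        .unwrap_or(0);
    acc.push_str(&next[overlap..]);
}
```

For example, merging `"brown fox jumps over"` into `"The quick brown fox"` with `min_overlap = 4` yields `"The quick brown fox jumps over"`. Exact-match overlap will not catch paraphrased repetition, so the strict caps remain the backstop.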
Add config keys under `agent`:
- `continuation_max_attempts`
- `continuation_max_output_chars`
- `continuation_max_total_completion_tokens`
- `continuation_tool_repair_attempts`
- stop-reason mapping tests per provider adapter.
- continuation FSM transition tests (all terminal paths).
- budget cap tests and retry-limit behavior.
- mock provider returns `MaxTokens` then successful continuation.
- mock provider returns repeated `MaxTokens` until retry cap.
- mock provider emits partial tool-call JSON then repaired payload.
- ensure non-truncated normal responses are unchanged.
- ensure existing parser recovery tests in `loop_/parsing.rs` remain green.
- verify no duplicate text when continuation merges.
| Risk | Impact | Mitigation |
|---|---|---|
| Provider mapping drift | incorrect continuation triggers | keep raw_stop_reason + tests |
| Continuation repetition loops | poor UX, extra tokens | dedupe heuristics + strict caps |
| Partial tool-call execution | unsafe tool behavior | hard block on malformed payload |
| Latency growth | slower responses | cap attempts and emit metrics |
- Provider stop-reason mapping documented.
- Continuation policy and hard limits documented.
- Partial tool-call handling strategy documented.
- Proposed state machine documented for implementation.