-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Recover the ReAct loop from MAX_TOKENS truncation mid-tool-call #24260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: loa/openmetrics-ai-gen
Are you sure you want to change the base?
Changes from 2 commits
109ec12
6927671
6b6676a
f0c9a10
c237938
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,6 +12,23 @@ | |
| from ddev.ai.react.types import ReActResult | ||
| from ddev.ai.tools.core.types import ToolResult | ||
|
|
||
| # A turn that stops on MAX_TOKENS while a tool call is pending was truncated mid-call: the | ||
| # tool_use block is incomplete and was never executed. We must still answer every tool_use | ||
| # with a tool_result, otherwise the next send() replays a dangling tool_use and the provider | ||
| # rejects the request. This synthetic result repairs the conversation and nudges the model | ||
| # toward a smaller follow-up. | ||
| TRUNCATED_TOOL_CALL_ERROR = ( | ||
| "This tool call was NOT executed: your previous response was truncated after reaching the " | ||
| "maximum output token limit, so the tool call is incomplete. Retry with a smaller, more " | ||
| "targeted change — edit a single small unique region instead of rewriting a whole file, or " | ||
| "split the work across several sequential tool calls. For a full-file rewrite prefer " | ||
| "create_file over one huge edit_file." | ||
| ) | ||
|
|
||
| # Upper bound on back-to-back truncated turns before we give up, to avoid an unrecoverable loop | ||
| # where the model keeps emitting an oversized tool call that never fits in the output budget. | ||
| MAX_CONSECUTIVE_TRUNCATIONS = 2 | ||
|
|
||
|
|
||
| class ReActProcess: | ||
| """ | ||
|
|
@@ -77,6 +94,22 @@ def _is_compact_needed(self, response: AgentResponse) -> bool: | |
| return False | ||
| return True | ||
|
|
||
| async def _execute_tool_calls(self, tool_calls: list) -> list[ToolResult]: | ||
| """Run all tool calls in parallel, converting any raised exception into a failure result.""" | ||
| raw_results = await asyncio.gather( | ||
| *[self._tool_registry.run(tc.name, tc.input) for tc in tool_calls], | ||
| return_exceptions=True, | ||
| ) | ||
| return [ | ||
| r if isinstance(r, ToolResult) else ToolResult(success=False, error=f"{type(r).__name__}: {r}") | ||
| for r in raw_results | ||
| ] | ||
|
|
||
| @staticmethod | ||
| def _truncated_tool_results(tool_calls: list) -> list[ToolResult]: | ||
| """Synthetic failure results for a turn truncated by the output token limit.""" | ||
| return [ToolResult(success=False, error=TRUNCATED_TOOL_CALL_ERROR) for _ in tool_calls] | ||
|
|
||
| async def start(self, prompt: str, allowed_tools: list[str] | None = None) -> ReActResult: | ||
| """ | ||
| Run the ReAct loop for a single task. | ||
|
|
@@ -104,18 +137,24 @@ async def start(self, prompt: str, allowed_tools: list[str] | None = None) -> Re | |
| await self._callbacks.fire_agent_response(self._scope, response, iterations) | ||
|
|
||
| # No iteration cap — this is an interactive CLI tool; the user can Ctrl+C to stop. | ||
| while response.stop_reason == StopReason.TOOL_USE: | ||
| if not response.tool_calls: | ||
| raise AgentError("Agent returned stop_reason=TOOL_USE with no tool calls") | ||
|
|
||
| raw_results = await asyncio.gather( | ||
| *[self._tool_registry.run(tc.name, tc.input) for tc in response.tool_calls], | ||
| return_exceptions=True, | ||
| ) | ||
| tool_results: list[ToolResult] = [ | ||
| r if isinstance(r, ToolResult) else ToolResult(success=False, error=f"{type(r).__name__}: {r}") | ||
| for r in raw_results | ||
| ] | ||
| # Loop while a tool call is pending. A MAX_TOKENS turn can also carry a (truncated) | ||
| # tool_use that must be answered, so we key off tool_calls rather than the stop reason. | ||
| consecutive_truncations = 0 | ||
| while response.tool_calls: | ||
| truncated = response.stop_reason == StopReason.MAX_TOKENS | ||
|
Comment on lines
+151
to
+152
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When a follow-up response contains Useful? React with 👍 / 👎. |
||
| if truncated: | ||
| consecutive_truncations += 1 | ||
| if consecutive_truncations > MAX_CONSECUTIVE_TRUNCATIONS: | ||
| raise AgentError( | ||
| "Agent response was truncated by the output token limit on " | ||
| f"{consecutive_truncations} consecutive turns while a tool call was " | ||
| "pending; aborting to avoid an unrecoverable loop. Reduce the amount " | ||
| "of work attempted in a single tool call." | ||
| ) | ||
| tool_results: list[ToolResult] = self._truncated_tool_results(response.tool_calls) | ||
| else: | ||
| consecutive_truncations = 0 | ||
| tool_results = await self._execute_tool_calls(response.tool_calls) | ||
| total_input += sum(result.total_input_tokens for result in tool_results) | ||
| total_output += sum(result.total_output_tokens for result in tool_results) | ||
|
|
||
|
|
@@ -140,6 +179,9 @@ async def start(self, prompt: str, allowed_tools: list[str] | None = None) -> Re | |
| total_input += compact_in | ||
| total_output += compact_out | ||
|
|
||
| if response.stop_reason == StopReason.TOOL_USE: | ||
| raise AgentError("Agent returned stop_reason=TOOL_USE with no tool calls") | ||
|
|
||
| react_result = ReActResult( | ||
| final_response=response, | ||
| iterations=iterations, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we don't have any way of easily rewriting an entire file. The only way is using
edit_file, sincecreate_fileonly creates a file if it doesn't exist. Maybe, as a follow up, we could include a flag increate_fileto allow overwriting if it already exists.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem still persist, it is that if it wants to edit (or create) a file that is too big it will run into max_tokens error. We could add an overwrite option to write a file but if we do not mention the token limit that might be an issue.
Maybe, since we know the max tokens set, we could include that into the prompt itself? Inejct it as a guideline for the agent not know not to do weird things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, saw your last commit and looks good. Thank you!