Description
Context
I was experimenting with creating multiple agents that can call other agents as tools. I noticed that in some cases, when one agent calls another agent (as a tool) that in turn invokes another tool (including an MCP tool), CAI hangs indefinitely even after the tool output is returned.
I did a bit of debugging and found that the fix_message_list function in cai/util.py contains a bug that causes the hang. It occurs when an assistant message has multiple tool_calls and the tool responses are out of order in the conversation history.
Bug Description
The fix_message_list function has a second pass (a while loop) that ensures every tool message directly follows its matching assistant message. To decide whether a tool message is already in the right place, it checks if the immediately preceding message is the matching assistant. This check is too strict, i.e., it doesn't account for sibling tool messages belonging to the same assistant.
Affected Code
# Only checks the single previous message
prev_msg = processed_messages[i - 1]
is_valid_sequence = (
    prev_msg.get("role") == "assistant"
    and prev_msg.get("tool_calls")
    and any(tc.get("id") == tool_id for tc in prev_msg.get("tool_calls", []))
)
When an assistant has two tool calls (call_1, call_2), the valid sequence looks like:
[0] assistant(tool_calls=[call_1, call_2])
[1] tool(call_1)
[2] tool(call_2)
But at index 2, the code sees tool(call_1) as the previous message — not an assistant — and wrongly concludes tool(call_2) is out of place. It then moves tool(call_2) to position 1, which pushes tool(call_1) to position 2, where the same wrong check triggers again. This creates an infinite ping-pong:
iteration 1:
[0] assistant(tool_calls=[call_1, call_2])
[1] tool(call_1)
[2] tool(call_2) ← i=2, prev is tool(call_1) not assistant → move to pos 1
iteration 2:
[0] assistant(tool_calls=[call_1, call_2])
[1] tool(call_2) ← just moved here
[2] tool(call_1) ← i=2, prev is tool(call_2) not assistant → move to pos 1
iteration 3:
[0] assistant(tool_calls=[call_1, call_2])
[1] tool(call_1) ← just moved here
[2] tool(call_2) ← i=2, same as iteration 1...
... forever
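The ping-pong above can be reproduced in isolation with a small model of the strict check. This is a simplified sketch based only on the behavior described in this issue, not the actual code in cai/util.py: the names `owns` and `simulate_strict_second_pass` are hypothetical, and the `max_moves` cap exists only so the demonstration terminates.

```python
def owns(message, tool_id):
    """True if `message` is an assistant message that issued `tool_id`."""
    return message.get("role") == "assistant" and any(
        tc.get("id") == tool_id for tc in message.get("tool_calls") or []
    )


def simulate_strict_second_pass(messages, max_moves=6):
    """Simplified model of the strict check (NOT the real cai implementation).

    Returns (messages, moves, settled). `settled` is False when the move
    budget ran out, i.e. an uncapped loop would still be running.
    """
    msgs = list(messages)
    moves = 0
    i = 0
    while i < len(msgs):
        msg = msgs[i]
        tool_id = msg.get("tool_call_id")
        # Strict check: only the immediately preceding message is inspected.
        if msg.get("role") != "tool" or (i > 0 and owns(msgs[i - 1], tool_id)):
            i += 1
            continue
        owner = next((j for j, m in enumerate(msgs) if owns(m, tool_id)), None)
        if owner is None:
            i += 1  # orphan tool message: nothing to move it next to
            continue
        if moves >= max_moves:
            return msgs, moves, False
        moved = msgs.pop(i)
        # Re-insert directly after the owning assistant message.
        msgs.insert(owner + 1 if owner < i else owner, moved)
        moves += 1
        # `i` is not advanced: the message now at index i is checked next.
    return msgs, moves, True


history = [
    {"role": "assistant", "tool_calls": [{"id": "call_1"}, {"id": "call_2"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "first"},
    {"role": "tool", "tool_call_id": "call_2", "content": "second"},
]
_, moves, settled = simulate_strict_second_pass(history, max_moves=6)
```

With two sibling tool messages the model exhausts its move budget instead of settling, which matches the trace above; with a single tool call it settles without any moves.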
Fix
Instead of only checking the single previous message, walk backward past sibling tool messages to find the nearest assistant. If that assistant owns the current tool message, the sequence is valid and no move is needed.
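A minimal sketch of that backward walk is shown here. The helper name `tool_message_in_place` is hypothetical and the function is written standalone for illustration; the actual patch would live inside the second pass of fix_message_list and may differ in detail. It assumes tool calls are plain dicts with an "id" key, as in the affected code.

```python
def tool_message_in_place(processed_messages, i, tool_id):
    """Return True if the tool message at index i already sits in the block
    of tool responses that directly follows its owning assistant message.
    """
    j = i - 1
    # Skip over sibling tool messages that precede this one.
    while j >= 0 and processed_messages[j].get("role") == "tool":
        j -= 1
    if j < 0:
        return False  # no non-tool message precedes this one
    candidate = processed_messages[j]
    # The nearest non-tool message must be the assistant that issued tool_id.
    return candidate.get("role") == "assistant" and any(
        tc.get("id") == tool_id for tc in candidate.get("tool_calls") or []
    )


messages = [
    {"role": "user", "content": "Validate alert"},
    {"role": "assistant", "tool_calls": [{"id": "call_1"}, {"id": "call_2"}]},
    {"role": "tool", "tool_call_id": "call_2", "content": "GitHub result"},
    {"role": "tool", "tool_call_id": "call_1", "content": "Wiz result"},
]
call_2_ok = tool_message_in_place(messages, 2, "call_2")
call_1_ok = tool_message_in_place(messages, 3, "call_1")
```

Both tool messages are accepted where they are, so no move is triggered and the loop can advance. Note that this check only validates placement; it does not reorder sibling responses to match the tool_calls array.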
When Does This Happen?
This triggers when tool responses arrive out of order relative to the tool_calls array. In my case it happened on agent-as-tool approach where sub-agents run concurrently.
Example Setup
from cai.sdk.agents import Agent

wiz_operator_agent = Agent(
    name="wiz_operator_agent",
    instructions="...",
    mcp_servers=[wiz],  # Wiz MCP (list_issues, get_issue_v2, ...)
)
github_operator_agent = Agent(
    name="github_operator_agent",
    instructions="...",
    mcp_servers=[github],  # GitHub MCP (get_file_contents, search_code, ...)
)
# Parent agent calls sub-agents as tools
alert_agent = Agent(
    name="alert_agent",
    instructions="...",
    tools=[
        wiz_operator_agent.as_tool(
            tool_name="verify_wiz_exposure",
            tool_description="Re-verify exposure claims in Wiz",
        ),
        github_operator_agent.as_tool(
            tool_name="verify_github_code",
            tool_description="Re-verify code analysis: check for fixes/PRs",
        ),
    ],
)
When alert_agent calls both tools in one turn, each triggers a Runner.run() inside as_tool. The GitHub call might complete before the Wiz call, even though the LLM listed Wiz first in tool_calls:
# What the LLM produced
{"role": "assistant", "tool_calls": [
{"id": "call_1", "function": {"name": "verify_wiz_exposure", ...}},
{"id": "call_2", "function": {"name": "verify_github_code", ...}}
]}
# What arrived (GitHub finished first due to simpler MCP call chain)
{"role": "tool", "tool_call_id": "call_2", "content": "GitHub: PR #1234 patches CVE..."}
{"role": "tool", "tool_call_id": "call_1", "content": "Wiz: resource is internet-facing..."}
call_2 arrived before call_1, but call_1 was listed first → out of order → infinite loop.
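To check whether a given history is in this state before handing it to fix_message_list, the arrival order can be compared with the order of the tool_calls array. The helper below is a hypothetical diagnostic written for this issue, not part of CAI; it assumes dict-shaped messages like the ones shown above.

```python
def responses_out_of_order(assistant_msg, tool_msgs):
    """Return True if tool responses arrived in a different relative order
    than the ids listed in the assistant message's tool_calls array.
    """
    expected = [tc["id"] for tc in assistant_msg.get("tool_calls") or []]
    # Only consider responses that belong to this assistant message.
    arrived = [
        m["tool_call_id"]
        for m in tool_msgs
        if m.get("role") == "tool" and m.get("tool_call_id") in expected
    ]
    # Expected order, restricted to the responses that actually arrived.
    expected_arrived = [tool_id for tool_id in expected if tool_id in arrived]
    return arrived != expected_arrived


assistant = {"role": "assistant", "tool_calls": [
    {"id": "call_1", "function": {"name": "verify_wiz_exposure"}},
    {"id": "call_2", "function": {"name": "verify_github_code"}},
]}
arrived = [
    {"role": "tool", "tool_call_id": "call_2", "content": "GitHub result"},
    {"role": "tool", "tool_call_id": "call_1", "content": "Wiz result"},
]
swapped = responses_out_of_order(assistant, arrived)
```

For the messages above the helper reports the swap; passing the same responses in listed order reports no swap.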
Steps to Reproduce
Pass this message list to fix_message_list:
from cai.util import fix_message_list

messages = [
    {"role": "user", "content": "Validate alert"},
    {"role": "assistant", "tool_calls": [
        {"id": "call_1", "type": "function", "function": {"name": "verify_wiz_exposure", "arguments": "{}"}},
        {"id": "call_2", "type": "function", "function": {"name": "verify_github_code", "arguments": "{}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_2", "content": "GitHub result"},  # Out of order
    {"role": "tool", "tool_call_id": "call_1", "content": "Wiz result"},
]
fix_message_list(messages)  # Hangs forever