---
author: Nexu
---

# Agentic Loop

> **Core Insight:** Every agent is a loop — think, act, observe, repeat. The loop itself is trivial. What makes it production-grade is how you handle the edges: when to stop, what to do when tools fail, and how to prevent infinite cycles.

## The Pattern

The Agentic Loop (also called the ReAct pattern — Reason + Act) is the fundamental execution cycle of any AI agent. The model generates a response, optionally invokes one or more tools, observes the results, and loops until the task is done.
```
┌─────────────┐
│   Reason    │◄──────────────────┐
│ (LLM call)  │                   │
└──────┬──────┘                   │
       │                          │
       ▼                          │
  ┌─────────┐    No tools    ┌────┴─────┐
  │ Tools?  ├───────────────►│  Output  │
  └────┬────┘                └──────────┘
       │ Yes
       ▼
  ┌─────────┐
  │ Execute │
  │  tools  │
  └────┬────┘
       │
       ▼
  ┌─────────┐
  │ Observe │
  │ results ├─────────────────────┘
  └─────────┘
```

This is distinct from a simple tool-calling API. A single tool call is a one-shot: the model says "call this function," you return the result. An **agentic loop** runs that process repeatedly — the model sees the result, decides it needs more information, calls another tool, sees *that* result, and continues until it has enough context to produce a final answer.

## Implementation

A minimal agentic loop in Python:
```python
def agentic_loop(messages: list, tools: list, max_turns: int = 25) -> str:
    """Run the agentic loop until the model produces a final text response.

    Assumes an OpenAI-style client (`llm`), a `dispatch_tool` router that
    executes a tool call by name, and an `AgentLoopError` exception type.
    """
    for turn in range(max_turns):
        response = llm.chat(messages=messages, tools=tools)
        assistant_msg = response.choices[0].message
        messages.append(assistant_msg)

        # Exit condition: no tool calls means the model is done
        if not assistant_msg.tool_calls:
            return assistant_msg.content

        # Execute each tool call and append results
        for tool_call in assistant_msg.tool_calls:
            result = dispatch_tool(tool_call)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })

    raise AgentLoopError(f"Agent did not complete within {max_turns} turns")
```

The `max_turns` parameter is critical. Without it, a confused model will loop forever — calling the same tool repeatedly, getting the same error, and burning tokens. This is the simplest guardrail and should always be present.
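
The guardrail can also be made explicit as a small budget object the loop consults each iteration, which keeps turn and token accounting in one place. A minimal sketch; the class, its limits, and the token-accounting call are illustrative assumptions, not part of the loop above:

```python
class TurnBudget:
    """Tracks turns and a rough token budget for an agentic loop.

    The default limits are illustrative, not recommendations.
    """
    def __init__(self, max_turns: int = 25, max_tokens: int = 200_000):
        self.max_turns = max_turns
        self.max_tokens = max_tokens
        self.turns = 0
        self.tokens = 0

    def record(self, tokens_used: int = 0) -> None:
        """Call once per loop iteration with that turn's token usage."""
        self.turns += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        """True once either the turn or the token limit is hit."""
        return self.turns >= self.max_turns or self.tokens >= self.max_tokens
```

The loop body then becomes `while not budget.exhausted(): ...`, and hitting either limit routes to the same error or forced-summarization path.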

## Parallel Tool Calls

Modern APIs support **parallel tool calls** — the model can request multiple tools in a single response. This is not just an optimization; it changes agent behavior. A model that needs to read three files will request all three simultaneously rather than sequentially:

```python
# A single assistant message might contain:
# tool_calls = [read_file("a.py"), read_file("b.py"), read_file("c.py")]

for tool_call in assistant_msg.tool_calls:
    result = dispatch_tool(tool_call)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result)
    })
# All three results are appended, then the model sees them all at once
```
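
Because the calls in one batch are independent by construction, the harness is free to execute them concurrently instead of one at a time. A sketch using a thread pool; tool calls are modeled here as plain dicts with an `id` key, whereas real SDK objects expose attributes:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_parallel(tool_calls: list, dispatch, max_workers: int = 8) -> list:
    """Run a batch of independent tool calls concurrently.

    `pool.map` preserves request order, so results line up with their
    tool_call ids even though execution order is arbitrary.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(dispatch, tool_calls))
    return [
        {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        for call, result in zip(tool_calls, results)
    ]
```

This pays off most when tools are I/O-bound (file reads, HTTP requests); for CPU-bound tools a process pool or sequential execution may be the better trade.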

## Turn Budget and Exit Conditions

The loop needs clear exit conditions beyond `max_turns`:

| Condition | Action |
|-----------|--------|
| No tool calls in response | Return the text — agent is done |
| Max turns reached | Raise error or force summarization |
| Token budget exceeded | Trigger context compression, then continue |
| Consecutive identical tool calls | Likely stuck — escalate or abort |
| Human interrupt signal | Pause loop, surface current state |

```python
def detect_loop(messages: list, window: int = 3) -> bool:
    """Detect if the agent is stuck calling the same tool repeatedly."""
    recent_calls = []
    for msg in messages[-window * 2:]:
        if hasattr(msg, 'tool_calls') and msg.tool_calls:
            recent_calls.extend(
                (tc.function.name, tc.function.arguments) for tc in msg.tool_calls
            )
    if len(recent_calls) >= window:
        return len(set(recent_calls[-window:])) == 1
    return False
```
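
The token-budget row in the table calls for compression rather than termination. A crude sketch of that step, using character count as a rough token proxy (around four characters per token) and assuming dict-shaped messages; production harnesses typically summarize with a model call instead:

```python
def compress_context(messages: list, max_chars: int = 40_000,
                     keep_recent: int = 6) -> list:
    """Shrink the transcript once it exceeds the budget.

    Old tool results are replaced with a stub; the most recent messages
    are kept verbatim so the model retains its working context.
    """
    total = sum(len(str(m.get("content") or "")) for m in messages)
    if total <= max_chars:
        return messages
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    compressed = [
        {**m, "content": "[tool output elided to save context]"}
        if m.get("role") == "tool" else m
        for m in head
    ]
    return compressed + tail
```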

## Streaming in the Loop

Production harnesses stream the model's output token by token while the loop runs. This is important for user experience — the human sees the agent "thinking" in real time, not staring at a blank screen:

```python
for turn in range(max_turns):
    stream = llm.chat(messages=messages, tools=tools, stream=True)

    tool_calls = []
    text_chunks = []

    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            text_chunks.append(delta.content)
            emit_to_user(delta.content)  # Real-time streaming
        if delta.tool_calls:
            accumulate_tool_calls(tool_calls, delta.tool_calls)

    if not tool_calls:
        return "".join(text_chunks)

    # Execute tools and continue loop
    ...
```
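
The `accumulate_tool_calls` helper above is where streaming gets fiddly: each tool call's name arrives once, but its JSON arguments arrive in fragments, each tagged with the index of the call it belongs to. A sketch of that merging, modeling deltas as plain dicts (real SDK deltas are objects with the same fields):

```python
def accumulate_tool_calls(acc: list, deltas: list) -> None:
    """Merge streamed tool-call fragments into complete calls in `acc`.

    Each delta carries an `index` identifying which call it extends;
    name and argument fragments are concatenated in arrival order.
    """
    for delta in deltas:
        idx = delta["index"]
        while len(acc) <= idx:
            acc.append({"id": None, "name": "", "arguments": ""})
        if delta.get("id"):
            acc[idx]["id"] = delta["id"]
        if delta.get("name"):
            acc[idx]["name"] += delta["name"]
        if delta.get("arguments"):
            acc[idx]["arguments"] += delta["arguments"]
```

Only once the stream ends are the accumulated `arguments` strings guaranteed to be parseable JSON, so argument parsing belongs after the stream loop, not inside it.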

## Common Pitfalls

- **No turn limit** — The single most common harness bug. Always set a maximum.
- **Swallowing tool errors** — If a tool fails silently, the model will retry or hallucinate success. Always return error messages as tool results so the model can adapt.
- **Appending raw results** — Large tool outputs (entire files, API responses) bloat the context window. Truncate or summarize before appending.
- **Ignoring parallel calls** — If your loop processes tool calls sequentially but the model issued them in parallel, you may create ordering dependencies that don't exist.
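
For the oversized-output pitfall, a head-and-tail truncation that labels the cut works better than a silent slice, because the model can see that content was elided and ask for the missing middle if it matters. A minimal sketch; the 4,000-character default is an arbitrary illustration:

```python
def truncate_output(result: str, limit: int = 4_000) -> str:
    """Keep the head and tail of an oversized tool result, and say so.

    An explicit marker tells the model content was elided, which prevents
    it from treating the truncation point as the end of the data.
    """
    if len(result) <= limit:
        return result
    half = limit // 2
    omitted = len(result) - limit
    return (result[:half]
            + f"\n...[{omitted} characters truncated]...\n"
            + result[-half:])
```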

## Further Reading

- [Yao et al., "ReAct: Synergizing Reasoning and Acting"](https://arxiv.org/abs/2210.03629) — The original paper formalizing the Reason + Act pattern
- [Anthropic: Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — Practical patterns for production loops