Commit b7db487

joeylee12629-star and joey authored
feat: restructure guide v3 - concept clarity + practice focus (#40)
Major changes per team feedback:
- Core Concepts (4): Agentic Loop, Tool System, Memory & Context, Guardrails
- Practice (5): Context Engineering, Sandbox, Skill System, Sub-Agent, Error Handling
- Removed: worktree, structured-output, eval, advanced section (harness-as-a-service, meta-harness, memory-portability, scaling-dimensions)
- Every article: core insight upfront, Agent concept defined, author: Nexu
- All 14 articles translated to Chinese
- README EN+ZH updated to match

Co-authored-by: joey <joey@joeydeMacBook-Air.local>
1 parent 268f9e0 commit b7db487

66 files changed: 4707 additions and 9512 deletions

README.md

Lines changed: 18 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -19,58 +19,45 @@
1919

2020
---
2121

22-
A **harness** is the runtime layer that wraps an AI model and turns it into a useful agent. It handles everything the model can't do on its own: reading files, calling tools, remembering context, and deciding when to stop. As AI agents move from demos to production, the harness — not the model — is becoming the differentiator.
22+
A **harness** is the runtime wrapper that turns a bare language model into an **agent** — an autonomous system that can perceive its environment, make decisions, and take actions over multiple steps. The harness handles everything the model can't do on its own: executing tools, managing memory, assembling context, and enforcing safety boundaries.
2323

24-
This guide covers every aspect of harness engineering, from writing your first tool loop to scaling multi-agent systems in production.
24+
This guide covers harness engineering from first principles to production patterns, with real code in every article.
2525

2626
---
2727

2828
## Getting Started
2929

3030
| Topic | Description |
3131
|-------|-------------|
32-
| [What is a Harness?](guide/what-is-harness.md) | The concept in 3 minutes. Minimal code example. Harness vs. framework vs. runtime. |
33-
| [Your First Harness](guide/your-first-harness.md) | Build a working harness in 15 minutes. Complete Python code you can copy and run. |
34-
| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a raw harness vs. LangChain/CrewAI. Decision tree + code comparison. |
32+
| [What is a Harness?](guide/what-is-harness.md) | The concept in 3 minutes. How it turns a model into an agent. Harness vs. framework vs. runtime. |
33+
| [Your First Harness](guide/your-first-harness.md) | Build a working harness in 50 lines of Python. Complete code you can copy and run. |
34+
| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a raw harness vs. LangChain/CrewAI. Decision tree + side-by-side code comparison. |
3535

36-
## Core Patterns
36+
## Core Concepts
3737

3838
| Topic | Description |
3939
|-------|-------------|
40-
| [The AGENTS.md Pattern](guide/agents-md.md) | Define agent behavior in a plain-text file. Version-controlled, portable, transparent. |
41-
| [The MEMORY.md Pattern](guide/memory-md.md) | Persistent memory with daily logs + curated long-term memory. |
42-
| [The Tool Loop](guide/tool-loop.md) | The ReAct loop in engineering terms. Adding tools without changing the loop. |
43-
| [Skill Loading](guide/skill-loading.md) | Loading tools on demand instead of all at once. Token cost comparison. |
44-
| [Thin Harness Architecture](guide/thin-harness.md) | Why the harness should be minimal. Thin harness + thick skills. |
45-
| [Context Window Management](guide/context-window.md) | Priority systems, token budgets, sliding window implementation. |
40+
| [Agentic Loop](guide/agentic-loop.md) | The think → act → observe cycle. Turn budgets, parallel tool calls, loop detection, streaming. |
41+
| [Tool System](guide/tool-system.md) | Tool registry, static vs. dynamic loading, MCP protocol, description quality patterns. |
42+
| [Memory & Context](guide/memory-and-context.md) | Context assembly, session management, two-tier memory (daily logs + long-term). AGENTS.md and MEMORY.md patterns. |
43+
| [Guardrails](guide/guardrails.md) | Permission models, trust boundaries, sandboxing, prompt injection defense. |
4644

47-
## Techniques
45+
## Practice
4846

4947
| Topic | Description |
5048
|-------|-------------|
51-
| [Context Compression](guide/context-compression.md) | Three lines of defense: auto-decay, threshold, active compression. |
52-
| [Multi-Agent Patterns](guide/multi-agent.md) | Leader-Worker, file-based inbox, handshake, auto-claim, git worktree isolation. |
53-
| [Git Worktree Isolation](guide/git-worktree-isolation.md) | Parallel agent tasks without conflicts. Step-by-step commands. |
54-
| [Sandbox & Security](guide/sandbox-security.md) | Docker, Firecracker, WASM. Permission models and trust boundaries. |
55-
| [Structured Output](guide/structured-output.md) | Getting agents to return parseable data. JSON mode, schema validation. |
56-
| [Error Recovery](guide/error-recovery.md) | Retry strategies, graceful degradation, human-in-the-loop escalation. |
57-
| [Evaluation & Testing](guide/eval-and-testing.md) | Behavioral testing, trace replay, minimal eval framework. |
58-
59-
## Advanced
60-
61-
| Topic | Description |
62-
|-------|-------------|
63-
| [Harness as a Service](guide/harness-as-a-service.md) | Running harnesses in the cloud. Multi-tenant architecture. |
64-
| [Meta-Harness](guide/meta-harness.md) | Agents that optimize their own harness. The AutoAgent pattern. |
65-
| [Memory Portability](guide/memory-portability.md) | Moving memory between harness implementations. Migration scripts. |
66-
| [Scaling Dimensions](guide/scaling-dimensions.md) | Time × Space × Interaction framework for analyzing any harness. |
49+
| [Context Engineering](guide/context-engineering.md) | Priority-based assembly, three lines of defense for compression, token budgeting. |
50+
| [Sandbox](guide/sandbox.md) | Docker and Firecracker setups, network isolation, filesystem restrictions. |
51+
| [Skill System](guide/skill-system.md) | Skill packaging, on-demand loading, SKILL.md format, thin harness + thick skills. |
52+
| [Sub-Agent](guide/sub-agent.md) | Leader-Worker pattern, file-based communication, session isolation, parallel execution. |
53+
| [Error Handling](guide/error-handling.md) | Error classification, retry strategies, graceful degradation, checkpoint/resume. |
6754

6855
## Reference
6956

7057
| Topic | Description |
7158
|-------|-------------|
72-
| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor, Nexu. |
73-
| [Glossary](guide/glossary.md) | 23 key terms defined. |
59+
| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor. |
60+
| [Glossary](guide/glossary.md) | Key terms defined. |
7461

7562
---
7663

README.zh-CN.md

Lines changed: 18 additions & 31 deletions
@@ -19,58 +19,45 @@
1919

2020
---
2121

22-
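A **harness** is the runtime layer that wraps an AI model and turns a bare model into a genuinely useful agent. It handles what the model cannot do on its own: reading and writing files, calling tools, remembering across sessions, and deciding when to stop. As AI agents move from demos to production, the harness layer, rather than the model itself, is becoming the core product differentiator.
22+
A **harness** is the runtime layer that wraps a language model and turns the bare model into an **agent**: an autonomous system that can perceive its environment, make decisions, and act over multiple steps. The harness handles everything the model cannot do itself: executing tools, managing memory, assembling context, and enforcing safety boundaries.
2323

24-
This guide covers every aspect of harness engineering, from writing your first tool-call loop to running multi-agent systems in production.
24+
This guide runs from first principles to production patterns, and every article comes with runnable code.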
2525

2626
---
2727

2828
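## Getting Started
2929

3030
| Topic | Description |
3131
|------|------|
32-
| [What is a Harness?](guide/what-is-harness.md) | Understand the core concept in 3 minutes. Minimal code example. Harness vs. framework vs. runtime. |
33-
| [Build Your First Harness](guide/your-first-harness.md) | Build a working harness in 15 minutes. Complete Python code. |
34-
| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a harness and when to use LangChain/CrewAI. Decision tree + code comparison. |
32+
| [What is a Harness?](guide/what-is-harness.md) | Understand the core concept in 3 minutes. How a model becomes an agent. Harness vs. framework vs. runtime. |
33+
| [Build Your First Harness](guide/your-first-harness.md) | Build a working harness in 50 lines of Python. Complete code you can copy directly. |
34+
| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a harness and when to use LangChain/CrewAI. Decision tree + code comparison. |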
3535

36-
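## Core Patterns
36+
## Core Concepts
3737

3838
| Topic | Description |
3939
|------|------|
40-
| [The AGENTS.md Pattern](guide/agents-md.md) | Define agent behavior in a plain-text file. Version-controllable, portable, fully transparent. |
41-
| [The MEMORY.md Pattern](guide/memory-md.md) | Persistent memory: daily logs + curated long-term memory. |
42-
| [The Tool-Call Loop](guide/tool-loop.md) | An engineering implementation of the ReAct pattern. Add tools without changing the loop. |
43-
| [On-Demand Skill Loading](guide/skill-loading.md) | Load tools on demand instead of stuffing them all in at once. Token cost comparison. |
44-
| [Thin Harness Architecture](guide/thin-harness.md) | Why the harness should be as thin as possible. Thin harness + thick skills. |
45-
| [Context Window Management](guide/context-window.md) | Priority systems, token budgets, sliding-window implementation. |
40+
| [Agentic Loop](guide/agentic-loop.md) | The think → act → observe loop. Turn budgets, parallel tool calls, loop detection, streaming. |
41+
| [Tool System](guide/tool-system.md) | Tool registration, static vs. dynamic loading, the MCP protocol, description quality patterns. |
42+
| [Memory & Context](guide/memory-and-context.md) | Context assembly, session management, two-tier memory (logs + long-term memory). AGENTS.md and MEMORY.md patterns. |
43+
| [Guardrails](guide/guardrails.md) | Permission models, trust boundaries, sandboxing, prompt injection defense. |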
4644

47-
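## Practical Techniques
45+
## Practice
4846

4947
| Topic | Description |
5048
|------|------|
51-
| [Context Compression](guide/context-compression.md) | Three lines of defense: automatic decay, threshold compression, active compression. |
52-
| [Multi-Agent Collaboration](guide/multi-agent.md) | Leader-Worker, folder inbox, handshake protocol, auto-claim, Git worktree isolation. |
53-
| [Git Worktree Isolation](guide/git-worktree-isolation.md) | Parallel agent tasks without conflicts. Step-by-step commands. |
54-
| [Sandbox & Security](guide/sandbox-security.md) | Docker, Firecracker, WASM. Permission models and trust boundaries. |
55-
| [Structured Output](guide/structured-output.md) | Get agents to return parseable data. JSON mode, schema validation. |
56-
| [Error Recovery](guide/error-recovery.md) | Retry strategies, graceful degradation, escalation to human intervention. |
57-
| [Evaluation & Testing](guide/eval-and-testing.md) | Behavioral testing, trace replay, minimal evaluation framework. |
58-
59-
## Advanced
60-
61-
| Topic | Description |
62-
|------|------|
63-
| [Harness as a Service](guide/harness-as-a-service.md) | Run harnesses in the cloud. Multi-tenant architecture. |
64-
| [Meta-Harness](guide/meta-harness.md) | Agents that optimize their own harness. The AutoAgent pattern. |
65-
| [Memory Portability](guide/memory-portability.md) | Migrate memory between harness implementations. Migration scripts. |
66-
| [Scaling in Three Dimensions](guide/scaling-dimensions.md) | Time × Space × Interaction framework for analyzing any harness. |
49+
| [Context Engineering](guide/context-engineering.md) | Priority-based assembly, three lines of defense for compression, token budgets. |
50+
| [Sandbox](guide/sandbox.md) | Docker and Firecracker configuration, network isolation, filesystem restrictions. |
51+
| [Skill System](guide/skill-system.md) | Skill packaging, on-demand loading, SKILL.md format, thin harness + thick skills. |
52+
| [Sub-Agent](guide/sub-agent.md) | Leader-Worker pattern, file-based communication, session isolation, parallel execution. |
53+
| [Error Handling](guide/error-handling.md) | Error classification, retry strategies, graceful degradation, checkpoint/resume. |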
6754

6855
## Reference
6956

7057
| Topic | Description |
7158
|------|------|
72-
| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor, Nexu. |
73-
| [Glossary](guide/glossary.md) | 23 key terms defined. |
59+
| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor. |
60+
| [Glossary](guide/glossary.md) | Key terms defined. |
7461

7562
---
7663

guide/agentic-loop.md

Lines changed: 149 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,149 @@
1+
---
2+
author: Nexu
3+
---
4+
5+
# Agentic Loop
6+
7+
> **Core Insight:** Every agent is a loop — think, act, observe, repeat. The loop itself is trivial. What makes it production-grade is how you handle the edges: when to stop, what to do when tools fail, and how to prevent infinite cycles.
8+
9+
## The Pattern
10+
11+
The Agentic Loop (also called the ReAct pattern — Reason + Act) is the fundamental execution cycle of any AI agent. The model generates a response, optionally invokes one or more tools, observes the results, and loops until the task is done.
12+
13+
```
┌─────────────┐
│   Reason    │◄──────────────────┐
│ (LLM call)  │                   │
└──────┬──────┘                   │
       │                          │
       ▼                          │
  ┌─────────┐    No tools    ┌────┴─────┐
  │ Tools?  ├───────────────►│  Output  │
  └────┬────┘                └──────────┘
       │ Yes
       ▼
  ┌─────────┐
  │ Execute │
  │  tools  │
  └────┬────┘
       │
       ▼
  ┌─────────┐
  │ Observe │
  │ results ├─────────────────────┘
  └─────────┘
```
36+
37+
This is distinct from a simple tool-calling API. A single tool call is a one-shot: the model says "call this function," you return the result. An **agentic loop** runs that process repeatedly — the model sees the result, decides it needs more information, calls another tool, sees *that* result, and continues until it has enough context to produce a final answer.
38+
39+
## Implementation
40+
41+
A minimal agentic loop in Python:
42+
43+
```python
def agentic_loop(messages: list, tools: list, max_turns: int = 25) -> str:
    """Run the agentic loop until the model produces a final text response."""
    for turn in range(max_turns):
        response = llm.chat(messages=messages, tools=tools)
        assistant_msg = response.choices[0].message
        messages.append(assistant_msg)

        # Exit condition: no tool calls means the model is done
        if not assistant_msg.tool_calls:
            return assistant_msg.content

        # Execute each tool call and append results
        for tool_call in assistant_msg.tool_calls:
            result = dispatch_tool(tool_call)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })

    raise AgentLoopError(f"Agent did not complete within {max_turns} turns")
```
66+
67+
The `max_turns` parameter is critical. Without it, a confused model can loop indefinitely: calling the same tool repeatedly, getting the same error, and burning tokens. It is the simplest guardrail and should always be present.
68+
69+
## Parallel Tool Calls
70+
71+
Many current model APIs support **parallel tool calls**: the model can request several tools in a single response. This is more than an optimization; it changes agent behavior. A model that needs to read three files can request all three in one turn instead of spending three turns:
72+
73+
```python
# A single assistant message might contain:
# tool_calls = [read_file("a.py"), read_file("b.py"), read_file("c.py")]

for tool_call in assistant_msg.tool_calls:
    result = dispatch_tool(tool_call)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(result)
    })
# All three results are appended, then the model sees them all at once
```
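
The calls in one assistant message are independent of each other, so a harness may also run them concurrently rather than one after another. The following is a minimal sketch of that idea, not the guide's implementation: `run_tool`, `execute_parallel`, the `delay` argument, and the dict shape of each call are hypothetical stand-ins chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_tool(name: str, args: dict) -> str:
    """Hypothetical stand-in for the harness's real tool dispatcher."""
    time.sleep(args.get("delay", 0))  # simulate tools that take different amounts of time
    if name == "read_file":
        return f"<contents of {args['path']}>"
    raise ValueError(f"unknown tool: {name}")


def execute_parallel(tool_calls: list[dict], max_workers: int = 4) -> list[dict]:
    """Run independent tool calls concurrently; return tool messages in request order."""

    def run_one(call: dict) -> dict:
        content = run_tool(call["name"], call["args"])
        return {"role": "tool", "tool_call_id": call["id"], "content": content}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order,
        # so each result lines up with the call that requested it.
        return list(pool.map(run_one, tool_calls))


calls = [
    {"id": "c1", "name": "read_file", "args": {"path": "a.py", "delay": 0.05}},
    {"id": "c2", "name": "read_file", "args": {"path": "b.py"}},
    {"id": "c3", "name": "read_file", "args": {"path": "c.py"}},
]
tool_messages = execute_parallel(calls)
```

Threads suit I/O-bound tools such as file reads or HTTP requests. Tools with side effects on shared state may still need to run sequentially.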
86+
87+
## Turn Budget and Exit Conditions
88+
89+
The loop needs clear exit conditions beyond `max_turns`:
90+
91+
| Condition | Action |
92+
|-----------|--------|
93+
| No tool calls in response | Return the text — agent is done |
94+
| Max turns reached | Raise error or force summarization |
95+
| Token budget exceeded | Trigger context compression, then continue |
96+
| Consecutive identical tool calls | Likely stuck — escalate or abort |
97+
| Human interrupt signal | Pause loop, surface current state |
98+
99+
```python
def detect_loop(messages: list, window: int = 3) -> bool:
    """Detect if the agent is stuck calling the same tool repeatedly."""
    recent_calls = []
    for msg in messages[-window * 2:]:
        if hasattr(msg, 'tool_calls') and msg.tool_calls:
            recent_calls.extend(
                (tc.function.name, tc.function.arguments) for tc in msg.tool_calls
            )
    if len(recent_calls) >= window:
        return len(set(recent_calls[-window:])) == 1
    return False
```
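
One way to keep the exit conditions in the table above from scattering through the loop body is to funnel them into a single decision function. The sketch below is illustrative only: the `Action` enum, the `next_action` name, its parameters, and the precedence among conditions are all assumptions, not something the guide prescribes.

```python
from enum import Enum


class Action(Enum):
    RETURN_TEXT = "return_text"  # no tool calls: the agent is done
    PAUSE = "pause"              # human interrupt: surface current state
    ABORT = "abort"              # stuck or out of turns: escalate or raise
    COMPRESS = "compress"        # over token budget: compress, then continue
    CONTINUE = "continue"        # nothing special: run the tools and loop


def next_action(
    *,
    has_tool_calls: bool,
    turn: int,
    max_turns: int,
    tokens_used: int,
    token_budget: int,
    stuck: bool,
    interrupted: bool,
) -> Action:
    """Map the current loop state to one action (assumed precedence, top to bottom)."""
    if interrupted:
        return Action.PAUSE
    if not has_tool_calls:
        return Action.RETURN_TEXT
    if stuck or turn >= max_turns:
        return Action.ABORT
    if tokens_used > token_budget:
        return Action.COMPRESS
    return Action.CONTINUE


# A healthy mid-run state: tools requested, within every budget.
state = dict(has_tool_calls=True, turn=3, max_turns=25,
             tokens_used=10_000, token_budget=100_000,
             stuck=False, interrupted=False)
decision = next_action(**state)
```

The `stuck` flag here would come from a check such as `detect_loop`. Keeping the decision in one pure function makes each exit path easy to unit-test.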
112+
113+
## Streaming in the Loop
114+
115+
Production harnesses stream the model's output token by token while the loop runs. This is important for user experience — the human sees the agent "thinking" in real time, not staring at a blank screen:
116+
117+
```python
for turn in range(max_turns):
    stream = llm.chat(messages=messages, tools=tools, stream=True)

    tool_calls = []
    text_chunks = []

    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            text_chunks.append(delta.content)
            emit_to_user(delta.content)  # Real-time streaming
        if delta.tool_calls:
            accumulate_tool_calls(tool_calls, delta.tool_calls)

    if not tool_calls:
        return "".join(text_chunks)

    # Execute tools and continue loop
    ...
```
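
The snippet above calls `accumulate_tool_calls` without defining it. When streaming, a tool call typically arrives as fragments, with the argument JSON split across chunks, so the helper has to stitch the pieces back together. A rough sketch follows, assuming each fragment is a plain dict with an `index` plus optional `id`, `name`, and `arguments` keys. That shape is an assumption for illustration; real SDK delta objects are structured differently and should be adapted accordingly.

```python
def accumulate_tool_calls(tool_calls: list[dict], deltas: list[dict]) -> None:
    """Merge streamed tool-call fragments into `tool_calls` in place.

    Assumed fragment shape (hypothetical):
    {"index": int, "id": str | None, "name": str | None, "arguments": str | None}
    """
    for delta in deltas:
        index = delta["index"]
        # Grow the list until a slot exists for this call.
        while len(tool_calls) <= index:
            tool_calls.append({"id": None, "name": "", "arguments": ""})
        slot = tool_calls[index]
        if delta.get("id"):
            slot["id"] = delta["id"]
        # Name and argument text are concatenated in arrival order.
        slot["name"] += delta.get("name") or ""
        slot["arguments"] += delta.get("arguments") or ""


# Two chunks: the first opens call 0, the second finishes it and adds call 1.
collected: list[dict] = []
accumulate_tool_calls(collected, [
    {"index": 0, "id": "c1", "name": "read_file", "arguments": '{"pa'},
])
accumulate_tool_calls(collected, [
    {"index": 0, "arguments": 'th": "a.py"}'},
    {"index": 1, "id": "c2", "name": "list_dir", "arguments": "{}"},
])
```

The argument string should only be parsed as JSON once the stream has ended, since any intermediate state may be an incomplete document.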
138+
139+
## Common Pitfalls
140+
141+
- **No turn limit** — The single most common harness bug. Always set a maximum.
142+
- **Swallowing tool errors** — If a tool fails silently, the model will retry or hallucinate success. Always return error messages as tool results so the model can adapt.
143+
- **Appending raw results** — Large tool outputs (entire files, API responses) bloat the context window. Truncate or summarize before appending.
144+
- **Ignoring parallel calls** — If your loop processes tool calls sequentially but the model issued them in parallel, you may create ordering dependencies that don't exist.
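
Two of the pitfalls above, swallowed errors and oversized results, can be addressed at the point where a tool's output is turned into a message. The sketch below shows one possible shape. The names `safe_tool_result` and `truncate` and the 4,000-character cap are hypothetical choices for illustration, and keeping only the head of the output is a deliberately crude policy.

```python
MAX_RESULT_CHARS = 4_000  # assumed cap; tune to your context budget


def truncate(text: str, limit: int = MAX_RESULT_CHARS) -> str:
    """Keep the head of an oversized result and say how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return f"{text[:limit]}\n[truncated {dropped} characters]"


def safe_tool_result(tool_fn, *args, limit: int = MAX_RESULT_CHARS, **kwargs) -> str:
    """Call a tool; return its (possibly truncated) output or a readable error."""
    try:
        return truncate(str(tool_fn(*args, **kwargs)), limit)
    except Exception as exc:
        # The error text goes back to the model as the tool result,
        # so it can change approach instead of assuming success.
        return f"ERROR ({type(exc).__name__}): {exc}"


ok = safe_tool_result(lambda: "done")
failed = safe_tool_result(lambda: 1 / 0)
clipped = safe_tool_result(lambda: "x" * 50, limit=10)
```

The truncation marker tells the model that content is missing, which gives it a reason to request a narrower slice rather than reason over a silently clipped result.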
145+
146+
## Further Reading
147+
148+
- [Yao et al., "ReAct: Synergizing Reasoning and Acting"](https://arxiv.org/abs/2210.03629) — The original paper formalizing the Reason + Act pattern
149+
- [Anthropic: Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — Practical patterns for production loops
