nexu-io
diff --git a/README.md b/README.md
Lines changed: 18 additions & 31 deletions
diff --git a/README.zh-CN.md b/README.zh-CN.md
Lines changed: 18 additions & 31 deletions
diff --git a/guide/agentic-loop.md b/guide/agentic-loop.md
Lines changed: 149 additions & 0 deletions
@@ -19,58 +19,45 @@
 
 ---
 
-A **harness** is the runtime layer that wraps an AI model and turns it into a useful agent. It handles everything the model can't do on its own: reading files, calling tools, remembering context, and deciding when to stop. As AI agents move from demos to production, the harness — not the model — is becoming the differentiator.
+A **harness** is the runtime wrapper that turns a bare language model into an **agent** — an autonomous system that can perceive its environment, make decisions, and take actions over multiple steps. The harness handles everything the model can't do on its own: executing tools, managing memory, assembling context, and enforcing safety boundaries.
 
-This guide covers every aspect of harness engineering, from writing your first tool loop to scaling multi-agent systems in production.
+This guide covers harness engineering from first principles to production patterns, with real code in every article.
 
 ---
 
 ## Getting Started
 
 | Topic | Description |
 |-------|-------------|
-| [What is a Harness?](guide/what-is-harness.md) | The concept in 3 minutes. Minimal code example. Harness vs. framework vs. runtime. |
-| [Your First Harness](guide/your-first-harness.md) | Build a working harness in 15 minutes. Complete Python code you can copy and run. |
-| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a raw harness vs. LangChain/CrewAI. Decision tree + code comparison. |
+| [What is a Harness?](guide/what-is-harness.md) | The concept in 3 minutes. How it turns a model into an agent. Harness vs. framework vs. runtime. |
+| [Your First Harness](guide/your-first-harness.md) | Build a working harness in 50 lines of Python. Complete code you can copy and run. |
+| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a raw harness vs. LangChain/CrewAI. Decision tree + side-by-side code comparison. |
 
-## Core Patterns
+## Core Concepts
 
 | Topic | Description |
 |-------|-------------|
-| [The AGENTS.md Pattern](guide/agents-md.md) | Define agent behavior in a plain-text file. Version-controlled, portable, transparent. |
-| [The MEMORY.md Pattern](guide/memory-md.md) | Persistent memory with daily logs + curated long-term memory. |
-| [The Tool Loop](guide/tool-loop.md) | The ReAct loop in engineering terms. Adding tools without changing the loop. |
-| [Skill Loading](guide/skill-loading.md) | Loading tools on demand instead of all at once. Token cost comparison. |
-| [Thin Harness Architecture](guide/thin-harness.md) | Why the harness should be minimal. Thin harness + thick skills. |
-| [Context Window Management](guide/context-window.md) | Priority systems, token budgets, sliding window implementation. |
+| [Agentic Loop](guide/agentic-loop.md) | The think → act → observe cycle. Turn budgets, parallel tool calls, loop detection, streaming. |
+| [Tool System](guide/tool-system.md) | Tool registry, static vs. dynamic loading, MCP protocol, description quality patterns. |
+| [Memory & Context](guide/memory-and-context.md) | Context assembly, session management, two-tier memory (daily logs + long-term). AGENTS.md and MEMORY.md patterns. |
+| [Guardrails](guide/guardrails.md) | Permission models, trust boundaries, sandboxing, prompt injection defense. |
 
-## Techniques
+## Practice
 
 | Topic | Description |
 |-------|-------------|
-| [Context Compression](guide/context-compression.md) | Three lines of defense: auto-decay, threshold, active compression. |
-| [Multi-Agent Patterns](guide/multi-agent.md) | Leader-Worker, file-based inbox, handshake, auto-claim, git worktree isolation. |
-| [Git Worktree Isolation](guide/git-worktree-isolation.md) | Parallel agent tasks without conflicts. Step-by-step commands. |
-| [Sandbox & Security](guide/sandbox-security.md) | Docker, Firecracker, WASM. Permission models and trust boundaries. |
-| [Structured Output](guide/structured-output.md) | Getting agents to return parseable data. JSON mode, schema validation. |
-| [Error Recovery](guide/error-recovery.md) | Retry strategies, graceful degradation, human-in-the-loop escalation. |
-| [Evaluation & Testing](guide/eval-and-testing.md) | Behavioral testing, trace replay, minimal eval framework. |
-
-## Advanced
-
-| Topic | Description |
-|-------|-------------|
-| [Harness as a Service](guide/harness-as-a-service.md) | Running harnesses in the cloud. Multi-tenant architecture. |
-| [Meta-Harness](guide/meta-harness.md) | Agents that optimize their own harness. The AutoAgent pattern. |
-| [Memory Portability](guide/memory-portability.md) | Moving memory between harness implementations. Migration scripts. |
-| [Scaling Dimensions](guide/scaling-dimensions.md) | Time × Space × Interaction framework for analyzing any harness. |
+| [Context Engineering](guide/context-engineering.md) | Priority-based assembly, three lines of defense for compression, token budgeting. |
+| [Sandbox](guide/sandbox.md) | Docker and Firecracker setups, network isolation, filesystem restrictions. |
+| [Skill System](guide/skill-system.md) | Skill packaging, on-demand loading, SKILL.md format, thin harness + thick skills. |
+| [Sub-Agent](guide/sub-agent.md) | Leader-Worker pattern, file-based communication, session isolation, parallel execution. |
+| [Error Handling](guide/error-handling.md) | Error classification, retry strategies, graceful degradation, checkpoint/resume. |
 
 ## Reference
 
 | Topic | Description |
 |-------|-------------|
-| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor, Nexu. |
-| [Glossary](guide/glossary.md) | 23 key terms defined. |
+| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor. |
+| [Glossary](guide/glossary.md) | Key terms defined. |
 
 ---
 
 
@@ -19,58 +19,45 @@
 
 ---
 
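-A **Harness** (驾驭层, literally "steering layer") is the runtime layer that wraps an AI model and turns a bare model into a genuinely useful Agent. It handles what the model cannot do on its own: reading and writing files, calling tools, remembering across sessions, and deciding when to stop. As AI Agents move from demos to production, the Harness layer, not the model itself, is becoming the core product differentiator.
+A **Harness** is the runtime layer that wraps a language model and turns the bare model into an **Agent**: an autonomous system that perceives its environment, makes decisions, and takes actions over multiple steps. The Harness handles everything the model cannot do itself: executing tools, managing memory, assembling context, and enforcing safety boundaries.
 
-This guide covers every aspect of Harness Engineering, from writing your first tool-calling loop to running multi-agent systems in production.
+This guide goes from first principles to production patterns, and every article comes with runnable code.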
 
 ---
 
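 ## Getting Started
 
 | Topic | Description |
 |------|------|
-| [What is a Harness?](guide/what-is-harness.md) | Understand the core concept in 3 minutes. Minimal code example. Harness vs. framework vs. runtime. |
-| [Build Your First Harness](guide/your-first-harness.md) | Build a working Harness in 15 minutes. Complete Python code. |
-| [The Difference Between a Harness and a Framework](guide/harness-vs-framework.md) | When to use a Harness and when to use LangChain/CrewAI. Decision tree + code comparison. |
+| [What is a Harness?](guide/what-is-harness.md) | Understand the core concept in 3 minutes. How a model becomes an Agent. Harness vs. Framework vs. Runtime. |
+| [Build Your First Harness](guide/your-first-harness.md) | Build a working Harness in 50 lines of Python. Complete code you can copy directly. |
+| [Harness vs. Framework](guide/harness-vs-framework.md) | When to use a Harness and when to use LangChain/CrewAI. Decision tree + code comparison. |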
 
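-## Core Patterns
+## Core Concepts
 
 | Topic | Description |
 |------|------|
-| [The AGENTS.md Pattern](guide/agents-md.md) | Define Agent behavior in a plain-text file. Version-controllable, portable, fully transparent. |
-| [The MEMORY.md Pattern](guide/memory-md.md) | Persistent memory: daily logs + curated long-term memory. |
-| [The Tool-Calling Loop](guide/tool-loop.md) | An engineering implementation of the ReAct pattern. Add tools without changing the loop. |
-| [On-Demand Skill Loading](guide/skill-loading.md) | Load tools on demand instead of stuffing them all in at once. Token consumption comparison. |
-| [Thin Harness Architecture](guide/thin-harness.md) | Why the Harness should be as thin as possible. Thin Harness + thick Skills. |
-| [Context Window Management](guide/context-window.md) | Priority systems, token budgets, sliding-window implementation. |
+| [Agentic Loop](guide/agentic-loop.md) | The think → act → observe loop. Turn budgets, parallel Tool calls, loop detection, streaming. |
+| [Tool System](guide/tool-system.md) | Tool registration, static vs. dynamic loading, the MCP protocol, description quality patterns. |
+| [Memory & Context](guide/memory-and-context.md) | Context assembly, session management, two-tier Memory (logs + long-term memory). The AGENTS.md and MEMORY.md patterns. |
+| [Guardrails](guide/guardrails.md) | Permission models, trust boundaries, Sandbox, Prompt Injection defense. |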
 
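-## Techniques in Practice
+## Practice
 
 | Topic | Description |
 |------|------|
-| [Context Compression](guide/context-compression.md) | Three lines of defense: automatic decay, threshold compression, active compression. |
-| [Multi-Agent Collaboration](guide/multi-agent.md) | Leader-Worker, folder inbox, handshake protocol, auto-claim, Git Worktree isolation. |
-| [Git Worktree Isolation](guide/git-worktree-isolation.md) | Parallel Agent tasks that do not conflict. Step-by-step commands. |
-| [Sandbox & Security](guide/sandbox-security.md) | Docker, Firecracker, WASM. Permission models and trust boundaries. |
-| [Structured Output](guide/structured-output.md) | Getting Agents to return parseable data. JSON mode, schema validation. |
-| [Error Recovery](guide/error-recovery.md) | Retry strategies, graceful degradation, escalation to human intervention. |
-| [Evaluation & Testing](guide/eval-and-testing.md) | Behavioral testing, trace replay, a minimal evaluation framework. |
-
-## Advanced
-
-| Topic | Description |
-|------|------|
-| [Harness as a Service](guide/harness-as-a-service.md) | Running Harnesses in the cloud. Multi-tenant architecture. |
-| [Meta-Harness](guide/meta-harness.md) | Agents that optimize their own Harness. The AutoAgent pattern. |
-| [Memory Portability](guide/memory-portability.md) | Migrating memory between Harness implementations. Migration scripts. |
-| [Three-Dimensional Scaling](guide/scaling-dimensions.md) | A Time × Space × Interaction framework for analyzing any Harness. |
+| [Context Engineering](guide/context-engineering.md) | Priority-based assembly, three lines of defense for compression, token budgets. |
+| [Sandbox](guide/sandbox.md) | Docker and Firecracker configuration, network isolation, filesystem restrictions. |
+| [Skill System](guide/skill-system.md) | Skill packaging, on-demand loading, the SKILL.md format, thin Harness + thick Skills. |
+| [Sub-Agent](guide/sub-agent.md) | Leader-Worker pattern, file-based communication, session isolation, parallel execution. |
+| [Error Handling](guide/error-handling.md) | Error classification, retry strategies, graceful degradation, checkpoint/resume. |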
 
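 ## Reference
 
 | Topic | Description |
 |------|------|
-| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, Cursor, and Nexu. |
-| [Glossary](guide/glossary.md) | Definitions of 23 key terms. |
+| [Implementation Comparison](guide/comparison.md) | Side-by-side comparison of OpenClaw, Claude Code, Codex, Cline, Aider, and Cursor. |
+| [Glossary](guide/glossary.md) | Definitions of key terms. |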
 
 ---
 
 
@@ -0,0 +1,149 @@
+---
+author: Nexu
+---
+
+# Agentic Loop
+
+> **Core Insight:** Every agent is a loop — think, act, observe, repeat. The loop itself is trivial. What makes it production-grade is how you handle the edges: when to stop, what to do when tools fail, and how to prevent infinite cycles.
+
+## The Pattern
+
+The Agentic Loop (also called the ReAct pattern — Reason + Act) is the fundamental execution cycle of any AI agent. The model generates a response, optionally invokes one or more tools, observes the results, and loops until the task is done.
+
+```
+┌─────────────┐
+│   Reason    │◄────────────────────────────┐
+│  (LLM call) │                             │
+└──────┬──────┘                             │
+       │                                    │
+       ▼                                    │
+  ┌─────────┐   No tools   ┌──────────┐     │
+  │  Tools? ├─────────────►│  Output  │     │
+  └────┬────┘              └──────────┘     │
+       │ Yes                                │
+       ▼                                    │
+  ┌─────────┐                               │
+  │ Execute │                               │
+  │  tools  │                               │
+  └────┬────┘                               │
+       │                                    │
+       ▼                                    │
+  ┌─────────┐                               │
+  │ Observe │                               │
+  │ results ├───────────────────────────────┘
+  └─────────┘
+```
+
+This is distinct from a simple tool-calling API. A single tool call is a one-shot: the model says "call this function," you return the result. An **agentic loop** runs that process repeatedly — the model sees the result, decides it needs more information, calls another tool, sees *that* result, and continues until it has enough context to produce a final answer.
+
+## Implementation
+
+A minimal agentic loop in Python:
+
+```python
+class AgentLoopError(RuntimeError):
+    """Raised when the agent exhausts its turn budget without finishing."""
+
+
+# `llm` (a chat client returning OpenAI-style responses) and `dispatch_tool`
+# are assumed to be provided by the surrounding harness.
+def agentic_loop(messages: list, tools: list, max_turns: int = 25) -> str:
+    """Run the agentic loop until the model produces a final text response."""
+    for turn in range(max_turns):
+        response = llm.chat(messages=messages, tools=tools)
+        assistant_msg = response.choices[0].message
+        messages.append(assistant_msg)
+
+        # Exit condition: no tool calls means the model is done
+        if not assistant_msg.tool_calls:
+            return assistant_msg.content or ""
+
+        # Execute each tool call and append results
+        for tool_call in assistant_msg.tool_calls:
+            result = dispatch_tool(tool_call)
+            messages.append({
+                "role": "tool",
+                "tool_call_id": tool_call.id,
+                "content": str(result)
+            })
+
+    raise AgentLoopError(f"Agent did not complete within {max_turns} turns")
+```
+
+The `max_turns` parameter is critical. Without it, a confused model can loop indefinitely: calling the same tool repeatedly, getting the same error, and burning tokens. It is the simplest guardrail and should always be present.
+
+## Parallel Tool Calls
+
+Modern APIs support **parallel tool calls**: the model can request multiple tools in a single response. This is not just an optimization; it changes agent behavior. A model that needs to read three files can request all three in one response instead of spending three turns:
+
+```python
+# A single assistant message might contain:
+# tool_calls = [read_file("a.py"), read_file("b.py"), read_file("c.py")]
+
+for tool_call in assistant_msg.tool_calls:
+    result = dispatch_tool(tool_call)
+    messages.append({
+        "role": "tool",
+        "tool_call_id": tool_call.id,
+        "content": str(result)
+    })
+# All three results are appended, then the model sees them all at once
+```
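Review note: the loop above still executes the requested calls one at a time. A possible way to run them concurrently is sketched below, assuming the tools are independent and I/O-bound. `ToolCall`, the stub `dispatch_tool`, and `run_tool_calls` are hypothetical names for illustration, not part of the guide or of any SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical stand-in for the SDK's tool-call object."""
    id: str
    name: str
    arguments: dict


def dispatch_tool(tool_call: ToolCall) -> str:
    """Placeholder dispatcher; a real harness would consult its tool registry."""
    if tool_call.name == "read_file":
        return f"<contents of {tool_call.arguments['path']}>"
    raise ValueError(f"unknown tool: {tool_call.name}")


def run_tool_calls(tool_calls: list[ToolCall], max_workers: int = 8) -> list[dict]:
    """Run independent tool calls concurrently; return tool messages in request order."""

    def run_one(tool_call: ToolCall) -> dict:
        return {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": str(dispatch_tool(tool_call)),
        }

    if not tool_calls:
        return []
    workers = min(max_workers, len(tool_calls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, not completion order
        return list(pool.map(run_one, tool_calls))
```

Because `Executor.map` preserves input order, the results can be appended with `messages.extend(run_tool_calls(...))` and each one still lines up with its `tool_call_id`.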
+
+## Turn Budget and Exit Conditions
+
+The loop needs clear exit conditions beyond `max_turns`:
+
+| Condition | Action |
+|-----------|--------|
+| No tool calls in response | Return the text — agent is done |
+| Max turns reached | Raise error or force summarization |
+| Token budget exceeded | Trigger context compression, then continue |
+| Consecutive identical tool calls | Likely stuck — escalate or abort |
+| Human interrupt signal | Pause loop, surface current state |
+
+```python
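+def detect_loop(messages: list, window: int = 3) -> bool:
+    """Detect if the agent is stuck calling the same tool repeatedly."""
+    recent_calls = []  # most recent call first
+    # Walk backwards over the history. Tool results are plain dicts, and a turn
+    # with parallel calls adds several of them, so a fixed-size slice of
+    # `messages` can miss earlier assistant turns.
+    for msg in reversed(messages):
+        calls = getattr(msg, "tool_calls", None)
+        if calls:
+            recent_calls.extend(
+                (tc.function.name, tc.function.arguments) for tc in reversed(calls)
+            )
+        if len(recent_calls) >= window:
+            break
+    if len(recent_calls) >= window:
+        return len(set(recent_calls[:window])) == 1
+    return False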
+```
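Review note: the remaining rows of the exit-condition table could be folded into one per-turn check along these lines. This is only a sketch under stated assumptions: `ExitReason`, `estimate_tokens`, and `check_exit` are hypothetical names, messages are assumed to be plain dicts, the token estimate is a crude 4-characters-per-token heuristic rather than a real tokenizer, and the human interrupt is modeled as a `threading.Event`.

```python
import threading
from enum import Enum


class ExitReason(Enum):
    CONTINUE = "continue"        # keep looping
    MAX_TURNS = "max_turns"      # raise or force a summary
    COMPRESS = "compress"        # compress context, then continue
    INTERRUPTED = "interrupted"  # pause and surface current state


def estimate_tokens(messages: list[dict]) -> int:
    """Rough heuristic (about 4 characters per token); use a real tokenizer in practice."""
    return sum(len(str(m.get("content") or "")) for m in messages) // 4


def check_exit(
    turn: int,
    max_turns: int,
    messages: list[dict],
    token_budget: int,
    interrupt: threading.Event,
) -> ExitReason:
    """Decide what the loop should do before starting the next turn."""
    if interrupt.is_set():
        return ExitReason.INTERRUPTED
    if turn >= max_turns:
        return ExitReason.MAX_TURNS
    if estimate_tokens(messages) > token_budget:
        return ExitReason.COMPRESS
    return ExitReason.CONTINUE
```

The interrupt is checked first so that a human request to pause wins over every automatic condition.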
+
+## Streaming in the Loop
+
+Production harnesses stream the model's output token by token while the loop runs. This matters for user experience: the user watches the agent work in real time instead of staring at a blank screen:
+
+```python
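+# `llm`, `emit_to_user`, and `accumulate_tool_calls` are assumed harness helpers;
+# the function name `streaming_loop` is illustrative.
+def streaming_loop(messages: list, tools: list, max_turns: int = 25) -> str:
+    for turn in range(max_turns):
+        stream = llm.chat(messages=messages, tools=tools, stream=True)
+
+        tool_calls = []
+        text_chunks = []
+
+        for chunk in stream:
+            delta = chunk.choices[0].delta
+            if delta.content:
+                text_chunks.append(delta.content)
+                emit_to_user(delta.content)  # Real-time streaming
+            if delta.tool_calls:
+                # Tool calls arrive as fragments; merge them as they stream in
+                accumulate_tool_calls(tool_calls, delta.tool_calls)
+
+        if not tool_calls:
+            return "".join(text_chunks)
+
+        # Execute tools and continue loop
+        ...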
+```
+
+## Common Pitfalls
+
+- **No turn limit** — The single most common harness bug. Always set a maximum.
+- **Swallowing tool errors** — If a tool fails silently, the model will retry or hallucinate success. Always return error messages as tool results so the model can adapt.
+- **Appending raw results** — Large tool outputs (entire files, API responses) bloat the context window. Truncate or summarize before appending.
+- **Ignoring parallel calls** — Tool calls issued in the same response are independent of one another. If your loop runs them one at a time and lets an earlier result or failure affect the later calls, you create ordering dependencies that don't exist.
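Review note: the "Swallowing tool errors" and "Appending raw results" pitfalls could both be handled at the point where a tool result becomes a message. The wrapper below is a sketch, not the guide's implementation: `to_tool_message` and the `max_chars` default are hypothetical, and plain character truncation stands in for whatever summarization a real harness would use.

```python
from typing import Callable


def to_tool_message(tool_call_id: str, run: Callable[[], object], max_chars: int = 4000) -> dict:
    """Run a tool callable and package its outcome as a tool-result message."""
    try:
        content = str(run())
    except Exception as exc:
        # Report the failure as the tool result so the model can adapt
        content = f"ERROR: {type(exc).__name__}: {exc}"
    if len(content) > max_chars:
        omitted = len(content) - max_chars
        # Keep the head and say how much was dropped, so the model knows it is partial
        content = content[:max_chars] + f"\n[truncated {omitted} characters]"
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}
```

In the loop this would be called as `to_tool_message(tool_call.id, lambda: dispatch_tool(tool_call))`, so a failing tool yields an `ERROR: ...` message instead of crashing the loop or disappearing silently.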
+
+## Further Reading
+
+- [Yao et al., "ReAct: Synergizing Reasoning and Acting"](https://arxiv.org/abs/2210.03629) — The original paper formalizing the Reason + Act pattern
+- [Anthropic: Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) — Practical patterns for production loops