:::

</Accordion>

## Performance comparison

Understanding how different patterns affect performance helps you optimize for latency and cost. We'll compare the patterns across three scenarios, measuring two numbers (one way to measure them yourself is sketched after this list):
- **Model calls**: Number of LLM invocations (each time the model is called to generate a response or tool call)
- **Tokens processed**: Total context window usage across all calls
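
You can reproduce these measurements for your own agents. A minimal sketch, assuming a LangChain JS setup; the handler below and the agent it is attached to are illustrative, and the token-usage field varies by provider:

```typescript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

// Counts chat-model calls and, where the provider reports it, total tokens.
class UsageCounter extends BaseCallbackHandler {
  name = "usage_counter";
  modelCalls = 0;
  totalTokens = 0;

  handleChatModelStart() {
    this.modelCalls += 1; // one model invocation started
  }

  handleLLMEnd(output: any) {
    // OpenAI-style models report usage under llmOutput.tokenUsage; adjust as needed.
    this.totalTokens += output?.llmOutput?.tokenUsage?.totalTokens ?? 0;
  }
}

// Hypothetical usage: pass the handler in the invocation config.
// const counter = new UsageCounter();
// await agent.invoke({ messages: [...] }, { callbacks: [counter] });
// console.log(counter.modelCalls, counter.totalTokens);
```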

### Scenario 1: Single task

> **User:** "Buy milk"

Each pattern has access to a specialized milk agent (or, in the Skills pattern, a milk skill) that can call a `buy_milk` tool.
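
For reference, the `buy_milk` tool can be as small as the sketch below; the order logic is a placeholder, and `tool` plus `zod` are assumed to be available:

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Placeholder implementation: a real tool would call a grocery API here.
const buyMilk = tool(
  async () => "Order placed: 1 carton of milk.",
  {
    name: "buy_milk",
    description: "Order milk from the grocery service.",
    schema: z.object({}), // no arguments needed for this example
  }
);
```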

| Pattern | Model calls | Winner |
|---------|:-----------:|:------:|
| [**Subagents**](#subagents) | **4** | |
| [**Handoffs**](#handoffs) | **3** | ⭐ |
| [**Skills**](#skills) | **3** | ⭐ |
| [**Router**](#router) | **3** | ⭐ |

<Tabs>
<Tab title="Subagents">
**4 model calls:**
```mermaid
sequenceDiagram
participant User
participant Main Agent
participant Milk Subagent
participant buy_milk tool

User->>Main Agent: "Buy milk"
Note over Main Agent: Call 1
Main Agent->>Milk Subagent: milk_subagent()
Note over Milk Subagent: Call 2
Milk Subagent->>buy_milk tool: buy_milk()
buy_milk tool-->>Milk Subagent: Done
Note over Milk Subagent: Call 3
Milk Subagent-->>Main Agent: "Bought milk"
Note over Main Agent: Call 4
Main Agent-->>User: "I bought milk for you"
```
</Tab>

<Tab title="Handoffs">
**3 model calls:**
```mermaid
sequenceDiagram
participant User
participant Main Agent
participant Milk Agent
participant buy_milk tool

User->>Main Agent: "Buy milk"
Note over Main Agent: Call 1
Main Agent->>Milk Agent: transfer_to_milk_agent()
Note over Milk Agent: Call 2
Milk Agent->>buy_milk tool: buy_milk()
buy_milk tool-->>Milk Agent: Done
Note over Milk Agent: Call 3
Milk Agent-->>User: "I bought milk for you"
```
</Tab>

<Tab title="Skills">
**3 model calls:**
```mermaid
sequenceDiagram
participant User
participant Agent
participant load_skill tool
participant buy_milk tool

User->>Agent: "Buy milk"
Note over Agent: Call 1
Agent->>load_skill tool: load_skill("milk")
load_skill tool-->>Agent: Milk skill context
Note over Agent: Call 2
Agent->>buy_milk tool: buy_milk()
buy_milk tool-->>Agent: Done
Note over Agent: Call 3
Agent-->>User: "I bought milk for you"
```
</Tab>

<Tab title="Router">
**3 model calls:**
```mermaid
sequenceDiagram
participant User
participant Router LLM
participant Milk Agent
participant buy_milk tool

User->>Router LLM: "Buy milk"
Note over Router LLM: Call 1: Route to milk agent
Router LLM->>Milk Agent: Invoke with query
Note over Milk Agent: Call 2
Milk Agent->>buy_milk tool: buy_milk()
buy_milk tool-->>Milk Agent: Done
Note over Milk Agent: Call 3
Milk Agent-->>User: "I bought milk for you"
```
</Tab>
</Tabs>

**Key insight:** Handoffs, Skills, and Router are most efficient for single tasks (3 calls each). The Subagents pattern adds one extra call because results flow back through the main agent; that extra hop is the price of centralized control.
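
One way to wire the Subagents flow is to expose the milk subagent to the main agent as a tool. The sketch below assumes LangGraph's prebuilt `createReactAgent` and the `buyMilk` tool from the setup above (your wiring may differ); it makes the fourth call visible, since the subagent's answer comes back as a tool result and the main agent needs one more call to phrase the final response:

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

// Specialized subagent: only knows about buy_milk (handles calls 2 and 3).
const milkSubagent = createReactAgent({ llm: model, tools: [buyMilk] });

// Expose the subagent to the main agent as a tool.
const milkSubagentTool = tool(
  async ({ request }) => {
    const result = await milkSubagent.invoke({
      messages: [{ role: "user", content: request }],
    });
    // Only the subagent's final answer flows back to the main agent.
    return String(result.messages[result.messages.length - 1].content);
  },
  {
    name: "milk_subagent",
    description: "Delegate milk-related requests to the milk subagent.",
    schema: z.object({ request: z.string() }),
  }
);

// Main agent: call 1 decides to delegate, call 4 phrases the final answer.
const mainAgent = createReactAgent({ llm: model, tools: [milkSubagentTool] });
```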

### Scenario 2: Follow-up request

> **Turn 1:** "Buy milk"
> **Turn 2:** "Buy milk again"

The user makes a follow-up request in the same conversation.

| Pattern | Turn 2 calls | Total (both turns) | Winner |
|---------|:------------:|:------------------:|:------:|
| [**Subagents**](#subagents) | **4** | **8** | |
| [**Handoffs**](#handoffs) | **2** | **5** | ⭐ |
| [**Skills**](#skills) | **2** | **5** | ⭐ |
| [**Router**](#router) | **3** | **6** | |

<Accordion title="Why the difference?">

**Subagents (4 calls again → 8 total):**
- Subagents are **stateless by design**—each invocation follows the same flow
- The main agent maintains conversation context, but subagents start fresh each time
- This provides strong context isolation but repeats the full flow

**Handoffs (2 calls → 5 total):**
- The milk agent is **still active** from turn 1 (state persists)
- No handoff needed—agent directly calls `buy_milk` tool (call 1)
- Agent responds to user (call 2)
- **Saves 1 call by skipping the handoff**

**Skills (2 calls → 5 total):**
- The skill context is **already loaded** in conversation history
- No need to reload—agent directly calls `buy_milk` tool (call 1)
- Agent responds to user (call 2)
- **Saves 1 call by reusing loaded skill**

**Router (3 calls again → 6 total):**
- Routers are **stateless**—each request requires an LLM routing call
- Turn 2: Router LLM call (1) → Milk agent calls buy_milk (2) → Milk agent responds (3)
- Can be optimized by wrapping the router as a tool inside a stateful agent

</Accordion>

**Key insight:** The stateful patterns (Handoffs, Skills) need only 2 calls on the follow-up turn, versus 3 for Router and 4 for Subagents. Subagents keep a consistent cost per request: the stateless design provides strong context isolation, but at the cost of repeating the full set of model calls on every turn.
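
The savings depend on conversation state actually persisting between turns. A hedged sketch of one way to get that, assuming a LangGraph checkpointer (`MemorySaver`) and reusing `model` and `buyMilk` from the earlier sketches; invoking with the same `thread_id` means turn 2 resumes from turn 1's state instead of starting from scratch:

```typescript
import { MemorySaver } from "@langchain/langgraph";
import { createReactAgent } from "@langchain/langgraph/prebuilt";

// Persist per-thread state between invocations.
// Note: newer @langchain/langgraph versions also accept this option as `checkpointer`.
const agent = createReactAgent({
  llm: model,
  tools: [buyMilk],
  checkpointSaver: new MemorySaver(),
});

// Same thread_id = same persisted conversation state across turns.
const config = { configurable: { thread_id: "grocery-thread" } };

// Turn 1: full flow.
await agent.invoke({ messages: [{ role: "user", content: "Buy milk" }] }, config);

// Turn 2: resumes from the stored state, so the handoff or skill load is skipped.
await agent.invoke({ messages: [{ role: "user", content: "Buy milk again" }] }, config);
```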

### Scenario 3: Multiple domains with large context

> **User:** "Compare Python, JavaScript, and Rust for web development"

Each language skill contains ~2,000 tokens of documentation. Subagents, Skills, and Router can fan out with parallel tool calls; Handoffs must work through the languages sequentially.

| Pattern | Model calls | Total tokens | Winner |
|---------|:-----------:|:------------:|:------:|
| [**Subagents**](#subagents) | **5** | **~9K** | ⭐ |
| [**Handoffs**](#handoffs) | **7+** | **~14K+** | |
| [**Skills**](#skills) | **3** | **~15K** | |
| [**Router**](#router) | **5** | **~9K** | ⭐ |

<Accordion title="Token and call breakdown">

**Subagents (5 calls, ~9K tokens):**
```
Call 1: Main agent (1K tokens)
├─ Calls 3 subagents in parallel
Call 2: Python subagent (2K tokens) ─┐
Call 3: JavaScript subagent (2K tokens) ├─ Parallel
Call 4: Rust subagent (2K tokens) ─────┘
Call 5: Main synthesizes (2K tokens)

Total: 1K + 2K + 2K + 2K + 2K = 9K tokens
```

Each subagent works in **isolation** with only its relevant context.

**Handoffs (7+ calls, ~14K+ tokens):**
```
Call 1: Main agent hands off to Python agent (~1K tokens)
Calls 2-3: Python agent researches, then hands off to JavaScript (~2K each)
Calls 4-5: JavaScript agent researches, then hands off to Rust (~2K each)
Calls 6-7: Rust agent researches and responds to the user (~2K each)

Total: 7+ calls, ~14K+ tokens (1K + 6 × ~2K, plus growing conversation history)
```

Handoffs execute **sequentially**: the agents can't research all three languages in parallel, and the growing conversation history adds token overhead with each transfer.

**Router (5 calls, ~9K tokens):**
```
Call 1: Router LLM analyzes query (1K tokens)
├─ Routes to Python, JavaScript, Rust agents
Call 2: Python agent (2K tokens) ─┐
Call 3: JavaScript agent (2K tokens) ├─ Parallel
Call 4: Rust agent (2K tokens) ─────┘
Call 5: Synthesis LLM combines results (2K tokens)

Total: 1K + 2K + 2K + 2K + 2K = 9K tokens
```

Router uses an **LLM for routing**, then invokes the agents in parallel. The flow is similar to Subagents, but with an explicit routing step in place of a delegating main agent.

**Skills (3 calls, ~15K tokens):**
```
Call 1: Load 3 skills (1K tokens)
└─ Adds Python (2K) + JavaScript (2K) + Rust (2K) = 6K to context

Call 2: Research (7K tokens)
└─ Base (1K) + ALL skill contexts (6K) = 7K total

Call 3: Synthesize (7K tokens)
└─ Base (1K) + ALL skill contexts (6K) = 7K total

Total: 1K + 7K + 7K = 15K tokens
```

After loading, **every subsequent call processes all 6K tokens of skill documentation**.
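
That accumulation happens because `load_skill` returns the documentation as a tool message that stays in the conversation history. A minimal sketch of such a tool (the file layout and skill names are placeholder assumptions):

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { readFile } from "node:fs/promises";

// Loading a skill injects ~2K tokens of documentation into the conversation,
// where every subsequent model call reprocesses it.
const loadSkill = tool(
  async ({ skill }) => readFile(`skills/${skill}.md`, "utf-8"), // placeholder path
  {
    name: "load_skill",
    description: "Load the documentation for a language skill.",
    schema: z.object({ skill: z.enum(["python", "javascript", "rust"]) }),
  }
);
```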

**The trade-off:**
- Skills: ✅ Fewer calls (3) → ❌ Higher tokens per call (7K+)
- Subagents: ❌ More calls (5) → ✅ Lower tokens per call (1-2K)
- **Result:** Subagents processes ~40% fewer tokens overall (9K vs. 15K)

</Accordion>

**Key insight:** For multi-domain tasks, the patterns with parallel execution (Subagents, Router) are most efficient. Skills makes fewer calls but uses more tokens because the loaded contexts accumulate. Handoffs is the least efficient here: it must run sequentially and can't consult multiple domains at once with parallel tool calls.
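
To make the fan-out concrete, the sketch below shows the parallel portion of the Subagents/Router flow (calls 2-5 in the breakdowns above). It assumes `pythonAgent`, `jsAgent`, `rustAgent`, and `model` already exist; those names are placeholders, not objects defined elsewhere on this page:

```typescript
const question = "Compare Python, JavaScript, and Rust for web development";

// Calls 2-4: the three specialized agents run concurrently, each seeing only
// its own ~2K-token context.
const [py, js, rust] = await Promise.all(
  [pythonAgent, jsAgent, rustAgent].map((agent) =>
    agent.invoke({ messages: [{ role: "user", content: question }] })
  )
);

// Call 5: one synthesis call combines the three isolated results.
const synthesis = await model.invoke([
  {
    role: "user",
    content: [
      "Combine these findings into one comparison:",
      `Python: ${py.messages.at(-1)?.content}`,
      `JavaScript: ${js.messages.at(-1)?.content}`,
      `Rust: ${rust.messages.at(-1)?.content}`,
    ].join("\n\n"),
  },
]);
```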

<Warning>
**When to avoid Skills**: The Skills pattern is ideal for 1-2 lightweight skills. When you need many skills with extensive documentation (API references, detailed examples, comprehensive guidelines), use **Subagents** or **Router** instead. Context isolation prevents repeatedly processing accumulated documentation.
</Warning>

### Summary

Here's how patterns compare across all three scenarios:

| Pattern | Single task | Follow-up | Multiple domains | Best for |
|---------|:-----------:|:---------:|:----------------:|----------|
| [**Subagents**](#subagents) | 4 calls | 8 calls (4+4) | 5 calls, 9K tokens | Parallel execution, context isolation, distributed teams |
| [**Handoffs**](#handoffs) | 3 calls | 5 calls (3+2) | 7+ calls, 14K+ tokens | Multi-turn conversations, direct user interaction, sequential workflows |
| [**Skills**](#skills) | 3 calls | 5 calls (3+2) | 3 calls, 15K tokens | 1-2 lightweight skills, simple context needs |
| [**Router**](#router) | 3 calls | 6 calls (3+3) | 5 calls, 9K tokens | Parallel execution, distinct verticals, explicit routing logic |

**Choosing a pattern:**
- **Optimize for single requests?** → Handoffs, Skills, or Router (3 calls each)
- **Optimize for conversations?** → Handoffs or Skills (stateful, save calls on follow-ups)
- **Need parallel execution?** → Subagents or Router (invoke multiple agents simultaneously)
- **Multiple large-context domains?** → Subagents or Router (context isolation prevents bloat)
- **Simple, focused task?** → Skills (lightweight, minimal overhead)