:::

</Accordion>

## Performance comparison

Understanding how different patterns affect performance helps you optimize for latency and cost. We'll compare the patterns across three scenarios, measuring two numbers (one way to measure them yourself is sketched after this list):
- **Model calls**: Number of LLM invocations (each time the model is called to generate a response or tool call)
- **Tokens processed**: Total context window usage across all calls
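
You can reproduce these measurements for your own agents. A minimal sketch, assuming a LangChain JS setup; the handler below and the agent it is attached to are illustrative, and the token-usage field varies by provider:

```typescript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

// Counts chat-model calls and, where the provider reports it, total tokens.
class UsageCounter extends BaseCallbackHandler {
  name = "usage_counter";
  modelCalls = 0;
  totalTokens = 0;

  handleChatModelStart() {
    this.modelCalls += 1; // one model invocation started
  }

  handleLLMEnd(output: any) {
    // OpenAI-style models report usage under llmOutput.tokenUsage; adjust as needed.
    this.totalTokens += output?.llmOutput?.tokenUsage?.totalTokens ?? 0;
  }
}

// Hypothetical usage: pass the handler in the invocation config.
// const counter = new UsageCounter();
// await agent.invoke({ messages: [...] }, { callbacks: [counter] });
// console.log(counter.modelCalls, counter.totalTokens);
```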

### Scenario 1: Single task

> **User:** "Buy milk"

Each pattern has access to a specialized milk agent (or, in the Skills pattern, a milk skill) that can call a `buy_milk` tool.
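
For reference, the `buy_milk` tool can be as small as the sketch below; the order logic is a placeholder, and `tool` plus `zod` are assumed to be available:

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Placeholder implementation: a real tool would call a grocery API here.
const buyMilk = tool(
  async () => "Order placed: 1 carton of milk.",
  {
    name: "buy_milk",
    description: "Order milk from the grocery service.",
    schema: z.object({}), // no arguments needed for this example
  }
);
```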

| Pattern | Model calls | Winner |
|---------|:-----------:|:------:|
| [**Subagents**](#subagents) | **4** | |
| [**Handoffs**](#handoffs) | **3** | ⭐ |
| [**Skills**](#skills) | **3** | ⭐ |
| [**Router**](#router) | **3** | ⭐ |

<Tabs>
<Tab title="Subagents">
**4 model calls:**
```mermaid
sequenceDiagram
participant User
participant Main Agent
participant Milk Subagent
participant buy_milk tool

User->>Main Agent: "Buy milk"
Note over Main Agent: Call 1
Main Agent->>Milk Subagent: milk_subagent()
Note over Milk Subagent: Call 2
Milk Subagent->>buy_milk tool: buy_milk()
buy_milk tool-->>Milk Subagent: Done
Note over Milk Subagent: Call 3
Milk Subagent-->>Main Agent: "Bought milk"
Note over Main Agent: Call 4
Main Agent-->>User: "I bought milk for you"
```
</Tab>

<Tab title="Handoffs">
**3 model calls:**
```mermaid
sequenceDiagram
participant User
participant Main Agent
participant Milk Agent
participant buy_milk tool

User->>Main Agent: "Buy milk"
Note over Main Agent: Call 1
Main Agent->>Milk Agent: transfer_to_milk_agent()
Note over Milk Agent: Call 2
Milk Agent->>buy_milk tool: buy_milk()
buy_milk tool-->>Milk Agent: Done
Note over Milk Agent: Call 3
Milk Agent-->>User: "I bought milk for you"
```
</Tab>

<Tab title="Skills">
**3 model calls:**
```mermaid
sequenceDiagram
participant User
participant Agent
participant load_skill tool
participant buy_milk tool

User->>Agent: "Buy milk"
Note over Agent: Call 1
Agent->>load_skill tool: load_skill("milk")
load_skill tool-->>Agent: Milk skill context
Note over Agent: Call 2
Agent->>buy_milk tool: buy_milk()
buy_milk tool-->>Agent: Done
Note over Agent: Call 3
Agent-->>User: "I bought milk for you"
```
</Tab>

<Tab title="Router">
**3 model calls:**
```mermaid
sequenceDiagram
participant User
participant Router LLM
participant Milk Agent
participant buy_milk tool

User->>Router LLM: "Buy milk"
Note over Router LLM: Call 1: Route to milk agent
Router LLM->>Milk Agent: Invoke with query
Note over Milk Agent: Call 2
Milk Agent->>buy_milk tool: buy_milk()
buy_milk tool-->>Milk Agent: Done
Note over Milk Agent: Call 3
Milk Agent-->>User: "I bought milk for you"
```
</Tab>
</Tabs>

**Key insight:** Handoffs, Skills, and Router are most efficient for single tasks (3 calls each). The Subagents pattern adds one extra call because results flow back through the main agent; that extra hop is the price of centralized control.
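
One way to wire the Subagents flow is to expose the milk subagent to the main agent as a tool. The sketch below assumes LangGraph's prebuilt `createReactAgent` and the `buyMilk` tool from the setup above (your wiring may differ); it makes the fourth call visible, since the subagent's answer comes back as a tool result and the main agent needs one more call to phrase the final response:

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-4o-mini" });

// Specialized subagent: only knows about buy_milk (handles calls 2 and 3).
const milkSubagent = createReactAgent({ llm: model, tools: [buyMilk] });

// Expose the subagent to the main agent as a tool.
const milkSubagentTool = tool(
  async ({ request }) => {
    const result = await milkSubagent.invoke({
      messages: [{ role: "user", content: request }],
    });
    // Only the subagent's final answer flows back to the main agent.
    return String(result.messages[result.messages.length - 1].content);
  },
  {
    name: "milk_subagent",
    description: "Delegate milk-related requests to the milk subagent.",
    schema: z.object({ request: z.string() }),
  }
);

// Main agent: call 1 decides to delegate, call 4 phrases the final answer.
const mainAgent = createReactAgent({ llm: model, tools: [milkSubagentTool] });
```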

### Scenario 2: Follow-up request

> **Turn 1:** "Buy milk"
> **Turn 2:** "Buy milk again"

The user makes a follow-up request in the same conversation.

| Pattern | Turn 2 calls | Total (both turns) | Winner |
|---------|:------------:|:------------------:|:------:|
| [**Subagents**](#subagents) | **4** | **8** | |
| [**Handoffs**](#handoffs) | **2** | **5** | ⭐ |
| [**Skills**](#skills) | **2** | **5** | ⭐ |
| [**Router**](#router) | **3** | **6** | |

<Accordion title="Why the difference?">

**Subagents (4 calls again → 8 total):**
- Subagents are **stateless by design**—each invocation follows the same flow
- The main agent maintains conversation context, but subagents start fresh each time
- This provides strong context isolation but repeats the full flow

**Handoffs (2 calls → 5 total):**
- The milk agent is **still active** from turn 1 (state persists)
- No handoff needed—agent directly calls `buy_milk` tool (call 1)
- Agent responds to user (call 2)
- **Saves 1 call by skipping the handoff**

**Skills (2 calls → 5 total):**
- The skill context is **already loaded** in conversation history
- No need to reload—agent directly calls `buy_milk` tool (call 1)
- Agent responds to user (call 2)
- **Saves 1 call by reusing loaded skill**

**Router (3 calls again → 6 total):**
- Routers are **stateless**—each request requires an LLM routing call
- Turn 2: Router LLM call (1) → Milk agent calls buy_milk (2) → Milk agent responds (3)
- Can be optimized by wrapping the router as a tool inside a stateful agent

</Accordion>

**Key insight:** The stateful patterns (Handoffs, Skills) need only 2 calls on the follow-up turn, versus 3 for Router and 4 for Subagents. Subagents keep a consistent cost per request: the stateless design provides strong context isolation, but at the cost of repeating the full set of model calls on every turn.
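
The savings depend on conversation state actually persisting between turns. A hedged sketch of one way to get that, assuming a LangGraph checkpointer (`MemorySaver`) and reusing `model` and `buyMilk` from the earlier sketches; invoking with the same `thread_id` means turn 2 resumes from turn 1's state instead of starting from scratch:

```typescript
import { MemorySaver } from "@langchain/langgraph";
import { createReactAgent } from "@langchain/langgraph/prebuilt";

// Persist per-thread state between invocations.
// Note: newer @langchain/langgraph versions also accept this option as `checkpointer`.
const agent = createReactAgent({
  llm: model,
  tools: [buyMilk],
  checkpointSaver: new MemorySaver(),
});

// Same thread_id = same persisted conversation state across turns.
const config = { configurable: { thread_id: "grocery-thread" } };

// Turn 1: full flow.
await agent.invoke({ messages: [{ role: "user", content: "Buy milk" }] }, config);

// Turn 2: resumes from the stored state, so the handoff or skill load is skipped.
await agent.invoke({ messages: [{ role: "user", content: "Buy milk again" }] }, config);
```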

### Scenario 3: Multiple domains with large context

> **User:** "Compare Python, JavaScript, and Rust for web development"

Each language skill contains ~2,000 tokens of documentation. Subagents, Skills, and Router can fan out with parallel tool calls; Handoffs must work through the languages sequentially.

| Pattern | Model calls | Total tokens | Winner |
|---------|:-----------:|:------------:|:------:|
| [**Subagents**](#subagents) | **5** | **~9K** | ⭐ |
| [**Handoffs**](#handoffs) | **7+** | **~14K+** | |
| [**Skills**](#skills) | **3** | **~15K** | |
| [**Router**](#router) | **5** | **~9K** | ⭐ |

<Accordion title="Token and call breakdown">

**Subagents (5 calls, ~9K tokens):**
```
Call 1: Main agent (1K tokens)
├─ Calls 3 subagents in parallel
Call 2: Python subagent (2K tokens) ─┐
Call 3: JavaScript subagent (2K tokens) ├─ Parallel
Call 4: Rust subagent (2K tokens) ─────┘
Call 5: Main synthesizes (2K tokens)

Total: 1K + 2K + 2K + 2K + 2K = 9K tokens
```

Each subagent works in **isolation** with only its relevant context.

**Handoffs (7+ calls, ~14K+ tokens):**
```
Call 1: Main agent hands off to Python agent (~1K tokens)
Calls 2-3: Python agent researches, then hands off to JavaScript (~2K each)
Calls 4-5: JavaScript agent researches, then hands off to Rust (~2K each)
Calls 6-7: Rust agent researches and responds to the user (~2K each)

Total: 7+ calls, ~14K+ tokens (1K + 6 × ~2K, plus growing conversation history)
```

Handoffs execute **sequentially**: the agents can't research all three languages in parallel, and the growing conversation history adds token overhead with each transfer.

**Router (5 calls, ~9K tokens):**
```
Call 1: Router LLM analyzes query (1K tokens)
├─ Routes to Python, JavaScript, Rust agents
Call 2: Python agent (2K tokens) ─┐
Call 3: JavaScript agent (2K tokens) ├─ Parallel
Call 4: Rust agent (2K tokens) ─────┘
Call 5: Synthesis LLM combines results (2K tokens)

Total: 1K + 2K + 2K + 2K + 2K = 9K tokens
```

Router uses an **LLM for routing**, then invokes the agents in parallel. The flow is similar to Subagents, but with an explicit routing step in place of a delegating main agent.

**Skills (3 calls, ~15K tokens):**
```
Call 1: Load 3 skills (1K tokens)
└─ Adds Python (2K) + JavaScript (2K) + Rust (2K) = 6K to context

Call 2: Research (7K tokens)
└─ Base (1K) + ALL skill contexts (6K) = 7K total

Call 3: Synthesize (7K tokens)
└─ Base (1K) + ALL skill contexts (6K) = 7K total

Total: 1K + 7K + 7K = 15K tokens
```

After loading, **every subsequent call processes all 6K tokens of skill documentation**.
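
That accumulation happens because `load_skill` returns the documentation as a tool message that stays in the conversation history. A minimal sketch of such a tool (the file layout and skill names are placeholder assumptions):

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { readFile } from "node:fs/promises";

// Loading a skill injects ~2K tokens of documentation into the conversation,
// where every subsequent model call reprocesses it.
const loadSkill = tool(
  async ({ skill }) => readFile(`skills/${skill}.md`, "utf-8"), // placeholder path
  {
    name: "load_skill",
    description: "Load the documentation for a language skill.",
    schema: z.object({ skill: z.enum(["python", "javascript", "rust"]) }),
  }
);
```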

**The trade-off:**
- Skills: ✅ Fewer calls (3) → ❌ Higher tokens per call (7K+)
- Subagents: ❌ More calls (5) → ✅ Lower tokens per call (1-2K)
- **Result:** Subagents processes ~40% fewer tokens overall (9K vs. 15K)

</Accordion>

**Key insight:** For multi-domain tasks, the patterns with parallel execution (Subagents, Router) are most efficient. Skills makes fewer calls but uses more tokens because the loaded contexts accumulate. Handoffs is the least efficient here: it must run sequentially and can't consult multiple domains at once with parallel tool calls.
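
To make the fan-out concrete, the sketch below shows the parallel portion of the Subagents/Router flow (calls 2-5 in the breakdowns above). It assumes `pythonAgent`, `jsAgent`, `rustAgent`, and `model` already exist; those names are placeholders, not objects defined elsewhere on this page:

```typescript
const question = "Compare Python, JavaScript, and Rust for web development";

// Calls 2-4: the three specialized agents run concurrently, each seeing only
// its own ~2K-token context.
const [py, js, rust] = await Promise.all(
  [pythonAgent, jsAgent, rustAgent].map((agent) =>
    agent.invoke({ messages: [{ role: "user", content: question }] })
  )
);

// Call 5: one synthesis call combines the three isolated results.
const synthesis = await model.invoke([
  {
    role: "user",
    content: [
      "Combine these findings into one comparison:",
      `Python: ${py.messages.at(-1)?.content}`,
      `JavaScript: ${js.messages.at(-1)?.content}`,
      `Rust: ${rust.messages.at(-1)?.content}`,
    ].join("\n\n"),
  },
]);
```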

<Warning>
**When to avoid Skills**: The Skills pattern is ideal for 1-2 lightweight skills. When you need many skills with extensive documentation (API references, detailed examples, comprehensive guidelines), use **Subagents** or **Router** instead. Context isolation prevents repeatedly processing accumulated documentation.
</Warning>

### Summary

Here's how patterns compare across all three scenarios:

| Pattern | Single task | Follow-up | Multiple domains | Best for |
|---------|:-----------:|:---------:|:----------------:|----------|
| [**Subagents**](#subagents) | 4 calls | 8 calls (4+4) | 5 calls, 9K tokens | Parallel execution, context isolation, distributed teams |
| [**Handoffs**](#handoffs) | 3 calls | 5 calls (3+2) | 7+ calls, 14K+ tokens | Multi-turn conversations, direct user interaction, sequential workflows |
| [**Skills**](#skills) | 3 calls | 5 calls (3+2) | 3 calls, 15K tokens | 1-2 lightweight skills, simple context needs |
| [**Router**](#router) | 3 calls | 6 calls (3+3) | 5 calls, 9K tokens | Parallel execution, distinct verticals, explicit routing logic |

**Choosing a pattern:**
- **Optimize for single requests?** → Handoffs, Skills, or Router (3 calls each)
- **Optimize for conversations?** → Handoffs or Skills (stateful, save calls on follow-ups)
- **Need parallel execution?** → Subagents or Router (invoke multiple agents simultaneously)
- **Multiple large-context domains?** → Subagents or Router (context isolation prevents bloat)
- **Simple, focused task?** → Skills (lightweight, minimal overhead)