docs/cookbooks/ops-diagnostics.mdx
Lines changed: 95 additions & 35 deletions
@@ -10,14 +10,14 @@ This cookbook walks through how we built it—focusing on **environment design**

 ## Why Hierarchical?

-When you connect multiple MCP servers to a single environment, the agent sees all tools at once. For diagnostics across four services, this meant 60+ tools in a flat list. The cognitive load made it harder for the model to select the right tool for the job.
+When you connect multiple MCP servers to a single environment, the agent sees all tools at once. For diagnostics across six services, this meant 60+ tools in a flat list. The cognitive load made it harder for the model to select the right tool for the job.

 We restructured into a hierarchy: an orchestrator that delegates to specialized subagents.

 ```mermaid
 flowchart TD
     subgraph orch["Orchestrator"]
-        O["4 subagent tools"]
+        O["Up to 6 subagent tools"]
     end

     subgraph sentry["Sentry Agent"]
@@ -44,13 +44,25 @@ flowchart TD
         K3["describe_pod"]
     end

+    subgraph docs["Docs Agent"]
+        D1["search_docs"]
+    end
+
+    subgraph github["GitHub Agent"]
+        G1["search_code"]
+        G2["get_issues"]
+        G3["get_workflows"]
+    end
+
     O --> sentry
     O --> supabase
     O --> railway
     O --> kubectl
+    O --> docs
+    O --> github
 ```

-The orchestrator sees only 4 tools, one per specialist. Each specialist has a focused toolset for its domain.
+The orchestrator sees only a handful of tools, one per specialist. Each specialist has a focused toolset for its domain. Crucially, **only subagents with valid credentials are registered**.

 ## Environment Design

@@ -179,52 +191,96 @@ Provide findings, root cause analysis, and recommended fixes."""

 ## Building the Orchestrator

-The orchestrator wraps each subagent's scenario as an `AgentTool`:
+### Dynamic Subagent Detection
+
+A key pattern: **only register subagents for which credentials are present**. This lets you run the same orchestrator code with different configurations: for example, only Sentry and Supabase credentials locally, but the full set in production.
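The credential check described here can be illustrated without any HUD APIs. The following is a minimal sketch, not the cookbook's actual implementation: `SubagentSpec`, `CANDIDATES`, `detect_subagents`, and every environment variable name except `DOCS_MCP` are hypothetical names chosen for illustration.

```python
import os
from typing import Mapping, NamedTuple, Tuple


class SubagentSpec(NamedTuple):
    """Hypothetical description of one candidate subagent."""
    name: str
    required_env: Tuple[str, ...]  # every variable must be set to register
    description: str


# Illustrative candidates. The variable names are assumptions, except
# DOCS_MCP, which the cookbook itself uses.
CANDIDATES = [
    SubagentSpec("sentry", ("SENTRY_AUTH_TOKEN",), "Error tracking"),
    SubagentSpec("supabase", ("SUPABASE_ACCESS_TOKEN",), "Database inspection"),
    SubagentSpec("docs", ("DOCS_MCP",), "Documentation search"),
]


def detect_subagents(candidates, env: Mapping[str, str] = os.environ):
    """Return only the candidates whose required variables are all non-empty."""
    return [
        spec for spec in candidates
        if all(env.get(var) for var in spec.required_env)
    ]


# Simulate a local machine that only has Sentry credentials.
local_env = {"SENTRY_AUTH_TOKEN": "example-token"}
available = detect_subagents(CANDIDATES, local_env)
print([spec.name for spec in available])  # prints ['sentry']
```

Passing the environment as a parameter keeps the check easy to test. In the orchestrator, each surviving spec would then presumably be wrapped as a tool and registered.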

 ```python
 # orchestrator.py
 from hud import Environment
 from hud.tools import AgentTool
-from hud.agents import create_agent
-import hud
+import os

-from environments import sentry_env, supabase_env, railway_env, kubectl_env
+The docs subagent connects to any MCP server that provides documentation search. Set `DOCS_MCP` to the URL of your docs MCP server:
+
+```python
+# environments/docs.py
+docs_env = Environment(name="docs-agent")
+
+docs_mcp_url = os.getenv("DOCS_MCP")
+if docs_mcp_url:
+    docs_env.connect_mcp_config({
+        "docs": {"url": docs_mcp_url}
+    })
+```
+
+This makes the orchestrator reusable across different organizations: just point `DOCS_MCP` at your own documentation.
+
+### The Scenario
+
+The orchestrator wraps each subagent's scenario as an `AgentTool`:
+
+```python
+def _format_subagent_list():
+    """Dynamically list available subagents for the prompt."""
+    return "\n".join(f"- **{name}**: {desc}" for name, _, desc in _subagents)
+
+@orch_env.scenario("diagnose")
+async def orch_diagnose(query: str):
+    subagent_list = _format_subagent_list()

-@orchestrator.scenario("diagnose")
-async def run_diagnosis(issue: str):
-    yield f"""You are an ops diagnostics orchestrator.
+    yield f"""You are an ops diagnostics orchestrator with specialized subagents:

-**Issue:** {issue}
+{subagent_list}
+
+**Issue to diagnose:** {query}
+
+**IMPORTANT: All subagents are READ-ONLY.**

-You have READ-ONLY subagents for Sentry, Supabase, Railway, and Kubernetes.
 Investigate systematically and correlate findings across services."""
-
-task = orchestrator("diagnose", issue=query)
-
-async with hud.eval(task) as ctx:
-    agent = create_agent(model)
-    return await agent.run(ctx, max_steps=20)
 ```

+The prompt dynamically lists only the available subagents, so the agent knows exactly what tools it has.
+
 ### Trace Continuity

 All subagent activity appears in a single trace on the HUD platform. When the orchestrator calls a subagent tool, the inference and tool calls are recorded under the parent trace, so there are no separate URLs to track.
@@ -471,6 +527,10 @@ The entire investigation—from initial query to actionable recommendations—to

 3. **Custom tools fill gaps.** When MCP servers don't fit your auth model, build direct API integrations.

+4. **Dynamic detection enables flexibility.** Registering only subagents with valid credentials means the same code works across different environments (dev, staging, production) with different service access.
+
+5. **Configurable integrations improve reusability.** Making settings like `DOCS_MCP` configurable via environment variables lets others use your orchestrator with their own services.