Commit d4781b9

Nehanth and claude committed
docs: add MLflow tracing guide for Claude Code on RHOAI
Adds documentation covering MLflow autolog integration with Claude Code across Vertex AI, vLLM, and OGX backends. Includes trace schema, session-level metrics, setup guide, and RHOAI 3.5 recommendation.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
1 parent 237a0b5 commit d4781b9

6 files changed

Lines changed: 280 additions & 0 deletions

@@ -0,0 +1,280 @@
# MLflow Tracing for Claude Code Agent Runtimes on RHOAI

We deployed Claude Code as a containerized agent on Red Hat OpenShift AI (RHOAI) and connected it to the MLflow instance running on the same cluster. To validate the full tracing stack, we ran the same prompt, **"build me a tetris game"**, through three backends: Vertex AI (Google Cloud), vLLM directly, and OGX routing to vLLM. In all three cases, MLflow captured the complete session trace, including every tool call, token usage, latency, and the full execution waterfall. The sections below document the telemetry investigation, the tracing prototype, session-level metrics, and the setup guide for productizing this on RHOAI 3.5.

---

## RHAIENG-4751: Inventory OGX Telemetry Hooks and MLflow Integration Points

### Summary

Agent-level instrumentation via `mlflow autolog claude` works out of the box with any backend. Swapping Vertex AI for vLLM or OGX produces the same trace schema with no changes to the tracing setup. If server-side OGX OTel spans are needed in the future, they would be added through the Claude Code stop hook.

### OGX Telemetry Capabilities

OGX 1.0.2 emits structured logs per request:

```
INFO Using native /v1/messages passthrough
base_url=http://vllm-120b-predictor.gpt-oss.svc.cluster.local
model=vllm/gpt-oss-120b
HTTP 200
```

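Until server-side spans exist, these signals can be pulled out of the log text directly. The following is a minimal sketch that assumes every request is logged in exactly the four-line shape shown above (an `INFO` message line, `key=value` lines, then `HTTP <code>`); `parse_ogx_record` is a hypothetical helper, not part of OGX or MLflow.

```python
import re


def parse_ogx_record(lines):
    """Parse one OGX request record (shape assumed from the sample above)."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        status = re.fullmatch(r"HTTP (\d{3})", line)
        if status:
            record["http_status"] = int(status.group(1))
        elif line.startswith("INFO "):
            record["message"] = line[len("INFO "):]
            record["passthrough"] = "passthrough" in line
        elif "=" in line:
            # Split on the first "=" only, so URLs keep any later "=" characters.
            key, _, value = line.partition("=")
            record[key] = value
    return record


sample = [
    "INFO Using native /v1/messages passthrough",
    "base_url=http://vllm-120b-predictor.gpt-oss.svc.cluster.local",
    "model=vllm/gpt-oss-120b",
    "HTTP 200",
]
record = parse_ogx_record(sample)
print(record)
```

If the real log format carries timestamps or logger names, the `startswith("INFO ")` check would need to be loosened accordingly.
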
| Signal | Available |
|---|---|
| Model name | |
| Backend / provider URL | |
| Passthrough status | |
| HTTP status code | |
| Per-request latency | |

32+
### Agent-side OTel Spans (what we capture today)
33+
34+
The spans produced by `mlflow autolog claude` are OTel spans. Every session captures:
35+
36+
| Field | Example |
37+
|---|---|
38+
| Token count | 29,629 (input + output) |
39+
| Session latency | 39.62s |
40+
| Tool call sequence | tool_AskUserQuestion → llm → tool_Write → tool_Read → ... |
41+
| Prompt / response | Full input and output text |
42+
| Session ID | Links multi-turn conversations |
43+
| Model | `gpt-oss-120b`, `claude-sonnet-4-5-20250929`, etc. |
44+
| Status | OK / error |
45+
46+
This works the same whether the backend is Vertex AI, vLLM directly, or OGX → vLLM. If server-side OGX spans are needed in future, they would be added to the same Claude Code stop hook.
47+
### Integration Path

| Path | Status |
|---|---|
| **Agent-level → MLflow** (`mlflow autolog claude`) | ✅ Working (tested across all three backends) |
| OGX → custom exporter → MLflow | 🔲 Future (would add server-side signals on top) |
| OGX → OTel → MLflow | 🔲 Future (requires OTel support in both OGX and MLflow) |

### Evidence: Same Traces Across All Three Backends

We ran **"build me a tetris game"** against all three backends. All three produced the same trace schema.

#### Backend 1: Vertex AI

| Field | Value |
|---|---|
| Model | `claude-sonnet-4-5-20250929` |
| Tokens | 18,504 |
| Latency | 2.90 min |
| Trace ID | `tr-c59dcf7c76c26e4d55255a32694a9bb7` |

![Vertex trace](screenshots/vertex-trace.png)

#### Backend 2: vLLM direct

| Field | Value |
|---|---|
| Model | `gpt-oss-120b` |
| Tokens | 46,211 |
| Latency | 37.82s |
| Trace ID | `tr-39a858c94eb86c3be340e23541717fe8` |

![vLLM trace](screenshots/vllm-trace.png)

#### Backend 3: OGX 1.0.2 → vLLM

| Field | Value |
|---|---|
| Model | `gpt-oss-120b` |
| Tokens | 29,629 |
| Latency | 39.62s |
| Trace ID | `tr-26175953d7cb441e3e2da1cc5fc24607` |

![OGX trace](screenshots/ogx-trace.png)

---

## RHAIENG-4752 & RHAIENG-4753: Tool Call Traces & Agent Execution Metrics

### Summary

**RHAIENG-4752**: We prototyped tool call tracing using `mlflow autolog claude`. Every tool Claude Code calls (Write, Read, Edit, Bash, AskUserQuestion, etc.) is captured as a span in MLflow with the tool name, input parameters, output/result, and latency. We tested across three backends with a real coding task: Vertex AI produced 15 spans, and vLLM and OGX produced 8 each. MLflow integration works end to end. The stop hook fires after the session ends, so tracing adds no latency to agent responses.

**RHAIENG-4753**: On top of the tool call spans, each trace also captures higher-level session metrics: session ID, total duration, input/output token counts, and the full tool call sequence as a waterfall. This answers "what did the agent do and how much did it cost?" for any session. We validated this with a complete multi-turn coding task ("build me a tetris game") across all three backends.

The trace schema and per-backend results follow.

### Trace Schema

```
claude_code_conversation (root)
├── tool_AskUserQuestion   - question asked + user answer
├── tool_EnterPlanMode     - agent enters planning
├── llm                    - LLM inference call
├── tool_Bash              - command + output
├── tool_Write             - file path + content written
├── tool_Read              - file path + content read
├── tool_Edit              - file path + diff applied
├── tool_ExitPlanMode      - exits planning
└── llm                    - final response
```

Each span captures: tool name, input parameters, output/result, and per-span latency. Session-level fields on every trace:

| Field | Captured |
|---|---|
| Session ID | |
| Total duration | |
| Input tokens | |
| Output tokens | |
| Total tokens | |
| Tool call sequence (waterfall) | |
| Model | |
| Status | |

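To make the schema concrete, the sketch below walks a flat list of spans and derives the tool call sequence and the time spent per span name. The span dictionaries, their field names (`name`, `start_ms`, `end_ms`), and the timings are invented for illustration; real spans would come from the MLflow trace, whose exact attribute layout is not shown in this document.

```python
from collections import defaultdict


def summarize_spans(spans):
    """Return the ordered call sequence and total milliseconds per span name."""
    ordered = sorted(spans, key=lambda s: s["start_ms"])
    sequence = [s["name"] for s in ordered]
    time_per_name = defaultdict(int)
    for s in ordered:
        time_per_name[s["name"]] += s["end_ms"] - s["start_ms"]
    return sequence, dict(time_per_name)


# Hypothetical spans: names follow the schema above, timings are made up.
example_spans = [
    {"name": "llm", "start_ms": 0, "end_ms": 1200},
    {"name": "tool_Write", "start_ms": 1200, "end_ms": 1250},
    {"name": "tool_Read", "start_ms": 1250, "end_ms": 1270},
    {"name": "llm", "start_ms": 1270, "end_ms": 2000},
]
sequence, time_per_name = summarize_spans(example_spans)
print(" → ".join(sequence))
print(time_per_name)
```

Grouping by span name is one of several reasonable views; grouping by `llm` versus `tool_*` would separate inference time from tool time.
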
### Results: "Build me a Tetris game"

#### Backend 1: Vertex AI (`claude-sonnet-4-5-20250929`)

| Metric | Value |
|---|---|
| Session ID | `b679dc2c-...` |
| Tokens | 18,504 |
| Latency | 2.90 min |
| Spans | 15 |
| Trace ID | `tr-c59dcf7c76c26e4d55255a32694a9bb7` |

![Vertex waterfall](screenshots/vertex-summary.png)

---

#### Backend 2: vLLM direct (`gpt-oss-120b`)

| Metric | Value |
|---|---|
| Session ID | `cc76b223-...` |
| Tokens | 46,211 |
| Latency | 37.82s |
| Spans | 8 |
| Trace ID | `tr-39a858c94eb86c3be340e23541717fe8` |

![vLLM waterfall](screenshots/vllm-summary.png)

---

#### Backend 3: OGX 1.0.2 → vLLM (`gpt-oss-120b`)

| Metric | Value |
|---|---|
| Session ID | `980fbcb8-...` |
| Tokens | 29,629 |
| Latency | 39.62s |
| Spans | 8 |
| Trace ID | `tr-26175953d7cb441e3e2da1cc5fc24607` |

![OGX waterfall](screenshots/ogx-summary.png)

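Because the three sessions differ in both token count and duration, a tokens-per-second figure can help when reading them side by side. The sketch below only rearranges the numbers reported above. Session latency includes tool execution and any user interaction, and the runs did not necessarily perform identical work, so this is a rough session throughput rather than a model speed benchmark.

```python
# Values copied from the result tables above; 2.90 min is converted to seconds.
sessions = {
    "Vertex AI": {"tokens": 18_504, "seconds": 2.90 * 60},
    "vLLM direct": {"tokens": 46_211, "seconds": 37.82},
    "OGX 1.0.2 → vLLM": {"tokens": 29_629, "seconds": 39.62},
}


def tokens_per_second(tokens, seconds):
    """Total session tokens divided by total session duration."""
    return tokens / seconds


throughput = {
    name: tokens_per_second(s["tokens"], s["seconds"])
    for name, s in sessions.items()
}
for name, tps in throughput.items():
    print(f"{name}: {tps:.0f} tokens/s")
```
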
---

## RHAIENG-4754: Observability Setup Guide & RHOAI 3.5 Recommendation

### Summary

MLflow integration works. This guide documents how to connect Claude Code, OGX, and MLflow on RHOAI, assuming all three are already deployed on the cluster. On RHOAI 3.4 the setup requires the Red Hat MLflow fork, which upstream MLflow 3.11 will replace in a future release.

### Prerequisites

The following must already be running on the cluster:

- Claude Code container deployed (see [PR #92](https://github.com/red-hat-data-services/agentic-starter-kits/pull/92))
- OGX deployed and serving a model
- MLflow instance running via the ODH/RHOAI operator with a workspace matching your namespace

### Step-by-Step Setup

#### 1. Add Python + MLflow to the Containerfile

The ODH build of MLflow uses the Red Hat fork, which includes the `kubernetes-namespaced` auth plugin that is not yet in upstream 3.10.x:

```dockerfile
# Install Python 3.12 and pip
RUN microdnf install -y python3.12 python3.12-pip
# Install MLflow from the Red Hat fork (rhoai-3.4 ref) with the kubernetes extra.
# pip needs the git binary for git+https installs; add git above if the base image lacks it.
RUN python3.12 -m pip install --no-cache-dir \
    'mlflow[kubernetes] @ git+https://github.com/red-hat-data-services/mlflow.git@rhoai-3.4'
```

> This fork requirement will go away when RHOAI ships MLflow 3.11; at that point, replace it with `mlflow[kubernetes]>=3.11`.

#### 2. Grant RBAC to the pod's service account

```bash
# Grant the edit role to the namespace's default service account
oc adm policy add-role-to-user edit -z default -n <your-namespace>
```

#### 3. Add MLflow env vars to the deployment

```yaml
- name: MLFLOW_TRACKING_URI
  value: "https://mlflow.redhat-ods-applications.svc:8443"
- name: MLFLOW_TRACKING_AUTH
  value: "kubernetes-namespaced"
- name: MLFLOW_WORKSPACE
  value: "<your-namespace>"
- name: MLFLOW_EXPERIMENT_NAME
  value: "claude-code-traces"
# Disables TLS certificate verification for the tracking connection
- name: MLFLOW_TRACKING_INSECURE_TLS
  value: "true"
```

226+
#### 4. Add OGX env vars to point Claude Code at OGX
227+
228+
```yaml
229+
- name: ANTHROPIC_BASE_URL
230+
value: "https://<your-ogx-route>"
231+
- name: ANTHROPIC_API_KEY
232+
value: "fake"
233+
- name: ANTHROPIC_CUSTOM_MODEL_OPTION
234+
value: "vllm/<your-model-name>"
235+
```
236+
237+
#### 5. Wire up autolog in the entrypoint
238+
239+
The entrypoint runs `mlflow autolog claude` at startup and injects auth into the generated `.claude/settings.json`:
240+
241+
```bash
242+
mlflow autolog claude -u "${MLFLOW_TRACKING_URI}" -n "${MLFLOW_EXPERIMENT_NAME}" /workspace
243+
244+
python3.12 -c '
245+
import json, os
246+
sf = "/workspace/.claude/settings.json"
247+
with open(sf) as f: s = json.load(f)
248+
env = s.setdefault("env", {})
249+
env["MLFLOW_TRACKING_AUTH"] = "kubernetes-namespaced"
250+
env["MLFLOW_WORKSPACE"] = os.environ["MLFLOW_WORKSPACE"]
251+
env["MLFLOW_TRACKING_INSECURE_TLS"] = "true"
252+
with open(sf, "w") as f: json.dump(s, f, indent=2)
253+
'
254+
```
255+
#### 6. Verify

```bash
# Check startup logs
oc logs deployment/<claude-deployment> | grep -i mlflow

# Run a test
oc exec deployment/<claude-deployment> -- bash -c '
  export HOME=/home/claude-agent && cd /workspace
  ~/.claude/claude-run -p "What is 2+2?"
'

# Confirm trace was created
oc exec deployment/<claude-deployment> -- \
  tail -3 /workspace/.claude/mlflow/claude_tracing.log
# Expected: "Created MLflow trace: tr-..."
```

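For scripted checks, the confirmation line can be matched programmatically instead of by eye. The sketch below assumes the log contains `Created MLflow trace: ` followed by an ID in the `tr-` plus 32 hex characters form seen in the trace IDs earlier in this document. The sample log lines, including the timestamp layout and the first message, are made up.

```python
import re

# Pattern assumed from the expected log line and the trace IDs reported above.
TRACE_LINE = re.compile(r"Created MLflow trace: (tr-[0-9a-f]{32})")


def extract_trace_ids(log_text):
    """Return every trace ID announced in the given log text, in order."""
    return TRACE_LINE.findall(log_text)


# Hypothetical log excerpt; only the "Created MLflow trace" wording is from the guide.
sample_log = """\
2025-01-01 00:00:00 INFO Processing Claude Code session
2025-01-01 00:00:01 INFO Created MLflow trace: tr-26175953d7cb441e3e2da1cc5fc24607
"""
trace_ids = extract_trace_ids(sample_log)
print(trace_ids)
```

An empty list after a test run would indicate that the stop hook did not report a trace, which is the case worth investigating.
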
### Recommendation for RHOAI 3.5

**Productize `mlflow autolog claude` as the agent tracing path.**

It works across all backends (Vertex AI, vLLM, OGX) with no changes to the tracing setup, and it captures tool calls, token usage, latency, and session metadata out of the box. The only overhead is the stop hook, which runs after the session ends and therefore does not add to agent response times.

When RHOAI ships MLflow 3.11, drop the Red Hat fork and use upstream `mlflow[kubernetes]>=3.11`.