Commit 45483f3

committed
docs: consolidate results, fix voice, address review feedback
1 parent 597a7ce commit 45483f3

1 file changed

Lines changed: 13 additions & 47 deletions

agents/claude-code/mlflow-tracing.md

Lines changed: 13 additions & 47 deletions
@@ -1,6 +1,6 @@
11
# MLflow Tracing for Claude Code Agent Runtimes on RHOAI
22

3-
We deployed Claude Code as a containerized agent on Red Hat OpenShift AI and wired it up to the MLflow instance running on the same cluster. To validate the full tracing stack, we ran the same prompt — **"build me a tetris game"** — through three different backends: Vertex AI (Google Cloud), vLLM directly, and OGX routing to vLLM. In all three cases, MLflow captured the complete session trace including every tool call, token usage, latency, and the full execution waterfall. The sections below document the telemetry investigation, the tracing prototype, session-level metrics, and the setup guide for productizing this on RHOAI 3.5.
3+
Deploy Claude Code as a containerized agent on Red Hat OpenShift AI and wire it up to the MLflow instance running on the same cluster. To validate the full tracing stack, the same prompt, **"build me a tetris game"**, was run through three different backends: Vertex AI (Google Cloud), vLLM directly, and OGX routing to vLLM. In all three cases, MLflow captured the complete session trace, including every tool call, token usage, latency, and the full execution waterfall. The sections below document the telemetry investigation, the tracing prototype, session-level metrics, and the setup guide for productizing this on RHOAI 3.5.
44

55
---
66

@@ -49,54 +49,15 @@ This works the same whether the backend is Vertex AI, vLLM directly, or OGX →
4949

5050
The Claude Code stop hook is the right integration path. It already captures everything out of the box — tool calls, token usage, latency, session ID — and works the same across Vertex AI, vLLM, and OGX without any changes. If additional server-side metrics are needed (e.g. per-hop vLLM latency, OGX routing decisions), they can be added directly to the same hook since the infrastructure is already there.
5151

52-
### Evidence: Same Traces Across All Three Backends
53-
54-
We ran **"build me a tetris game"** against all three backends. All three produced the same trace schema.
55-
56-
#### Backend 1: Vertex AI
57-
58-
| Field | Value |
59-
|---|---|
60-
| Model | `claude-sonnet-4-5-20250929` |
61-
| Tokens | 18,504 |
62-
| Latency | 2.90 min |
63-
| Trace ID | `tr-c59dcf7c76c26e4d55255a32694a9bb7` |
64-
65-
![Vertex trace](screenshots/vertex-trace.png)
66-
67-
#### Backend 2: vLLM direct
68-
69-
| Field | Value |
70-
|---|---|
71-
| Model | `gpt-oss-120b` |
72-
| Tokens | 46,211 |
73-
| Latency | 37.82s |
74-
| Trace ID | `tr-39a858c94eb86c3be340e23541717fe8` |
75-
76-
![vLLM trace](screenshots/vllm-trace.png)
77-
78-
#### Backend 3: OGX 1.0.2 → vLLM
79-
80-
| Field | Value |
81-
|---|---|
82-
| Model | `gpt-oss-120b` |
83-
| Tokens | 29,629 |
84-
| Latency | 39.62s |
85-
| Trace ID | `tr-26175953d7cb441e3e2da1cc5fc24607` |
86-
87-
![OGX trace](screenshots/ogx-trace.png)
88-
8952
---
9053

9154
## Tool Call Traces & Agent Execution Metrics
9255

9356
### Summary
9457

95-
**Tool call tracing**We prototyped tool call tracing using `mlflow autolog claude`. Every tool Claude Code calls (Write, Read, Edit, Bash, AskUserQuestion, etc.) is captured as a span in MLflow with the tool name, input parameters, output/result, and latency. Tested across three backends with a real coding task — Vertex AI produced 15 spans, vLLM and OGX produced 8 each. MLflow integration works end-to-end. The stop-hook fires after the session so there is no latency impact.
58+
**Tool call tracing** — Using `mlflow autolog claude`, every tool Claude Code calls (Write, Read, Edit, Bash, AskUserQuestion, etc.) is captured as a span in MLflow with the tool name, input parameters, output/result, and latency. Tested across three backends with a real coding task: Vertex AI produced 15 spans, and vLLM and OGX produced 8 each. MLflow integration works end-to-end. The stop hook fires after the session ends, so it adds no latency to the session itself.
9659

97-
**Session-level metrics** — On top of the tool call spans, each trace also captures higher-level session metrics: session ID, total duration, input/output token counts, and the full tool call sequence as a waterfall. This answers "what did the agent do and how much did it cost?" for any session. Validated with a complete multi-turn coding task ("build me a tetris game") across all three backends.
98-
99-
As you can see in the results below.
60+
**Session-level metrics** — On top of the tool call spans, each trace also captures higher-level session metrics: session ID, total duration, input/output token counts, and the full tool call sequence as a waterfall. This answers "what did the agent do and how much did it cost?" for any session.
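
As a rough illustration of how the session-level metrics relate to the tool call spans, the sketch below rolls a list of spans up into one summary record. The `ToolSpan` and `SessionSummary` types, their field names, and the `summarize_session` helper are hypothetical and exist only for this example; they are not the MLflow trace schema or API, and the values in the usage note are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolSpan:
    """Hypothetical, simplified view of one tool call span."""
    tool_name: str
    start_ms: int
    end_ms: int


@dataclass
class SessionSummary:
    """Hypothetical roll-up of the session-level metrics named in the text."""
    session_id: str
    total_duration_ms: int
    input_tokens: int
    output_tokens: int
    tool_sequence: List[str]

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def summarize_session(session_id: str, spans: List[ToolSpan],
                      input_tokens: int, output_tokens: int) -> SessionSummary:
    """Order spans by start time and derive duration and the tool sequence."""
    ordered = sorted(spans, key=lambda s: s.start_ms)
    if ordered:
        # Duration runs from the earliest start to the latest end.
        duration = max(s.end_ms for s in ordered) - ordered[0].start_ms
    else:
        duration = 0
    return SessionSummary(
        session_id=session_id,
        total_duration_ms=duration,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        tool_sequence=[s.tool_name for s in ordered],
    )
```

Calling `summarize_session("demo", [ToolSpan("Write", 100, 400), ToolSpan("Read", 0, 50)], 1200, 300)` would yield a summary whose `tool_sequence` lists `Read` before `Write`, which is the ordering the waterfall view presents.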
10061

10162
### Trace Schema
10263

@@ -126,9 +87,11 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
12687
| Model ||
12788
| Status ||
12889

129-
### Results: "Build me a Tetris game"
90+
### Results: "Build me a Tetris game" — All Three Backends
91+
92+
The prompt **"build me a tetris game"** was run against all three backends. All three produced the same trace schema: prompt, response, token counts, latency, and full tool call sequence.
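
One way to make the "same trace schema" claim checkable is to compare the fields each backend's trace exposes against a required set. The sketch below assumes each trace has already been exported to a plain dictionary. The key names in `REQUIRED_FIELDS` are placeholders derived from the list above, not MLflow's actual field names, and `schema_mismatches` is a hypothetical helper.

```python
from typing import Dict, Mapping, Set

# Placeholder field names taken from the prose; the real export keys may differ.
REQUIRED_FIELDS: Set[str] = {
    "prompt", "response", "token_counts", "latency", "tool_calls",
}


def schema_mismatches(
    traces: Mapping[str, Mapping[str, object]],
) -> Dict[str, Set[str]]:
    """Return, per backend, the required fields missing from its trace.

    An empty result means every backend exposes the full required set.
    """
    missing: Dict[str, Set[str]] = {}
    for backend, trace in traces.items():
        gap = REQUIRED_FIELDS - set(trace)
        if gap:
            missing[backend] = gap
    return missing
```

An empty dictionary from `schema_mismatches` corresponds to the outcome reported here, where Vertex AI, vLLM direct, and OGX all expose the same fields.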
13093

131-
#### Backend 1: Vertex AI (`claude-sonnet-4-5-20250929`)
94+
#### Vertex AI (`claude-sonnet-4-5-20250929`)
13295

13396
| Metric | Value |
13497
|---|---|
@@ -138,11 +101,12 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
138101
| Spans | 15 |
139102
| Trace ID | `tr-c59dcf7c76c26e4d55255a32694a9bb7` |
140103

104+
![Vertex trace](screenshots/vertex-trace.png)
141105
![Vertex waterfall](screenshots/vertex-summary.png)
142106

143107
---
144108

145-
#### Backend 2: vLLM direct (`gpt-oss-120b`)
109+
#### vLLM direct (`gpt-oss-120b`)
146110

147111
| Metric | Value |
148112
|---|---|
@@ -152,11 +116,12 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
152116
| Spans | 8 |
153117
| Trace ID | `tr-39a858c94eb86c3be340e23541717fe8` |
154118

119+
![vLLM trace](screenshots/vllm-trace.png)
155120
![vLLM waterfall](screenshots/vllm-summary.png)
156121

157122
---
158123

159-
#### Backend 3: OGX 1.0.2 → vLLM (`gpt-oss-120b`)
124+
#### OGX 1.0.2 → vLLM (`gpt-oss-120b`)
160125

161126
| Metric | Value |
162127
|---|---|
@@ -166,6 +131,7 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
166131
| Spans | 8 |
167132
| Trace ID | `tr-26175953d7cb441e3e2da1cc5fc24607` |
168133

134+
![OGX trace](screenshots/ogx-trace.png)
169135
![OGX waterfall](screenshots/ogx-summary.png)
170136

171137
---
@@ -174,7 +140,7 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
174140

175141
### Summary
176142

177-
MLflow integration works. This guide documents how to hook Claude Code, OGX, and MLflow together on RHOAI — assuming all three are already deployed on the cluster. The setup requires the Red Hat MLflow fork for RHOAI 3.4, which will be replaced by upstream MLflow 3.11 in a future release.
143+
MLflow integration works. Follow this guide to hook Claude Code, OGX, and MLflow together on RHOAI — assuming all three are already deployed on the cluster. The setup requires the Red Hat MLflow fork for RHOAI 3.4, which will be replaced by upstream MLflow 3.11 in a future release.
178144

179145
### Prerequisites
180146
