agents/claude-code/mlflow-tracing.md
Lines changed: 13 additions & 47 deletions
@@ -1,6 +1,6 @@
1
1
# MLflow Tracing for Claude Code Agent Runtimes on RHOAI
2
2
3
-
We deployed Claude Code as a containerized agent on Red Hat OpenShift AI and wired it up to the MLflow instance running on the same cluster. To validate the full tracing stack, we ran the same prompt — **"build me a tetris game"** — through three different backends: Vertex AI (Google Cloud), vLLM directly, and OGX routing to vLLM. In all three cases, MLflow captured the complete session trace including every tool call, token usage, latency, and the full execution waterfall. The sections below document the telemetry investigation, the tracing prototype, session-level metrics, and the setup guide for productizing this on RHOAI 3.5.
3
+
Claude Code was deployed as a containerized agent on Red Hat OpenShift AI and wired up to the MLflow instance running on the same cluster. To validate the full tracing stack, the same prompt, **"build me a tetris game"**, was run through three different backends: Vertex AI (Google Cloud), vLLM directly, and OGX routing to vLLM. In all three cases, MLflow captured the complete session trace, including every tool call, token usage, latency, and the full execution waterfall. The sections below document the telemetry investigation, the tracing prototype, session-level metrics, and the setup guide for productizing this on RHOAI 3.5.
4
4
5
5
---
6
6
@@ -49,54 +49,15 @@ This works the same whether the backend is Vertex AI, vLLM directly, or OGX →
49
49
50
50
The Claude Code stop hook is the right integration path. It already captures everything out of the box — tool calls, token usage, latency, session ID — and works the same across Vertex AI, vLLM, and OGX without any changes. If additional server-side metrics are needed (e.g. per-hop vLLM latency, OGX routing decisions), they can be added directly to the same hook since the infrastructure is already there.
51
51
52
-
### Evidence: Same Traces Across All Three Backends
53
-
54
-
We ran **"build me a tetris game"** against all three backends. All three produced the same trace schema.
55
-
56
-
#### Backend 1: Vertex AI
57
-
58
-
| Field | Value |
59
-
|---|---|
60
-
| Model |`claude-sonnet-4-5-20250929`|
61
-
| Tokens | 18,504 |
62
-
| Latency | 2.90 min |
63
-
| Trace ID |`tr-c59dcf7c76c26e4d55255a32694a9bb7`|
64
-
65
-

66
-
67
-
#### Backend 2: vLLM direct
68
-
69
-
| Field | Value |
70
-
|---|---|
71
-
| Model |`gpt-oss-120b`|
72
-
| Tokens | 46,211 |
73
-
| Latency | 37.82s |
74
-
| Trace ID |`tr-39a858c94eb86c3be340e23541717fe8`|
75
-
76
-

77
-
78
-
#### Backend 3: OGX 1.0.2 → vLLM
79
-
80
-
| Field | Value |
81
-
|---|---|
82
-
| Model |`gpt-oss-120b`|
83
-
| Tokens | 29,629 |
84
-
| Latency | 39.62s |
85
-
| Trace ID |`tr-26175953d7cb441e3e2da1cc5fc24607`|
86
-
87
-

88
-
89
52
---
90
53
91
54
## Tool Call Traces & Agent Execution Metrics
92
55
93
56
### Summary
94
57
95
-
**Tool call tracing** — We prototyped tool call tracing using `mlflow autolog claude`. Every tool Claude Code calls (Write, Read, Edit, Bash, AskUserQuestion, etc.) is captured as a span in MLflow with the tool name, input parameters, output/result, and latency. Tested across three backends with a real coding task — Vertex AI produced 15 spans, vLLM and OGX produced 8 each. MLflow integration works end-to-end. The stop-hook fires after the session so there is no latency impact.
58
+
**Tool call tracing** — Using `mlflow autolog claude`, every tool Claude Code calls (Write, Read, Edit, Bash, AskUserQuestion, etc.) is captured as a span in MLflow with the tool name, input parameters, output/result, and latency. Tested across three backends with a real coding task — Vertex AI produced 15 spans, vLLM and OGX produced 8 each. MLflow integration works end-to-end. The stop-hook fires after the session so there is no latency impact.
96
59
97
-
**Session-level metrics** — On top of the tool call spans, each trace also captures higher-level session metrics: session ID, total duration, input/output token counts, and the full tool call sequence as a waterfall. This answers "what did the agent do and how much did it cost?" for any session. Validated with a complete multi-turn coding task ("build me a tetris game") across all three backends.
98
-
99
-
As you can see in the results below.
60
+
**Session-level metrics** — On top of the tool call spans, each trace also captures higher-level session metrics: session ID, total duration, input/output token counts, and the full tool call sequence as a waterfall. This answers "what did the agent do and how much did it cost?" for any session.
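To make the relationship between tool call spans and session-level metrics concrete, here is a minimal Python sketch. It assumes a simplified in-memory model: `ToolSpan`, `SessionTrace`, and every field name and sample value below are hypothetical and chosen for illustration only. They are not MLflow's actual trace schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpan:
    """One tool call, mirroring the per-span fields described above (hypothetical model)."""
    tool_name: str
    inputs: dict
    output: str
    latency_s: float


@dataclass
class SessionTrace:
    """Session-level view: ID, duration, token counts, and the ordered tool calls."""
    session_id: str
    duration_s: float
    input_tokens: int
    output_tokens: int
    spans: list = field(default_factory=list)

    def summary(self) -> dict:
        """Roll the spans up into the 'what did the agent do and what did it cost' view."""
        return {
            "session_id": self.session_id,
            "duration_s": self.duration_s,
            "total_tokens": self.input_tokens + self.output_tokens,
            "tool_sequence": [span.tool_name for span in self.spans],
            "tool_time_s": sum(span.latency_s for span in self.spans),
        }


# Illustrative values only; these are not measurements from the runs in this document.
trace = SessionTrace(
    session_id="example-session",
    duration_s=10.0,
    input_tokens=1200,
    output_tokens=300,
    spans=[
        ToolSpan("Write", {"file_path": "tetris.html"}, "ok", 0.5),
        ToolSpan("Bash", {"command": "ls"}, "tetris.html", 1.5),
        ToolSpan("Read", {"file_path": "tetris.html"}, "<html>...", 0.25),
    ],
)
summary = trace.summary()
print(summary)
```

The ordered `tool_sequence` corresponds to the waterfall view, and the token total is what a cost estimate would be based on.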
100
61
101
62
### Trace Schema
102
63
@@ -126,9 +87,11 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
126
87
| Model | ✅ |
127
88
| Status | ✅ |
128
89
129
-
### Results: "Build me a Tetris game"
90
+
### Results: "Build me a Tetris game" — All Three Backends
91
+
92
+
The prompt **"build me a tetris game"** was run against all three backends. All three produced the same trace schema: prompt, response, token counts, latency, and full tool call sequence.
130
93
131
-
#### Backend 1: Vertex AI (`claude-sonnet-4-5-20250929`)
94
+
#### Vertex AI (`claude-sonnet-4-5-20250929`)
132
95
133
96
| Metric | Value |
134
97
|---|---|
@@ -138,11 +101,12 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
138
101
| Spans | 15 |
139
102
| Trace ID |`tr-c59dcf7c76c26e4d55255a32694a9bb7`|
@@ -152,11 +116,12 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
152
116
| Spans | 8 |
153
117
| Trace ID |`tr-39a858c94eb86c3be340e23541717fe8`|
154
118
119
+

155
120

156
121
157
122
---
158
123
159
-
#### Backend 3: OGX 1.0.2 → vLLM (`gpt-oss-120b`)
124
+
#### OGX 1.0.2 → vLLM (`gpt-oss-120b`)
160
125
161
126
| Metric | Value |
162
127
|---|---|
@@ -166,6 +131,7 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
166
131
| Spans | 8 |
167
132
| Trace ID |`tr-26175953d7cb441e3e2da1cc5fc24607`|
168
133
134
+

169
135

170
136
171
137
---
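As a rough way to read the per-backend figures side by side, the Python sketch below divides each run's reported token count by its reported latency. The numbers are copied from the Tokens and Latency values reported for the three runs in this document. The `tokens_per_second` helper is hypothetical, and the result is only a coarse session-level ratio: the runs used different models and produced different numbers of spans, so it is not a like-for-like benchmark.

```python
# Token counts and latencies as reported for the three "build me a tetris game" runs.
runs = {
    "Vertex AI": {"tokens": 18_504, "latency_s": 2.90 * 60},  # reported as 2.90 min
    "vLLM direct": {"tokens": 46_211, "latency_s": 37.82},
    "OGX 1.0.2 -> vLLM": {"tokens": 29_629, "latency_s": 39.62},
}


def tokens_per_second(tokens: int, latency_s: float) -> float:
    """Reported session tokens divided by wall-clock session latency."""
    return tokens / latency_s


rates = {name: tokens_per_second(**run) for name, run in runs.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} tokens/s")
```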
@@ -174,7 +140,7 @@ Each span captures: tool name, input parameters, output/result, and per-span lat
174
140
175
141
### Summary
176
142
177
-
MLflow integration works. This guide documents how to hook Claude Code, OGX, and MLflow together on RHOAI — assuming all three are already deployed on the cluster. The setup requires the Red Hat MLflow fork for RHOAI 3.4, which will be replaced by upstream MLflow 3.11 in a future release.
143
+
MLflow integration works. Follow this guide to hook Claude Code, OGX, and MLflow together on RHOAI — assuming all three are already deployed on the cluster. The setup requires the Red Hat MLflow fork for RHOAI 3.4, which will be replaced by upstream MLflow 3.11 in a future release.