Commit 10e0129

Merge pull request #105 from red-hat-data-services/mlflow-tracing-docs
2 parents 2c9d80c + 05f2e2f commit 10e0129

7 files changed

Lines changed: 241 additions & 0 deletions

@@ -0,0 +1,241 @@
1+
# MLflow Tracing for Claude Code Agent Runtimes on RHOAI
2+
3+
This document covers deploying Claude Code as a containerized agent on Red Hat OpenShift AI (RHOAI) and wiring it up to the MLflow instance running on the same cluster. To validate the full tracing stack, the same prompt, **"build me a tetris game"**, was run through three different backends: Vertex AI (Google Cloud), vLLM directly, and OGX routing to vLLM. In all three cases, MLflow captured the complete session trace, including every tool call, token usage, latency, and the full execution waterfall. The sections below document the telemetry investigation, the tracing prototype, session-level metrics, and the setup guide for productizing this on RHOAI 3.5.
4+
5+
---
6+
7+
## Inventory OGX Telemetry Hooks and MLflow Integration Points
8+
9+
### Summary
10+
11+
Agent-level instrumentation via `mlflow autolog claude` worked out of the box with all three tested backends. Swapping Vertex AI for vLLM or OGX produces the same trace schema with no changes to the tracing setup. If server-side metrics are needed in the future (e.g. per-hop latency, routing decisions), they would have to come from OGX or vLLM emitting their own OTel spans, because the Claude Code hook only captures agent-side data from the local Claude Code session file.
12+
13+
### OGX Telemetry Capabilities
14+
15+
OGX 1.0.2 emits structured logs per request:
16+
17+
```text
18+
INFO Using native /v1/messages passthrough
19+
base_url=http://vllm-120b-predictor.gpt-oss.svc.cluster.local
20+
model=vllm/gpt-oss-120b
21+
HTTP 200
22+
```
23+
24+
| Signal | Available |
25+
|---|---|
26+
| Model name | Yes |
27+
| Backend / provider URL | Yes |
28+
| Passthrough status | Yes |
29+
| HTTP status code | Yes |
30+
| Per-request latency | Not shown in the log excerpt above |
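
The signals in the table can be pulled out of a log record with a few lines of parsing. The following is a minimal sketch, assuming the record looks like the excerpt above; the function name `parse_ogx_log` and the exact line layout of a real OGX log stream are assumptions, not part of OGX.

```python
import re

# Sample copied from the log excerpt above. How OGX actually lays out a
# record across lines is an assumption made for this sketch.
SAMPLE = """INFO Using native /v1/messages passthrough
base_url=http://vllm-120b-predictor.gpt-oss.svc.cluster.local
model=vllm/gpt-oss-120b
HTTP 200"""


def parse_ogx_log(text):
    """Extract model, backend URL, passthrough flag, and HTTP status."""
    # key=value pairs such as base_url=... and model=...
    fields = dict(re.findall(r"(\w+)=(\S+)", text))
    status = re.search(r"\bHTTP (\d{3})\b", text)
    return {
        "model": fields.get("model"),
        "base_url": fields.get("base_url"),
        "passthrough": "passthrough" in text,
        "http_status": int(status.group(1)) if status else None,
    }


parsed = parse_ogx_log(SAMPLE)
print(parsed)
```

Anything not present in the record, such as latency, stays out of the result rather than being guessed.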
31+
32+
### Agent-side OTel Spans (what we capture today)
33+
34+
The spans produced by `mlflow autolog claude` are OTel spans. Every session captures:
35+
36+
| Field | Example |
37+
|---|---|
38+
| Token count | 29,629 (input + output) |
39+
| Session latency | 39.62s |
40+
| Tool call sequence | tool_AskUserQuestion → llm → tool_Write → tool_Read → ... |
41+
| Prompt / response | Full input and output text |
42+
| Session ID | Links multi-turn conversations |
43+
| Model | `gpt-oss-120b`, `claude-sonnet-4-5-20250929`, etc. |
44+
| Status | OK / error |
45+
46+
This works the same whether the backend is Vertex AI, vLLM directly, or OGX → vLLM, because the same Claude Code hook emits the spans in every case. If server-side OGX spans are needed in the future, they would need to be emitted using a custom exporter.
47+
48+
### Integration Path
49+
50+
The Claude Code stop hook is the right integration path for agent-level tracing. It captures tool calls, token usage, latency, and session ID out of the box because Claude Code records and writes all of this in its session file — and works the same across Vertex AI, vLLM, and OGX without any changes. If additional server-side metrics are needed (e.g. per-hop vLLM latency, OGX routing decisions), those would require OGX or vLLM to emit their own OTel spans separately.
51+
52+
---
53+
54+
## Tool Call Traces & Agent Execution Metrics
55+
56+
### Summary
57+
58+
**Tool call tracing** — Using `mlflow autolog claude`, every tool Claude Code calls (Write, Read, Edit, Bash, AskUserQuestion, etc.) is captured as a span in MLflow with the tool name, input parameters, output/result, and latency. Tested across three backends with a real coding task — Vertex AI produced 15 spans, vLLM and OGX produced 8 each. MLflow integration works end-to-end. The stop-hook fires after the session so there is no latency impact.
59+
60+
**Session-level metrics** — On top of the tool call spans, each trace also captures higher-level session metrics: session ID, total duration, input/output token counts, and the full tool call sequence as a waterfall. This answers "what did the agent do and how much did it cost?" for any session.
61+
62+
### Trace Schema
63+
64+
```text
65+
claude_code_conversation (root)
66+
├── tool_AskUserQuestion — question asked + user answer
67+
├── tool_EnterPlanMode — agent enters planning
68+
├── llm — LLM inference call
69+
├── tool_Bash — command + output
70+
├── tool_Write — file path + content written
71+
├── tool_Read — file path + content read
72+
├── tool_Edit — file path + diff applied
73+
├── tool_ExitPlanMode — exits planning
74+
└── llm — final response
75+
```
76+
77+
Each span captures: tool name, input parameters, output/result, and per-span latency. Session-level fields on every trace:
78+
79+
| Field | Captured |
80+
|---|---|
81+
| Session ID | Yes |
82+
| Total duration | Yes |
83+
| Input tokens | Yes |
84+
| Output tokens | Yes |
85+
| Total tokens | Yes |
86+
| Tool call sequence (waterfall) | Yes |
87+
| Model | Yes |
88+
| Status | Yes |
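
To illustrate how the session-level fields relate to the individual spans, here is a minimal sketch that rolls a list of span records up into session totals. The dict keys (`name`, `input_tokens`, `output_tokens`) and the sample values are hypothetical and chosen for illustration; they are not the MLflow span schema or measured data.

```python
def summarize_session(spans):
    """Roll per-span records up into session-level fields.

    Each span is a dict with hypothetical keys: name, input_tokens,
    output_tokens. Tool spans are assumed to carry no token counts.
    """
    input_tokens = sum(s.get("input_tokens", 0) for s in spans)
    output_tokens = sum(s.get("output_tokens", 0) for s in spans)
    return {
        "tool_sequence": " -> ".join(s["name"] for s in spans),
        "tool_calls": sum(1 for s in spans if s["name"].startswith("tool_")),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


# Made-up spans that follow the naming used in the trace schema above.
example_spans = [
    {"name": "tool_AskUserQuestion"},
    {"name": "llm", "input_tokens": 1200, "output_tokens": 300},
    {"name": "tool_Write"},
    {"name": "llm", "input_tokens": 1500, "output_tokens": 100},
]

summary = summarize_session(example_spans)
print(summary)
```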
89+
90+
### Results: "Build me a Tetris game" — All Three Backends
91+
92+
The prompt **"build me a tetris game"** was run against all three backends. All three produced the same trace schema: prompt, response, token counts, latency, and full tool call sequence.
93+
94+
#### Vertex AI (`claude-sonnet-4-5-20250929`)
95+
96+
| Metric | Value |
97+
|---|---|
98+
| Session ID | `b679dc2c-...` |
99+
| Tokens | 18,504 |
100+
| Latency | 2.90 min |
101+
| Spans | 15 |
102+
| Trace ID | `tr-c59dcf7c76c26e4d55255a32694a9bb7` |
103+
104+
![Vertex trace](screenshots/vertex-trace.png)
105+
![Vertex waterfall](screenshots/vertex-summary.png)
106+
107+
---
108+
109+
#### vLLM direct (`gpt-oss-120b`)
110+
111+
| Metric | Value |
112+
|---|---|
113+
| Session ID | `cc76b223-...` |
114+
| Tokens | 46,211 |
115+
| Latency | 37.82s |
116+
| Spans | 8 |
117+
| Trace ID | `tr-39a858c94eb86c3be340e23541717fe8` |
118+
119+
![vLLM trace](screenshots/vllm-trace.png)
120+
![vLLM waterfall](screenshots/vllm-summary.png)
121+
122+
---
123+
124+
#### OGX 1.0.2 → vLLM (`gpt-oss-120b`)
125+
126+
| Metric | Value |
127+
|---|---|
128+
| Session ID | `980fbcb8-...` |
129+
| Tokens | 29,629 |
130+
| Latency | 39.62s |
131+
| Spans | 8 |
132+
| Trace ID | `tr-26175953d7cb441e3e2da1cc5fc24607` |
133+
134+
![OGX trace](screenshots/ogx-trace.png)
135+
![OGX waterfall](screenshots/ogx-summary.png)
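
For a rough side-by-side view, the figures in the three tables above can be normalized to tokens per second and seconds per span. This is a sketch only: each backend was run once, the token totals combine input and output, and the models and tool sequences differ, so the ratios are not a throughput benchmark. The helper names are illustrative.

```python
# Values copied from the three result tables above.
RUNS = {
    "Vertex AI": {"tokens": 18504, "latency_s": 2.90 * 60, "spans": 15},
    "vLLM direct": {"tokens": 46211, "latency_s": 37.82, "spans": 8},
    "OGX -> vLLM": {"tokens": 29629, "latency_s": 39.62, "spans": 8},
}


def tokens_per_second(run):
    """Total tokens (input + output) divided by session latency."""
    return run["tokens"] / run["latency_s"]


def seconds_per_span(run):
    """Average wall-clock time per recorded span."""
    return run["latency_s"] / run["spans"]


for name, run in RUNS.items():
    print(f"{name}: {tokens_per_second(run):.0f} tokens/s, "
          f"{seconds_per_span(run):.1f} s/span")
```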
136+
137+
---
138+
139+
## Observability Setup Guide & RHOAI 3.5 Recommendation
140+
141+
### Summary
142+
143+
MLflow integration works. Follow this guide to hook Claude Code, OGX, and MLflow together on RHOAI, assuming all three are already deployed on the cluster. On RHOAI 3.4 the setup requires the Red Hat MLflow fork; the fork can be dropped once RHOAI ships upstream MLflow 3.11.
144+
145+
### Prerequisites
146+
147+
The following must already be running on the cluster:
148+
149+
- Claude Code container deployed (see [deployment.yaml](https://github.com/red-hat-data-services/agentic-starter-kits/blob/main/agents/claude-code/deployment/deployment.yaml))
150+
- OGX deployed and serving a model
151+
- MLflow instance running via the ODH/RHOAI operator with a workspace matching your namespace
152+
153+
### Step-by-Step Setup (following the [deployment guide](https://github.com/red-hat-data-services/agentic-starter-kits/blob/main/agents/claude-code/deployment/README.md), adding MLflow-specific steps below)
154+
155+
#### 1. Add Python + MLflow to the Containerfile
156+
157+
The ODH build of MLflow uses the Red Hat fork, which includes the `kubernetes-namespaced` auth plugin that is not yet in upstream 3.10.x:
158+
159+
```dockerfile
160+
RUN microdnf install -y python3.12 python3.12-pip
161+
RUN python3.12 -m pip install --no-cache-dir \
162+
'mlflow[kubernetes] @ git+https://github.com/red-hat-data-services/mlflow.git@v3.10.1+rhaiv.3'
163+
```
164+
165+
> This fork requirement will go away when RHOAI ships MLflow 3.11, at which point replace with `mlflow[kubernetes]>=3.11`.
166+
167+
#### 2. Grant RBAC to the pod's service account
168+
169+
```bash
170+
oc adm policy add-role-to-user edit -z default -n <your-namespace>
171+
```
172+
173+
> For production, use a dedicated service account with least-privilege RBAC scoped to the permissions MLflow's `kubernetes-namespaced` auth plugin requires.
174+
175+
#### 3. Add MLflow env vars to the [deployment](https://github.com/red-hat-data-services/agentic-starter-kits/blob/main/agents/claude-code/deployment/deployment.yaml)
176+
177+
```yaml
178+
- name: MLFLOW_TRACKING_URI
179+
value: "https://mlflow.<your-rhoai-namespace>.svc:8443" # namespace where MLflow is deployed (commonly redhat-ods-applications)
180+
- name: MLFLOW_TRACKING_AUTH
181+
value: "kubernetes-namespaced"
182+
- name: MLFLOW_WORKSPACE
183+
value: "<your-namespace>"
184+
- name: MLFLOW_EXPERIMENT_NAME
185+
value: "claude-code-traces"
186+
- name: MLFLOW_TRACKING_INSECURE_TLS
187+
value: "true" # for dev/test only — production deployments should use proper TLS certificates
188+
```
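
A quick startup check can catch a missing variable before the first session runs. The following is a minimal sketch, assuming the four variables from the snippet above are all required; the helper name `missing_mlflow_vars` is illustrative and not part of MLflow.

```python
import os

# Variables from the deployment snippet above that the tracing setup relies on.
REQUIRED = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_AUTH",
    "MLFLOW_WORKSPACE",
    "MLFLOW_EXPERIMENT_NAME",
)


def missing_mlflow_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


missing = missing_mlflow_vars()
if missing:
    print("Missing MLflow settings:", ", ".join(missing))
else:
    print("All MLflow settings present")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise outside the container.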
189+
190+
#### 4. Add OGX env vars to point Claude Code at OGX
191+
192+
```yaml
193+
- name: ANTHROPIC_BASE_URL
194+
value: "https://<your-ogx-route>"
195+
- name: ANTHROPIC_API_KEY
196+
value: "not-needed" # OGX does not validate API keys for self-hosted models, any non-empty string works
197+
- name: ANTHROPIC_CUSTOM_MODEL_OPTION
198+
value: "vllm/<your-model-name>"
199+
```
200+
201+
#### 5. Wire up autolog in the [entrypoint](https://github.com/red-hat-data-services/agentic-starter-kits/blob/main/agents/claude-code/deployment/entrypoint.sh)
202+
203+
The entrypoint runs `mlflow autolog claude` at startup and injects auth into the generated `.claude/settings.json`:
204+
205+
```bash
206+
mlflow autolog claude -u "${MLFLOW_TRACKING_URI}" -n "${MLFLOW_EXPERIMENT_NAME}" /workspace
207+
208+
python3.12 -c '
209+
import json, os
210+
sf = "/workspace/.claude/settings.json"
211+
with open(sf) as f: s = json.load(f)
212+
env = s.setdefault("env", {})
213+
env["MLFLOW_TRACKING_AUTH"] = "kubernetes-namespaced"
214+
env["MLFLOW_WORKSPACE"] = os.environ["MLFLOW_WORKSPACE"]
215+
env["MLFLOW_TRACKING_INSECURE_TLS"] = os.environ.get("MLFLOW_TRACKING_INSECURE_TLS", "false")
216+
with open(sf, "w") as f: json.dump(s, f, indent=2)
217+
'
218+
```
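
The inline settings patch can also be written as a small standalone function, which is easier to test than a `python -c` string. This is a sketch under the same assumptions as the entrypoint: the settings file is JSON with an optional top-level `env` map. The function name `patch_settings`, the `insecure_tls` parameter, and the demo file content are illustrative.

```python
import json
import os
import tempfile


def patch_settings(path, workspace, insecure_tls="false"):
    """Merge MLflow auth settings into the "env" map of a settings.json file."""
    with open(path) as f:
        settings = json.load(f)
    env = settings.setdefault("env", {})  # keep whatever is already there
    env["MLFLOW_TRACKING_AUTH"] = "kubernetes-namespaced"
    env["MLFLOW_WORKSPACE"] = workspace
    env["MLFLOW_TRACKING_INSECURE_TLS"] = insecure_tls
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)
    return settings


# Demo against a throwaway file; the starting content is made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "settings.json")
    with open(demo_path, "w") as f:
        json.dump({"env": {"EXISTING_KEY": "kept"}}, f)
    patched = patch_settings(demo_path, workspace="my-namespace")
    print(json.dumps(patched, indent=2))
```

Existing keys in `env` are preserved, so the patch does not overwrite unrelated settings already present in the file.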
219+
220+
#### 6. Verify
221+
222+
```bash
223+
# Check startup logs
224+
oc logs deployment/<claude-deployment> | grep -i mlflow
225+
226+
# Run a test
227+
oc exec deployment/<claude-deployment> -- bash -c '
228+
export HOME=/home/claude-agent && cd /workspace
229+
~/.claude/claude-run -p "What is 2+2?"
230+
'
231+
232+
# Confirm trace was created by checking your MLflow UI under your experiment on your RHOAI MLflow instance
233+
```
234+
235+
### Recommendation for RHOAI 3.5
236+
237+
**Productize `mlflow autolog claude` as the agent tracing path.**
238+
239+
It works across all three tested backends (Vertex AI, vLLM, OGX) with no changes to the tracing setup. It captures tool calls, token usage, latency, and session metadata out of the box. The only overhead is the stop hook, which runs after the session ends and therefore does not affect agent response times.
240+
241+
When RHOAI ships MLflow 3.11, drop the Red Hat fork and use upstream `mlflow[kubernetes]>=3.11`.