docs: MLflow tracing for Claude Code on RHOAI by Nehanth · Pull Request #105 · red-hat-data-services/agentic-starter-kits

Nehanth · 2026-05-18T17:09:15Z

Summary

Adds agents/claude-code/ with documentation covering MLflow tracing for Claude Code on RHOAI (RHAIENG-4751, 4752, 4753, 4754).

RHAIENG-4751: OGX telemetry investigation. Agent-level OTel spans emitted via `mlflow autolog claude` work across all three backends (Vertex AI, vLLM, OGX) and share the same trace schema.
RHAIENG-4752 & 4753: Tool call trace prototype and session-level metrics. Validated with "build me a tetris game" across all three backends.
RHAIENG-4754: Step-by-step setup guide for hooking Claude Code, OGX, and MLflow together on RHOAI 3.4, with a recommendation to productize for RHOAI 3.5.

Screenshots

Includes MLflow trace screenshots for all three backends showing both Inputs/Outputs detail and session waterfall views.

coderabbitai · 2026-05-18T17:09:32Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

New documentation for MLflow tracing integration with Claude Code agent runtimes on Red Hat OpenShift AI. The guide validates trace capture across three inference backends (Vertex AI, vLLM, OGX→vLLM), defines the expected trace schema, provides backend-specific execution results, and supplies step-by-step setup instructions including MLflow configuration and RBAC changes.

Changes

MLflow Tracing Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Overview and tracing validation across backends `agents/claude-code/mlflow-tracing.md` | Introduction and context for containerized Claude Code tracing on OpenShift AI with end-to-end evidence from running the same prompt across Vertex AI, vLLM direct, and OGX→vLLM backends with captured session traces. |
| Trace schema and backend-specific results `agents/claude-code/mlflow-tracing.md` | Expected trace schema with root conversation span and tool/LLM inference spans; captured per-span and session fields; backend-specific results from a "Tetris game" prompt including token counts, latency, span counts, and trace IDs. |
| Observability setup and RHOAI configuration `agents/claude-code/mlflow-tracing.md` | Prerequisites, Red Hat MLflow fork installation with Kubernetes auth plugin, RBAC configuration, environment variables for MLflow and OGX, entrypoint wiring for `mlflow autolog claude`, verification steps, and guidance for upgrading to upstream MLflow >=3.11. |

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding MLflow tracing documentation for Claude Code on RHOAI, which matches the primary purpose of the changeset. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a clear summary of the documentation added, the issues addressed, and key content including setup guides, screenshots, and validation details. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/claude-agent/mlflow-tracing.md`:
- Around line 107-118: The fenced code block containing the trace schema tree
(the block starting with "claude_code_conversation  (root)" and the subsequent
tool lines) lacks a language identifier and fails the markdown linter; update
that triple-backtick fence to include the language tag "text" (i.e., change ```
to ```text) so the block is recognized as plain text in mlflow-tracing.md.
- Around line 17-22: The fenced code block in
docs/claude-agent/mlflow-tracing.md containing the log snippet lacks a language
identifier which fails the markdown linter; update that block by adding a
language tag such as text or log after the opening backticks (i.e., change the
``` to ```text or ```log) so the linter accepts the block and the log output
lines (INFO Using native /v1/messages passthrough, base_url=..., model=..., HTTP
200) remain unchanged.
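
Both findings above are the same kind of problem: an opening code fence with no language tag. A minimal sketch of how such fences could be located before pushing is shown here. The function name `find_untagged_fences` is hypothetical, and the fence matching is a simplification of real Markdown parsing (it only handles backtick fences), so it approximates what the linter checks rather than reproducing it.

```python
import re

# Matches a line that starts (after optional indentation) with 3+ backticks.
# Group 2 is the backtick run, group 3 is whatever follows it (the info string).
FENCE = re.compile(r"^(\s*)(`{3,})(.*)$")


def find_untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    untagged = []
    open_fence = None  # backtick run that opened the current block, if any
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        ticks, info = match.group(2), match.group(3).strip()
        if open_fence is None:
            # This line opens a block; flag it when the info string is empty.
            open_fence = ticks
            if not info:
                untagged.append(lineno)
        elif len(ticks) >= len(open_fence) and not info:
            # A bare fence at least as long as the opener closes the block.
            open_fence = None
    return untagged
```

Calling `find_untagged_fences(open("mlflow-tracing.md").read())` would list the lines where a tag such as `text` or `log` still needs to be added.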

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 2040aeeb-aeae-425b-92b7-0bec3d4586b3

📥 Commits

Reviewing files that changed from the base of the PR and between 237a0b5 and d4781b9.

⛔ Files ignored due to path filters (5)

docs/claude-agent/screenshots/ogx-trace.png is excluded by !**/*.png
docs/claude-agent/screenshots/vertex-summary.png is excluded by !**/*.png
docs/claude-agent/screenshots/vertex-trace.png is excluded by !**/*.png
docs/claude-agent/screenshots/vllm-summary.png is excluded by !**/*.png
docs/claude-agent/screenshots/vllm-trace.png is excluded by !**/*.png

📒 Files selected for processing (1)

docs/claude-agent/mlflow-tracing.md

tarun-etikala · 2026-05-18T18:53:59Z

Hey @Nehanth - a new repo-level ruleset is added that now requires Unit Tests and lint checks to pass before merge, plus approval from the agentic-starter-kits-maintainers team.

This PR is currently blocked because the Unit Tests check hasn't run on it. A rebase onto main should pick up the updated workflow and trigger the required checks. Please rebase when you get a chance.

Nehanth · 2026-05-18T19:43:59Z

> Hey @Nehanth - a new repo-level ruleset is added that now requires Unit Tests and lint checks to pass before merge, plus approval from the agentic-starter-kits-maintainers team.
>
> This PR is currently blocked because the Unit Tests check hasn't run on it. A rebase onto main should pick up the updated workflow and trigger the required checks. Please rebase when you get a chance.

Done!

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/claude-agent/mlflow-tracing.md (1)
204-204: 🏗️ Heavy lift

Consider scoping down RBAC permissions.

Granting the edit role to the default service account provides broad read/write access to most resources in the namespace. For production deployments, consider creating a dedicated service account with minimal permissions required for MLflow integration (e.g., permissions to create/update experiments, runs, and access required storage). The exact permissions depend on the kubernetes-namespaced auth plugin requirements.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/claude-agent/mlflow-tracing.md` at line 204, The current instruction
uses "oc adm policy add-role-to-user edit -z default -n <your-namespace>" which
grants the broad edit role to the default service account; replace this with
guidance to create and bind a dedicated service account with least-privilege
RBAC for MLflow (instead of using the default SA). Update the docs to show
creating a service account (e.g., "mlflow-sa"), a Role or ClusterRole containing
only needed verbs/resources for experiments/runs and storage access, and a
RoleBinding that binds that Role to "mlflow-sa"; mention that the exact rules
should be derived from the kubernetes-namespaced auth plugin requirements and
provide the example placeholders for Role rules and the RoleBinding that users
must tailor for production.
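
The suggestion above (dedicated service account, narrow Role, RoleBinding) can be sketched as a small manifest generator. This is an illustration under stated assumptions, not the documented setup: the helper name `least_privilege_manifests` and the `-role` / `-binding` naming are hypothetical, and the Role's `rules` are deliberately left empty because the review does not state which resources and verbs the kubernetes-namespaced auth plugin needs.

```python
import json


def least_privilege_manifests(namespace: str, sa_name: str = "mlflow-sa", rules=None) -> dict:
    """Build a ServiceAccount, Role, and RoleBinding wrapped in a v1 List.

    `rules` must be supplied by the caller; the default is an empty list,
    which grants nothing. The required rules are NOT known from the review.
    """
    role_name = f"{sa_name}-role"  # hypothetical naming convention
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": list(rules or []),
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{sa_name}-binding", "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service_account, role, binding]}


if __name__ == "__main__":
    # "my-namespace" is a placeholder; substitute the real project namespace.
    print(json.dumps(least_privilege_manifests("my-namespace"), indent=2))
```

The printed JSON is meant to be reviewed, filled in with real rules, and then applied with the cluster's usual tooling; keeping the rules empty by default makes it hard to grant broad access by accident.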

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/claude-agent/mlflow-tracing.md`:
- Around line 191-199: Update the MLflow version requirement in the Dockerfile
documentation snippet: replace the fork reference or the version constraint that
implies "3.11" with a concrete minimum of 3.11.1 so users get the
kubernetes-namespaced auth plugin; specifically change the pip install target
that currently uses "'mlflow[kubernetes] @
git+https://github.com/red-hat-data-services/mlflow.git@rhoai-3.4'" or any
mention of ">=3.11" to use "mlflow[kubernetes]>=3.11.1" (also update the
explanatory sentence that references RHOAI shipping 3.11 to mention 3.11.1), and
check the later reference around the second mention (line ~273) to ensure it
matches the same >=3.11.1 constraint.
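
Whichever minimum version the doc settles on, a container entrypoint could verify it at startup. The sketch below assumes the minimum is passed in as a string; the `3.11.1` figure is the reviewer bot's claim and is not confirmed elsewhere in this thread. The helper names are hypothetical, and pre-release suffixes such as `rc0` are ignored for simplicity.

```python
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Parse the leading numeric release, e.g. '3.11.1rc0' -> (3, 11, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # stop at the first segment carrying a suffix
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples after zero-padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_mlflow_version():
    """Return the installed mlflow version string, or None if it is absent."""
    try:
        return metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        return None
```

A startup check might then read `meets_minimum(installed_mlflow_version() or "0", "3.11.1")` and fail fast with a clear message when it returns `False`.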

---

Nitpick comments:
In `@docs/claude-agent/mlflow-tracing.md`:
- Line 204: The current instruction uses "oc adm policy add-role-to-user edit -z
default -n <your-namespace>" which grants the broad edit role to the default
service account; replace this with guidance to create and bind a dedicated
service account with least-privilege RBAC for MLflow (instead of using the
default SA). Update the docs to show creating a service account (e.g.,
"mlflow-sa"), a Role or ClusterRole containing only needed verbs/resources for
experiments/runs and storage access, and a RoleBinding that binds that Role to
"mlflow-sa"; mention that the exact rules should be derived from the
kubernetes-namespaced auth plugin requirements and provide the example
placeholders for Role rules and the RoleBinding that users must tailor for
production.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 38228af3-b2be-41d4-a99d-e0e02786305e

📥 Commits

Reviewing files that changed from the base of the PR and between d4781b9 and 8fb69c9.

⛔ Files ignored due to path filters (6)

docs/claude-agent/screenshots/ogx-summary.png is excluded by !**/*.png
docs/claude-agent/screenshots/ogx-trace.png is excluded by !**/*.png
docs/claude-agent/screenshots/vertex-summary.png is excluded by !**/*.png
docs/claude-agent/screenshots/vertex-trace.png is excluded by !**/*.png
docs/claude-agent/screenshots/vllm-summary.png is excluded by !**/*.png
docs/claude-agent/screenshots/vllm-trace.png is excluded by !**/*.png

📒 Files selected for processing (1)

docs/claude-agent/mlflow-tracing.md

aakankshaduggal

Nice work — the doc covers RHAIENG-4751 through 4754 cleanly, and the "same prompt across 3 backends" approach is a great way to prove backend-agnostic tracing. A few things to address:

File location vs repo restructure — there's an active discussion on restructuring the repo (see the thread on the restructure proposal). Should this live under agents/claude-code/ instead of docs/claude-agent/ to align with the new structure?
ogx-summary.png is missing — PR body notes "to be added." Should be included before merge.
MLFLOW_TRACKING_INSECURE_TLS=true — worth adding a note that this is for dev/test setups and production deployments should use proper TLS certificates.
ANTHROPIC_API_KEY=fake in step 4 — this works but could confuse readers. A brief note explaining why (OGX doesn't validate API keys for self-hosted models) would help.
Hardcoded redhat-ods-applications namespace in the MLflow tracking URI — this varies by RHOAI installation, worth calling out.

Reviewed by Claude with @aakankshaduggal's supervision

Nehanth · 2026-05-19T18:22:39Z

Thanks for the review @aakankshaduggal! All points addressed in the latest push:

File location — Moved to agents/claude-code/ to align with the repo structure.
ogx-summary.png — Added, all 6 screenshots are now included.
MLFLOW_TRACKING_INSECURE_TLS — Added note: "for dev/test only — production deployments should use proper TLS certificates."
ANTHROPIC_API_KEY=fake — Added note: "OGX does not validate API keys for self-hosted models, any non-empty string works."
Hardcoded namespace — Changed to mlflow.<your-rhoai-namespace>.svc:8443 with a comment noting redhat-ods-applications is common.
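
The namespace and API key points above can be illustrated with a small helper that builds the environment for the agent container. This is a sketch only: the function names are hypothetical, the `https` scheme is an assumption (the thread gives just the `mlflow.<your-rhoai-namespace>.svc:8443` host and port), and the API key value is an arbitrary non-empty placeholder.

```python
def mlflow_tracking_uri(rhoai_namespace: str, scheme: str = "https", port: int = 8443) -> str:
    """Build the in-cluster MLflow URI for a given RHOAI namespace.

    The scheme default is an assumption; only the host pattern and port
    come from the review thread.
    """
    if not rhoai_namespace or "<" in rhoai_namespace:
        raise ValueError("replace the <your-rhoai-namespace> placeholder with a real namespace")
    return f"{scheme}://mlflow.{rhoai_namespace}.svc:{port}"


def tracing_env(rhoai_namespace: str) -> dict[str, str]:
    """Return the environment variables discussed in the review."""
    return {
        "MLFLOW_TRACKING_URI": mlflow_tracking_uri(rhoai_namespace),
        # Any non-empty string works here because OGX does not validate
        # API keys for self-hosted models (per the thread above).
        "ANTHROPIC_API_KEY": "placeholder-not-validated",
    }
```

For the common case mentioned above, `tracing_env("redhat-ods-applications")` yields a tracking URI pointing at that namespace, while an unreplaced placeholder raises an error instead of producing a broken URI.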

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/claude-code/mlflow-tracing.md`:
- Around line 203-205: The doc currently shows granting the broad `edit` role to
the `default` service account via the command `oc adm policy add-role-to-user
edit -z default -n <your-namespace>`; change the guidance to instruct creating a
dedicated service account (e.g., `mlflow-sa`) and a minimal Role and RoleBinding
that only grant MLflow-required verbs/resources (list the specific API
groups/resources/verbs MLflow needs) instead of using `edit`, and replace the
single-line example with instructions to create the service account and bind
only that minimal role to it.
- Around line 247-248: The generated entrypoint currently hardcodes
env["MLFLOW_TRACKING_INSECURE_TLS"] = "true"; change this to read from the
environment with a safe default (e.g., use os.getenv or equivalent to set
MLFLOW_TRACKING_INSECURE_TLS to "true" only if explicitly set, defaulting to
"false") and update the runtime settings write (the code that writes to sf using
s) to reflect that value; also add a brief comment or docstring next to where
env is populated explaining this flag should only be enabled for dev/test.
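
The second finding, reading the flag from the environment instead of hardcoding it, could look roughly like the following. This is a sketch of the idea rather than the entrypoint from the PR: the function names are hypothetical, and treating only `true` and `1` as truthy is a choice made here for illustration.

```python
import os


def insecure_tls_enabled(environ=None) -> bool:
    """Return True only when MLFLOW_TRACKING_INSECURE_TLS is explicitly enabled.

    Defaults to False so TLS verification stays on unless the operator
    opts out. Skipping verification is intended for dev/test only.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("MLFLOW_TRACKING_INSECURE_TLS", "false")
    return raw.strip().lower() in {"true", "1"}


def build_tracing_env(environ=None) -> dict[str, str]:
    """Populate the env block that the entrypoint would write out."""
    env = {}
    # Dev/test only: production deployments should use proper TLS certificates.
    env["MLFLOW_TRACKING_INSECURE_TLS"] = "true" if insecure_tls_enabled(environ) else "false"
    return env
```

With this shape, a deployment that never sets the variable gets `"false"`, and a dev cluster opts in by exporting `MLFLOW_TRACKING_INSECURE_TLS=true` in its pod spec.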

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 548ce6d6-04e5-4046-b279-f02834cc8c1d

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb69c9 and 5f74dd8.

⛔ Files ignored due to path filters (6)

agents/claude-code/screenshots/ogx-summary.png is excluded by !**/*.png
agents/claude-code/screenshots/ogx-trace.png is excluded by !**/*.png
agents/claude-code/screenshots/vertex-summary.png is excluded by !**/*.png
agents/claude-code/screenshots/vertex-trace.png is excluded by !**/*.png
agents/claude-code/screenshots/vllm-summary.png is excluded by !**/*.png
agents/claude-code/screenshots/vllm-trace.png is excluded by !**/*.png

📒 Files selected for processing (1)

agents/claude-code/mlflow-tracing.md

tarun-etikala

Thanks for the thorough investigation across all four RHAIENG tickets, @Nehanth. The tracing validation across three backends is valuable. A few things to address before merging.

Structure: doesn't match repo conventions

JIRA tickets as headings: The doc uses ticket numbers as section headings. Please restructure around what the reader needs to do, not which ticket produced the finding. The setup guide (currently RHAIENG-4754, Steps 1–6) should be the primary content. The investigation findings (4751, 4752/4753) belong in the JIRA ticket descriptions; they're useful context but not actionable docs for someone setting up tracing. Recommendations based on the findings could be added here.
Voice: Repo docs use second person imperative ("Edit .env", "Run make deploy"). This doc uses first person plural throughout ("We deployed", "We ran"). Please rewrite to match.
Redundancy: The same backend comparison tables (Vertex AI, vLLM, OGX, with identical trace IDs, tokens, latencies) appear in both the 4751 and 4752/4753 sections (~50 lines duplicated). Consolidate into a single "Results" section.

sanafayyaz315

Looks good overall. Once the structural changes @tarun-etikala recommended are addressed (restructure around reader actions instead of JIRA tickets, fix voice, consolidate duplicated tables), this should be good to merge.

Documents MLflow autolog integration with Claude Code across Vertex AI, vLLM, and OGX backends. Covers RHAIENG-4751, 4752, 4753, and 4754 — telemetry investigation, tool call tracing prototype, session-level metrics, and RHOAI 3.5 setup guide and recommendation. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

aakankshaduggal

Thanks @Nehanth, lgtm! 🚢

github-actions Bot added the area/docs label May 18, 2026

github-actions Bot added the size/m label May 18, 2026

coderabbitai Bot reviewed May 18, 2026

Nehanth force-pushed the mlflow-tracing-docs branch from 8fb9834 to d896503 Compare May 18, 2026 17:13

Nehanth assigned aakankshaduggal May 18, 2026

Nehanth force-pushed the mlflow-tracing-docs branch from cca0d2d to 695fd30 Compare May 18, 2026 17:31

Nehanth force-pushed the mlflow-tracing-docs branch from 695fd30 to 8fb69c9 Compare May 18, 2026 19:43

Nehanth requested a review from a team as a code owner May 18, 2026 19:43

coderabbitai Bot reviewed May 18, 2026

aakankshaduggal reviewed May 19, 2026

github-actions Bot removed the area/docs label May 19, 2026

coderabbitai Bot reviewed May 19, 2026

Nehanth requested a review from aakankshaduggal May 19, 2026 18:27

tarun-etikala requested changes May 19, 2026

sanafayyaz315 reviewed May 20, 2026

Nehanth requested review from sanafayyaz315 and tarun-etikala May 20, 2026 15:38

tarun-etikala reviewed May 20, 2026

tarun-etikala previously approved these changes May 20, 2026

Nehanth and others added 6 commits May 20, 2026 17:57

docs: move to agents/claude-code, address review comments

25b8fb0

docs: add RBAC production note

af1477d

docs: address review comments - remove Jira keys, pin to release tag, fix API key placeholder, link to repo file

8531f0a

docs: consolidate results, fix voice, address review feedback

590b276

docs: fix voice, deduplicate results, link deployment files, clarify OTel scope

aa62a46

Nehanth dismissed tarun-etikala’s stale review via aa62a46 May 20, 2026 22:07

Nehanth force-pushed the mlflow-tracing-docs branch from 4b14907 to aa62a46 Compare May 20, 2026 22:07

Nehanth added 3 commits May 20, 2026 18:10

docs: use full GitHub links for deployment files

50e0a70

docs: link step 3 to deployment.yaml

f25576d

docs: clarify session file as data source

e11e554

aakankshaduggal previously approved these changes May 21, 2026

Update mlflow-tracing.md

05f2e2f

Nehanth dismissed aakankshaduggal’s stale review via 05f2e2f May 21, 2026 14:15

aakankshaduggal approved these changes May 21, 2026

tarun-etikala approved these changes May 21, 2026

Nehanth merged commit 10e0129 into main May 21, 2026
8 checks passed

Nehanth deleted the mlflow-tracing-docs branch May 21, 2026 15:27

4 participants
