Run agentevals alongside kagent on Kubernetes to evaluate AI agent conversations in real time. This example deploys three components:
- agentevals receives OTLP traces over gRPC and HTTP and serves the evaluation UI
- OTel Collector (optional) adds centralized telemetry controls such as batching, filtering, and routing
- kagent provides Kubernetes-native AI agents with built-in OTel instrumentation (gRPC export only)

```
kagent (gRPC :4317) --> OTel Collector (optional) --> agentevals (gRPC :4317 / HTTP :4318)
                                                            |
                                                       UI on :8001
```
- A running Kubernetes cluster (kind, minikube, EKS, GKE, etc.)
- Helm v3 installed
- `kubectl` configured for your cluster
- An OpenAI API key (`OPENAI_API_KEY`)

```shell
helm install agentevals ./charts/agentevals \
  --set tag=0.6.3
```

This creates a single pod exposing:
| Port | Purpose |
|---|---|
| 8001 | Web UI and API |
| 4317 | OTLP gRPC receiver (traces and logs) |
| 4318 | OTLP HTTP receiver (traces and logs) |
| 8080 | MCP (Streamable HTTP) |
Native gRPC ingestion in agentevals is sufficient for most setups, but an intermediate collector is still useful when you want centralized telemetry controls:
- traffic shaping (batching, retries, backpressure)
- filtering or redaction before data reaches agentevals
- routing/fan-out to additional backends
- protocol translation for mixed clients
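If you prefer a values file over `--set` flags, the same controls can be sketched as follows. This is an illustrative fragment, not a tested configuration: the `batch` and `attributes` processors ship with the standard collector distributions, and the redacted attribute key (`user.email`) is a hypothetical placeholder.

```yaml
# Hypothetical values.yaml fragment for the opentelemetry-collector Helm chart.
config:
  processors:
    batch:                       # traffic shaping: batch spans before export
      send_batch_size: 512
      timeout: 5s
    attributes:                  # redaction before data reaches agentevals
      actions:
        - key: user.email        # illustrative attribute to strip
          action: delete
  exporters:
    otlp:
      endpoint: agentevals.default.svc.cluster.local:4317
      compression: gzip
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [attributes, batch]
        exporters: [otlp]
      logs:
        receivers: [otlp]
        processors: [attributes, batch]
        exporters: [otlp]
```

Routing and fan-out work the same way: add a second exporter and list it alongside `otlp` in the pipeline's `exporters`.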
```shell
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

helm upgrade --install otel-collector open-telemetry/opentelemetry-collector \
  --namespace kagent --create-namespace \
  --set mode=deployment \
  --set replicaCount=1 \
  --set image.repository=otel/opentelemetry-collector \
  --set ports.otlp.enabled=true \
  --set ports.otlp-http.enabled=false \
  --set config.exporters.otlp.endpoint="agentevals.default.svc.cluster.local:4317" \
  --set config.exporters.otlp.compression="gzip" \
  --set config.service.pipelines.traces.receivers[0]=otlp \
  --set config.service.pipelines.traces.exporters[0]=otlp \
  --set config.service.pipelines.logs.receivers[0]=otlp \
  --set config.service.pipelines.logs.exporters[0]=otlp
```

Note: If you deployed agentevals in a namespace other than `default`, update the `endpoint` value accordingly: `agentevals.<namespace>.svc.cluster.local:4317`.
Install the CRDs first, then the kagent operator with OTel tracing enabled:
```shell
helm install kagent-crds oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
  --namespace kagent \
  --create-namespace

helm upgrade --install kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent \
  --namespace kagent \
  --set providers.default=openAI \
  --set providers.openAI.apiKey=$OPENAI_API_KEY \
  --set agents.kgateway-agent.enabled=false \
  --set agents.istio-agent.enabled=false \
  --set agents.promql-agent.enabled=false \
  --set agents.observability-agent.enabled=false \
  --set agents.argo-rollouts-agent.enabled=false \
  --set agents.cilium-policy-agent.enabled=false \
  --set agents.cilium-manager-agent.enabled=false \
  --set agents.cilium-debug-agent.enabled=false \
  --set otel.tracing.enabled=true \
  --set otel.tracing.exporter.otlp.endpoint="otel-collector-opentelemetry-collector.kagent.svc.cluster.local:4317" \
  --set otel.tracing.exporter.otlp.insecure=true
```

This installs kagent with only the default Helm agent (`helm-agent`) and the K8s troubleshooter enabled, and points its OTel exporter at the Collector.

Note: If you are not running an OTel Collector, point `otel.tracing.exporter.otlp.endpoint` directly at the agentevals OTLP gRPC endpoint instead: `agentevals.default.svc.cluster.local:4317`.
```shell
kubectl get pods -A -l 'app.kubernetes.io/name in (agentevals, kagent, opentelemetry-collector)'
```

All pods should be Running before continuing.
This walkthrough shows how to evaluate two kagent agents side by side: the default Helm agent running gpt-4.1-mini and a new agent running gpt-5. You will chat with both agents, watch their traces stream into agentevals, select the better session as the evaluation baseline, and score both on tool trajectory and response match.
Port-forward both services to your local machine:
```shell
# Terminal 1: agentevals UI
kubectl port-forward svc/agentevals 8001:8001

# Terminal 2: kagent UI
kubectl port-forward -n kagent svc/kagent 8083:8083
```

Open http://localhost:8083 for the kagent UI and http://localhost:8001 for the agentevals UI.
kagent ships with a default helm-agent configured to use gpt-4.1-mini. Create a second agent that uses gpt-5 so you can compare the two.
Option A: via the kagent UI
- Open http://localhost:8083
- Navigate to the Agents page
- Click Create Agent
- Copy the configuration from the existing `helm-agent` (same system prompt, same tools)
- Change the model to `gpt-5`
- Name it `helm-agent-gpt5`
- Save
Option B: via a CRD
Apply the following manifest (adjust the system prompt if needed):
```yaml
apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: helm-agent-gpt5
  namespace: kagent
spec:
  description: "Helm agent (GPT-5) for model comparison"
  modelConfig:
    model: gpt-5
    apiKeySecretRef:
      name: kagent-openai
      key: OPENAI_API_KEY
  systemPrompt: |
    You are a Kubernetes Helm expert. You help users manage Helm charts,
    releases, and repositories. Use your tools to inspect and manage
    Helm resources in the cluster.
  tools:
    - name: helm-list
    - name: helm-status
    - name: helm-get-values
    - name: helm-history
```

```shell
kubectl apply -f helm-agent-gpt5.yaml
```

- Go to http://localhost:8001
- Click Live in the sidebar to open the live streaming view
- Leave this tab open. Sessions will appear as traces arrive.
Switch to the kagent UI (http://localhost:8083) and have the same conversation with each agent. For example:
With helm-agent (gpt-4.1-mini):
- Select `helm-agent` from the agent list
- Start a new conversation
- Ask: "List all Helm releases across all namespaces and tell me which ones have pending upgrades"
- Follow up: "Show me the values for the agentevals release"
With helm-agent-gpt5 (gpt-5):
- Select `helm-agent-gpt5` from the agent list
- Start a new conversation
- Ask the same questions in the same order
Switch back to the agentevals Live view at http://localhost:8001. You will see two sessions appear, one for each conversation. Each session shows:
- Status transitioning from ACTIVE to COMPLETED as the conversation ends
- Span count incrementing in real time as the agent makes LLM calls and tool invocations
- Model name visible in the session metadata
Once both sessions are complete:
- Click on the `helm-agent-gpt5` session card to open its trace details
- Review the conversation: check that it called the right tools and produced correct responses
- Click Use as Eval Set to mark this session as the evaluation baseline
- Give it a name like `helm-agent-comparison`
This captures the GPT-5 session's tool trajectory and final responses as the golden reference.
- Go back to the sessions list
- Select both sessions (the `gpt-4.1-mini` session and the `gpt-5` session)
- Click Evaluate
- Select the `helm-agent-comparison` eval set
- Choose the metrics:
  - `tool_trajectory_avg_score`: Did the agent call the correct tools in the correct order?
  - `response_match_score`: Did the agent produce responses consistent with the golden reference?
- Run the evaluation
| Metric | What it tells you |
|---|---|
| `tool_trajectory_avg_score` | Whether the agent followed the expected sequence of Helm tool calls (`helm-list`, then `helm-get-values`). A score of 1.0 means it matched exactly. |
| `response_match_score` | How closely the agent's final answers matched the GPT-5 baseline. Useful for catching regressions when switching to a cheaper model. |
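For intuition about what these metrics measure, here is a rough Python sketch. It is not agentevals' actual implementation: the exact-position trajectory match and the word-overlap similarity below are illustrative assumptions.

```python
def tool_trajectory_avg_score(expected: list[str], actual: list[str]) -> float:
    """Illustrative: fraction of positions where the actual tool call
    matches the baseline trajectory, penalizing extra or missing calls."""
    if not expected and not actual:
        return 1.0
    matches = sum(1 for e, a in zip(expected, actual) if e == a)
    return matches / max(len(expected), len(actual))

def response_match_score(baseline: str, candidate: str) -> float:
    """Illustrative: Jaccard word overlap between baseline and candidate answers."""
    b, c = set(baseline.lower().split()), set(candidate.lower().split())
    return len(b & c) / len(b | c) if b | c else 1.0

# A session that calls helm-list then helm-get-values, like the baseline, scores 1.0.
print(tool_trajectory_avg_score(
    ["helm-list", "helm-get-values"],
    ["helm-list", "helm-get-values"],
))

# An extra tool call in the middle lowers the score.
print(tool_trajectory_avg_score(
    ["helm-list", "helm-get-values"],
    ["helm-list", "helm-status", "helm-get-values"],
))
```

The real evaluators are more sophisticated (e.g. semantic rather than lexical response comparison), but the shape is the same: each session is scored against the trajectory and responses captured in the eval set.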
Compare the two sessions in the results table:
- Token usage: The session metadata includes total token counts. If `gpt-5` consumed fewer tokens while achieving the same trajectory score, it may be the better choice for this use case.
- Tool trajectory: If one agent called extra tools or skipped expected ones, the trajectory score reflects that.
- Response quality: A lower response match score on the `gpt-4.1-mini` session highlights where the cheaper model diverged from the GPT-5 baseline.
You can also click an individual conversation to see a per-evaluator breakdown of its scores.
```shell
helm uninstall kagent -n kagent
helm uninstall kagent-crds -n kagent
helm uninstall otel-collector -n kagent
helm uninstall agentevals
kubectl delete namespace kagent
```