Red Hat OpenShift AI (RHOAI) Observability MCP

An MCP (Model Context Protocol) server that gives AI assistants direct access to Red Hat OpenShift AI observability data. Query Prometheus metrics, Alertmanager alerts, Loki logs, Grafana dashboards, and Kubernetes cluster state to troubleshoot vLLM inference workloads.

Features

17 tools across 6 categories for comprehensive observability
vLLM-aware metrics (TTFT, TPOT, E2E latency, KV cache, queue depth)
Composite investigation tools that correlate metrics, logs, and alerts automatically
Auto-detection of in-cluster vs external access to OpenShift services
Built on FastMCP with async backends via httpx
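
The auto-detection mentioned above could work roughly as follows. This is a minimal sketch, not the project's actual logic: it relies on two standard Kubernetes conventions (the `KUBERNETES_SERVICE_HOST` variable and the mounted service account token), and the function names `running_in_cluster` and `resolve_thanos_url` are hypothetical.

```python
import os
from pathlib import Path

# Standard path where Kubernetes mounts a pod's service account token.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def running_in_cluster(env=None, token_path=SA_TOKEN_PATH):
    """Return True when the process appears to run inside a Kubernetes pod."""
    env = os.environ if env is None else env
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every pod's environment.
    return bool(env.get("KUBERNETES_SERVICE_HOST")) and Path(token_path).is_file()


def resolve_thanos_url(env=None, token_path=SA_TOKEN_PATH):
    """Prefer an explicit THANOS_URL; fall back to the in-cluster service."""
    env = os.environ if env is None else env
    if env.get("THANOS_URL"):
        return env["THANOS_URL"]
    if running_in_cluster(env, token_path):
        # Same service address as the Quick Start example (assumed default).
        return "https://thanos-querier.openshift-monitoring.svc:9091"
    raise RuntimeError("THANOS_URL must be set when running outside the cluster")
```

An explicit environment variable wins in this sketch so that an external route can always override the in-cluster default.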

Architecture

graph TD
    A[Claude / AI Assistant] -->|MCP Protocol| B[rhoai-observability-mcp]
    B --> C[Thanos / Prometheus]
    B --> D[Alertmanager]
    B --> E[Loki]
    B --> F[Grafana]
    B --> G[Kubernetes / OpenShift]

Backends:

Backend	Purpose	Source
Prometheus (Thanos)	Metrics queries (PromQL)	`backends/prometheus.py`
Alertmanager	Active alerts and alert groups	`backends/alertmanager.py`
Loki	Log queries (LogQL)	`backends/loki.py`
Grafana	Dashboard discovery and panel queries	`backends/grafana.py`
Kubernetes (OpenShift)	Pods, events, nodes, InferenceServices	`backends/openshift.py`
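
To illustrate what a metrics backend of this kind has to do, here is a hedged sketch of an instant query against the Prometheus HTTP API (`/api/v1/query`), which Thanos Querier also serves. The helper names are hypothetical and the real `backends/prometheus.py` may be structured differently; only the endpoint, the response shape, and the httpx calls are standard.

```python
def instant_query_request(base_url, promql, token):
    """Build the URL, params, and headers for a Prometheus instant query."""
    url = base_url.rstrip("/") + "/api/v1/query"
    return url, {"query": promql}, {"Authorization": f"Bearer {token}"}


def parse_vector(payload):
    """Flatten an instant-vector response into (labels, float value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...}, "value": [unix_ts, "string value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


async def run_query(base_url, promql, token):
    """Send the query with httpx and return the parsed samples."""
    import httpx  # third-party; imported here so the helpers above stay stdlib-only

    url, params, headers = instant_query_request(base_url, promql, token)
    async with httpx.AsyncClient(headers=headers) as client:
        resp = await client.get(url, params=params)
        resp.raise_for_status()
        return parse_vector(resp.json())
```

Keeping request building and response parsing separate from the network call makes both easy to unit test without a cluster.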

Quick Start

# Clone and install
git clone https://github.com/amito/rhoai-observability-mcp.git
cd rhoai-observability-mcp
uv pip install -e ".[dev]"

# Configure (see INSTALL.md for all options)
export THANOS_URL=https://thanos-querier.openshift-monitoring.svc:9091
export ALERTMANAGER_URL=https://alertmanager-main.openshift-monitoring.svc:9093
export OPENSHIFT_TOKEN=$(oc whoami -t)

# Run
python -m rhoai_obs_mcp.server

See INSTALL.md for detailed setup, configuration, and Claude Desktop integration.
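
Before starting the server it can help to confirm the variables above are present. The sketch below is an assumption-laden illustration: the `Settings` class and `load_settings` function are hypothetical, and treating all three variables as required may be stricter than the server itself, which can auto-detect in-cluster services.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    thanos_url: str
    alertmanager_url: str
    token: str


def load_settings(env=None):
    """Read the Quick Start variables, failing fast with a clear message."""
    env = os.environ if env is None else env
    required = ("THANOS_URL", "ALERTMANAGER_URL", "OPENSHIFT_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        thanos_url=env["THANOS_URL"].rstrip("/"),  # normalize trailing slash
        alertmanager_url=env["ALERTMANAGER_URL"].rstrip("/"),
        token=env["OPENSHIFT_TOKEN"],
    )
```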

Build & Deploy

Build the container image

make build

Override the image name or tag:

make build IMAGE_NAME=quay.io/myorg/rhoai-observability-mcp IMAGE_TAG=v1.0.0

Push to registry

make push

Deploy to OpenShift

Prerequisites: log in to your cluster with oc login, then create the target project:

oc new-project rhoai-obs-mcp

Then deploy:

make deploy

This applies the manifests in deploy/ to the rhoai-obs-mcp namespace. To deploy to a different namespace:

make deploy NAMESPACE=my-namespace

Undeploy

make undeploy

If you deployed to a custom namespace, pass the same value:

make undeploy NAMESPACE=my-namespace

CI-built images

Container images are built automatically from the main branch and published to the GitHub Container Registry (GHCR):

ghcr.io/amito/rhoai-observability-mcp:latest

Tool Reference

Metrics

Tool	Description
`query_prometheus`	Execute a raw PromQL query against Thanos Querier
`get_vllm_metrics`	Get a summary of key vLLM metrics (TTFT, TPOT, E2E, cache, queue) for a model
`list_metrics`	List available Prometheus metric names, optionally filtered by regex
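
As a rough idea of the PromQL behind a vLLM summary, the sketch below builds queries for the metrics listed above. The metric names and the `model_name` label follow vLLM's Prometheus exporter but vary between vLLM versions, so treat them as assumptions; the function names are hypothetical and this is not necessarily how `get_vllm_metrics` is implemented.

```python
def quantile_query(metric, model, quantile=0.95, window="5m"):
    """PromQL for a latency quantile computed from a histogram's buckets."""
    selector = f'{metric}_bucket{{model_name="{model}"}}'
    return f"histogram_quantile({quantile}, sum by (le) (rate({selector}[{window}])))"


def gauge_query(metric, model):
    """PromQL selecting a gauge for one model."""
    return f'{metric}{{model_name="{model}"}}'


def vllm_queries(model):
    """Map summary field names to PromQL (metric names assumed, version-dependent)."""
    return {
        "ttft_p95": quantile_query("vllm:time_to_first_token_seconds", model),
        "tpot_p95": quantile_query("vllm:time_per_output_token_seconds", model),
        "e2e_p95": quantile_query("vllm:e2e_request_latency_seconds", model),
        "kv_cache_usage": gauge_query("vllm:gpu_cache_usage_perc", model),
        "queue_depth": gauge_query("vllm:num_requests_waiting", model),
    }
```

Each resulting string can be passed unchanged to a raw query tool such as `query_prometheus`.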

Alerts

Tool	Description
`get_alerts`	Get active alerts from Alertmanager, filterable by severity and labels
`get_alert_groups`	Get alerts grouped by their routing labels
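
Filtering by severity and labels can be pictured as a pass over the JSON returned by Alertmanager's `/api/v2/alerts` endpoint. This is a sketch under that assumption; `filter_alerts` is a hypothetical helper, and the real tool may instead push filters to the server.

```python
def filter_alerts(alerts, severity=None, labels=None):
    """Keep active alerts whose labels match every requested key/value."""
    wanted = dict(labels or {})
    if severity is not None:
        wanted["severity"] = severity
    return [
        alert
        for alert in alerts
        # Alertmanager v2 reports state as "active", "suppressed", or "unprocessed".
        if alert.get("status", {}).get("state") == "active"
        and all(alert.get("labels", {}).get(k) == v for k, v in wanted.items())
    ]
```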

Logs

Tool	Description
`query_logs`	Execute a LogQL query against OpenShift LokiStack
`get_pod_logs`	Get logs for a specific pod by namespace and name
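
A pod-scoped log lookup ultimately becomes a LogQL stream selector. The sketch below assumes the `kubernetes_namespace_name` and `kubernetes_pod_name` stream labels used by OpenShift Logging; the helper names are hypothetical and your LokiStack labels may differ.

```python
def escape_label_value(value):
    """Escape backslashes and double quotes for a LogQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def pod_logs_query(namespace, pod, contains=None):
    """Build a LogQL query for one pod, optionally filtering on a substring."""
    query = '{kubernetes_namespace_name="%s", kubernetes_pod_name="%s"}' % (
        escape_label_value(namespace),
        escape_label_value(pod),
    )
    if contains:
        # |= is LogQL's "line contains" filter.
        query += ' |= "%s"' % escape_label_value(contains)
    return query
```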

Cluster

Tool	Description
`get_pods`	List pods in a namespace with status, restarts, and creation time
`get_events`	List Kubernetes events, filterable by resource and reason
`get_node_status`	Get node status, capacity, and GPU allocation info
`describe_resource`	Get detailed description of a Kubernetes resource
`get_inference_services`	List KServe InferenceService resources
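
The status, restart count, and creation time reported for pods all come from standard fields of the Kubernetes Pod object. A minimal sketch of that reduction, assuming the pod has already been fetched as a plain dict (`summarize_pod` is a hypothetical name):

```python
def summarize_pod(pod):
    """Reduce a Kubernetes Pod object (as a dict) to a few triage fields."""
    meta = pod.get("metadata", {})
    status = pod.get("status", {})
    # containerStatuses is absent until the pod has been scheduled and started.
    containers = status.get("containerStatuses") or []
    return {
        "name": meta.get("name"),
        "phase": status.get("phase"),
        "restarts": sum(c.get("restartCount", 0) for c in containers),
        "ready": bool(containers) and all(bool(c.get("ready")) for c in containers),
        "created": meta.get("creationTimestamp"),
    }
```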

Dashboards

Tool	Description
`list_dashboards`	List available Grafana dashboards, filterable by tag or title
`get_dashboard_panels`	Get panels and their queries from a Grafana dashboard
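
Extracting panels and their queries amounts to walking the Grafana dashboard JSON model, where collapsed rows nest their panels one level down. The sketch below assumes Prometheus-style targets with an `expr` field; the function names are hypothetical.

```python
def iter_panels(panels):
    """Yield every non-row panel, descending into collapsed rows."""
    for panel in panels:
        if panel.get("type") == "row":
            yield from iter_panels(panel.get("panels", []))
        else:
            yield panel


def panel_queries(dashboard):
    """List each panel's id, title, and query expressions."""
    summary = []
    for panel in iter_panels(dashboard.get("panels", [])):
        exprs = [t["expr"] for t in panel.get("targets", []) if t.get("expr")]
        summary.append({"id": panel.get("id"), "title": panel.get("title"), "queries": exprs})
    return summary
```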

Investigation

Tool	Description
`investigate_latency`	Correlate latency metrics, error logs, and alerts for a vLLM model
`investigate_gpu`	Correlate GPU utilization, KV cache, queue depth, and pod status
`investigate_errors`	Correlate error logs, alerts, and Kubernetes events in a namespace
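
One plausible way to correlate several backends in a single call is to query them concurrently and tolerate partial failure. The sketch below shows that pattern with `asyncio.gather`; it is an illustration of the general technique, not the project's implementation, and `investigate` and the fetcher names are hypothetical.

```python
import asyncio


async def investigate(sources):
    """Run named async fetchers concurrently; record failures instead of raising."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names),
        return_exceptions=True,  # one failing backend should not sink the report
    )
    report = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            report[name] = {"ok": False, "error": f"{type(result).__name__}: {result}"}
        else:
            report[name] = {"ok": True, "data": result}
    return report
```

A caller would pass zero-argument coroutine functions, for example `await investigate({"metrics": fetch_metrics, "logs": fetch_logs})`, and receive one entry per source even if some backends are unreachable.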

Documentation

INSTALL.md -- Installation, configuration, and integration
TESTING.md -- Running tests and writing new ones
CONTRIBUTING.md -- Development setup and contribution guidelines

License

MIT
