Commit 7fb683f

docs: add getting started guide for Prometheus metrics (#5505)
* docs: add getting started guide for Prometheus metrics

  Signed-off-by: Akhil Mukkara <akhil.mukkara@gmail.com>

* fix: update deployment name and selector in getting started guide

  Signed-off-by: Akhil Mukkara <akhil.mukkara@gmail.com>

* fix: update ExperimentStatus value description in README

  Signed-off-by: Akhil Mukkara <akhil.mukkara@gmail.com>

---------

Signed-off-by: Akhil Mukkara <akhil.mukkara@gmail.com>
1 parent 894ba57 commit 7fb683f

2 files changed

Lines changed: 141 additions & 1 deletion

chaoscenter/graphql/server/pkg/metrics/README.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Type: Histogram with buckets from 1s to 30 minutes
 **`litmus_experiment_status`**
 Tracks the current status of experiment runs.
 Labels: `project_id`, `experiment_id`, `experiment_name`, `status`, `infra_id`
-Values: `0` = experiment started, `1` = experiment completed
+Values: `1` = experiment started, `0` = experiment completed
 The `status` label holds the phase string (e.g. `Running`, `Completed`, `Stopped`, `Error`)
 
 ---
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# Getting Started with LitmusChaos Control Plane Metrics

This guide explains how to set up Prometheus scraping and Grafana visualization for the LitmusChaos GraphQL server metrics.

## Prerequisites

- LitmusChaos installed on a Kubernetes cluster
- [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) installed via Helm (includes Prometheus and Grafana)

---

## Step 1: Verify the Metrics Endpoint

The GraphQL server exposes Prometheus metrics on port `8889` at `/metrics`. The port is configurable via the `METRICS_PORT` environment variable.

Verify that the metrics server is running. The `kubectl port-forward` command blocks, so run `curl` from a second terminal:

```bash
kubectl port-forward -n litmus deployment/litmusportal-server 8889:8889
curl -s http://localhost:8889/metrics | grep litmus_
```

You should see metrics such as `litmus_api_requests_total`, `litmus_experiment_runs_total`, and `litmus_experiment_status`.
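
For a quick inventory instead of raw samples, the output can be reduced to one line per metric family. The following is a minimal sketch, not part of LitmusChaos: `list_litmus_metrics` is a hypothetical helper name, and it assumes the endpoint emits the standard `# TYPE <name> <type>` comment lines of the Prometheus text format.

```bash
# Hypothetical helper: print "<metric family> <type>" for every litmus_* metric.
# Reads Prometheus text-format output on stdin and uses only the "# TYPE" lines.
list_litmus_metrics() {
  awk '$1 == "#" && $2 == "TYPE" && $3 ~ /^litmus_/ { print $3, $4 }' | sort
}

# Usage (with the port-forward above still running):
#   curl -s http://localhost:8889/metrics | list_litmus_metrics
```

Filtering on the `# TYPE` lines avoids printing one row per label combination, which is what a plain `grep litmus_` produces.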

---

## Step 2: Create the Metrics Service

Create a Kubernetes Service to expose the metrics port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: litmus-server-metrics
  namespace: litmus
  labels:
    app: litmus-server-metrics
spec:
  selector:
    component: litmusportal-server
  ports:
    - name: metrics
      port: 8889
      targetPort: 8889
  type: ClusterIP
```

Save the manifest as `litmus-server-metrics-service.yaml` and apply it:

```bash
kubectl apply -f litmus-server-metrics-service.yaml
```
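
A Service whose selector matches no pods is created without error but has no endpoints, so Prometheus would have nothing to scrape. The sketch below is one way to check this. The function names are hypothetical, and it assumes the Service name and namespace from the manifest above.

```bash
# Hypothetical helper: print the pod IPs backing a Service.
# $1 = namespace, $2 = Service name. Empty output means no ready pods matched.
service_endpoint_ips() {
  kubectl get endpoints "$2" -n "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Hypothetical helper: fail with a hint when the metrics Service has no endpoints.
check_metrics_service() {
  ips="$(service_endpoint_ips litmus litmus-server-metrics)"
  if [ -z "$ips" ]; then
    echo "no endpoints: check that the pod labels match the Service selector" >&2
    return 1
  fi
  echo "endpoints: $ips"
}

# Usage:
#   check_metrics_service
```

If the check fails, `kubectl get pods -n litmus -l component=litmusportal-server` shows whether any pod carries the label the selector expects.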

---

## Step 3: Create a ServiceMonitor

Create a `ServiceMonitor` so that Prometheus automatically discovers and scrapes the metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: litmus-server-metrics
  namespace: litmus
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: litmus-server-metrics
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```

Save the manifest as `litmus-server-metrics-monitor.yaml` and apply it:

```bash
kubectl apply -f litmus-server-metrics-monitor.yaml
```

> **Note:** The `release: prometheus` label must match the `serviceMonitorSelector` of your Prometheus resource. By default, kube-prometheus-stack derives that selector from the Helm release name, so if you installed the chart under a different release name, update this label accordingly.
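
If you are unsure which label your Prometheus instance selects on, the selector can be read from the Prometheus custom resource. This is a minimal sketch: `show_servicemonitor_selector` is a hypothetical helper name, and the default namespace `monitoring` is an assumption taken from the commands in Step 4.

```bash
# Hypothetical helper: print each Prometheus resource and its ServiceMonitor selector.
# $1 = namespace of the Prometheus resource (assumed default: monitoring).
show_servicemonitor_selector() {
  kubectl get prometheus -n "${1:-monitoring}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.serviceMonitorSelector}{"\n"}{end}'
}

# Usage:
#   show_servicemonitor_selector
#   show_servicemonitor_selector observability   # a different namespace
```

Any labels listed under `matchLabels` in the output are the ones the `ServiceMonitor` metadata has to carry.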

---

## Step 4: Verify Prometheus is Scraping

Port-forward to Prometheus and verify that the target is being scraped. The Service name below assumes the Helm release is named `prometheus`:

```bash
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
```

Open `http://localhost:9090/targets` and look for `litmus-server-metrics`. It should show as `UP`.

You can also run a query in the Prometheus UI:

```
litmus_api_requests_total
```
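
The same check can be run from the command line through the Prometheus HTTP API (`/api/v1/query`). This is a sketch under stated assumptions: `prom_query` and the `PROM_URL` variable are hypothetical names, and the `job="litmus-server-metrics"` label assumes the Prometheus Operator default of using the Service name as the job label.

```bash
# Hypothetical helper: run an instant PromQL query and print the JSON response.
# $1 = PromQL expression; PROM_URL overrides the assumed default base URL.
prom_query() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage (with the Prometheus port-forward above still running):
#   prom_query 'up{job="litmus-server-metrics"}'
#   prom_query 'sum(rate(litmus_api_requests_total[5m]))'
```

Using `--data-urlencode` with `-G` lets curl escape braces and quotes in the expression, so the query can be passed as written in the Prometheus UI.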

---

## Step 5: Import the Grafana Dashboard

A pre-built Grafana dashboard is included at:

```
chaoscenter/graphql/server/grafana/litmuschaos-metrics-dashboard.json
```

To import it:

1. Port-forward to Grafana:
   ```bash
   kubectl port-forward -n monitoring deployment/prometheus-grafana 3000:3000
   ```

2. Open `http://localhost:3000` and log in.

3. Go to **Dashboards → New → Import**.

4. Upload `litmuschaos-metrics-dashboard.json`.

5. Select **Prometheus** as the data source and click **Import**.

The dashboard includes panels for:
- API request rate and response time
- Experiment run counts and durations
- Experiment status (active vs completed)
- Infra agent counts

---

## Available Metrics

See [README.md](./README.md) for the full list of metrics, their labels, and descriptions.
