Commit 3dd6249 (parent ae64325): script to extract benchmark data

hack/benchmark/README.md

# Benchmark Report Generator

This script (`get_benchmark_report.py`) interacts with an OpenShift cluster to gather metrics and build a comprehensive benchmark report PDF. It captures data from Prometheus (both cluster and user-workload metrics) and parses GuideLLM results to generate text analysis and plots (e.g., KV cache usage, queued requests, replicas, TTFT, and ITL).

## Prerequisites & Installation

To run this script locally, ensure you have Python installed along with the required dependencies:

```bash
pip install matplotlib pyyaml
```

The script also relies on the OpenShift CLI (`oc`) being installed and available on your `PATH`.

## OpenShift Privileges (`oc`)
You must be logged into an OpenShift cluster with sufficient privileges. The script attempts to verify these permissions before executing.

- **Required Privileges**: You need `cluster-admin` rights, or a custom role that grants `create` on `pods/exec` in both the `openshift-monitoring` and `openshift-user-workload-monitoring` namespaces. Wait-time checks, fetching pod states, and querying Prometheus directly all rely on executing `curl` from inside the Prometheus pods.
- If you are running this in a CI/CD environment, you can pass `-t`/`--token` and `-s`/`--server` arguments to have the script log in automatically via `oc login`.
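
Since metrics are fetched by running `curl` against Prometheus from inside its own pod, the requests target Prometheus's HTTP API (`/api/v1/query_range`). A minimal sketch of building such a query URL (the PromQL expression and port are illustrative assumptions, not taken from the script):

```python
from urllib.parse import urlencode


def range_query_url(promql: str, start: float, end: float, step: str = "15s") -> str:
    """Build a Prometheus /api/v1/query_range URL of the kind the script
    could pass to `curl` inside the Prometheus pod."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"http://localhost:9090/api/v1/query_range?{params}"


# Hypothetical PromQL; the script's actual queries are not shown in this README.
url = range_query_url("sum(vllm:gpu_cache_usage_perc)", 1700000000, 1700003600)
```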
## Usage & Arguments

```bash
python get_benchmark_report.py [OPTIONS]
```

| Argument | Short | Default | Description |
|---|---|---|---|
| `--namespace` | `-n` | `asmalvan-test` | Kubernetes namespace containing the workload to query metrics for. |
| `--window` | `-w` | `1h` | Time window for Prometheus queries (e.g., `30m`, `1h`, `2h`). |
| `--output` | `-o` | `metrics_usage.png` | Base name for the plot files; both `.png` and `.pdf` files are generated. |
| `--results-dir` | `-r` | `None` | Path to a GuideLLM `exp-docs` results directory. |
| `--token` | `-t` | `None` | OpenShift login token. |
| `--server` | `-s` | `None` | OpenShift API server URL. |
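
The `--window` values above use duration shorthand (`30m`, `1h`, `2h`). A minimal sketch of converting such strings to seconds (a hypothetical helper, not taken from the script):

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def window_to_seconds(window: str) -> int:
    """Convert a --window value like '30m', '1h', or '2h' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", window.strip())
    if match is None:
        raise ValueError(f"unrecognized window: {window!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]
```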
## Path to GuideLLM Results
To include GuideLLM latency calculations, true serving capacity estimates, and performance scoring in your output, you **must** supply the path to the GuideLLM results directory.

Example:

```bash
python get_benchmark_report.py -n my-namespace -r /path/to/exp-docs
```

When you use the `-r` (or `--results-dir`) flag:

- The script looks for `results.json` to calculate successful vs. failed RPS, P99 TTFT, and P99 ITL.
- It also reads the `*_results.json_*.yaml` parameter files in that directory to clamp the Prometheus query windows to the start/stop times of the benchmark phases, instead of relying solely on the `--window` parameter.
- The output files are named automatically after the basename of the results directory (e.g., `metrics_usage_exp-docs.pdf`).
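
The exact schema of GuideLLM's `results.json` is not shown here, but independent of it, P99 TTFT and P99 ITL are percentiles over per-request samples. A nearest-rank sketch of that statistic (a hypothetical helper, not from the script):

```python
import math


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile, the statistic reported for TTFT and ITL."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank definition, 1-based
    return ordered[rank - 1]
```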
