Commit dcc28ee

docs: Extract legacy benchmark tooling to separate doc
Move benchmark_base.py CLI docs, inputs/outputs tables, and output examples to benchmark_legacy.md. Replace with a cross-reference link in benchmark.md to keep the main doc focused on the forward-looking scenarios matrix and integration strategy.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Gloire Rubambiza <gloire@ibm.com>
1 parent 3e68abb commit dcc28ee

2 files changed: 89 additions & 86 deletions

inference_server/benchmark/benchmark.md

Lines changed: 1 addition & 86 deletions
@@ -116,89 +116,4 @@ FMA standup to measure L2 and L3 metrics.
 
 ## Legacy Benchmark Tooling
 
-The original benchmark implementation in this directory (`benchmark_base.py`, `scenarios.py`,
-`kube_ops.py`) predates the llm-d-benchmark integration. It measures L1 metrics directly
-via Kubernetes API polling and supports three scenarios:
-
-| Scenario | What it measures | Matrix mapping |
-| -------------- | ---------------- | -------------- |
-| `baseline` | Cold start deploy-to-ready latency | Cold Start columns |
-| `scaling` | Scale up, down to 1, then up again with hit rate tracking | Wake from Sleep columns |
-| `new_variant` | Sequential deployment of multiple model variants | Introducing New Variant row |
-
-It operates in three modes: `kind` (local Kind cluster), `remote` (OpenShift/remote cluster),
-and `simulated` (mock mode, no real GPUs). This tooling remains functional for standalone
-FMA actuation measurements and serves as a reference for the metrics being integrated into
-llm-d-benchmark.
-
-### Baseline Startup Latency
-
-**Objective:**
-Measure the time from **deployment (server-request submission)** to **dual-pod readiness**.
-
-#### Inputs
-
-| Parameter | Type | Required | Default | Description |
-| ------------------ | ------ | -------- | --------------------------------------- | ----------- |
-| `--namespace` | `str` | **Yes** | --- | Openshift namespace to run benchmark |
-| `--yaml` | `str` | **Yes** | --- | Path to the server-requesting YAML template file |
-| `--image` | `str` | **Yes*** | --- | Image repository for the requester pod. Required *only if* `CONTAINER_IMG_REG` env var is **not** set |
-| `--tag` | `str` | **Yes*** | --- | Image tag for the requester pod. Required *only if* `CONTAINER_IMG_VERSION` env var is **not** set |
-| `--cleanup` | `bool` | No | `True` | Whether to clean up created resources after the benchmark |
-| `--iterations` | `int` | No | `1` | Number of times to run each benchmark scenario |
-| `--cluster-domain` | `str` | No | `fmaas-platform-eval.fmaas.res.ibm.com` | Cluster domain for Prometheus GPU metrics query |
-| `--model-path` | `str` | No | `None` | Path to JSON file containing models for scenario (used only in the `new_variant` scenario). |
-| `--scenario` | `str` | No | `"scaling"` | Benchmark scenario to run: `baseline`, `scaling`, or `new_variant`. |
-
-#### Outputs
-
-| Output | Description |
-| ------------------- | -------------------------------------------------------------------------- |
-| `startup_time` | Total time from deployment to readiness |
-| `availability_mode` | Indicates whether the vLLM instance was started cold or resumed from sleep |
-
-**Example Usage**
-```bash
-python3 inference_server/benchmark/bechmark_base.py --namespace <str> --yaml <str> --cleanup <bool,default:True> --iterations <int, default:1> --cluster-domain <str> --model-path <str> --scenario <str, default:scaling> --image <str> --tag <str>
-```
-
-<details>
-<summary>Output Example (Subject to Change)</summary>
-
-```
-2025-12-01 13:59:52,031 - INFO - scale-request-3-1764615426-4pztx-dual-lhv7s:scale-request-3-1764615426-v9jkh bound through a HIT.
-2025-12-01 13:59:52,053 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 13:59:52,496 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 13:59:53,930 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 13:59:53,962 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 13:59:53,972 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 13:59:55,900 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 14:00:03,738 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 14:00:33,850 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 14:01:03,904 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 14:01:22,404 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
-2025-12-01 14:01:22,405 - INFO - Requester Pod:scale-request-3-1764615426-ktcqd ready after 108s on node:fmaas-vllm-d-wv25b-worker-h100-3-hmtwn using GPU:GPU-ca8ae222-50e0-b69e-16f2-e49dac1afe28
-2025-12-01 14:01:22,405 - INFO - scale-request-3-1764615426-ktcqd-dual-pptzq:scale-request-3-1764615426-ktcqd bound through a COLD START.
-2025-12-01 14:01:22,405 - INFO - All pods {'scale-request-3-1764615426-hvxjg', 'scale-request-3-1764615426-v9jkh', 'scale-request-3-1764615426-ktcqd'} Ready after 108.97s
-replicaset.apps "scale-request-3-1764615426" deleted
-pod "scale-request-3-1764615426-9hlb2-dual-dgcg2" deleted
-pod "scale-request-3-1764615426-hvxjg-dual-59hc8" deleted
-pod "scale-request-3-1764615426-4pztx-dual-lhv7s" deleted
-pod "scale-request-3-1764615426-ktcqd-dual-pptzq" deleted
-2025-12-01 14:01:32,868 - INFO - ---------------------------------------------------------------------
-
-Total Runs: 15
-Successful Runs: 15
-Failed Runs: 0
-Requester Pods
-Min: 9s,
-Max: 318s
-Average: 125.4s
-Median: 115s
-Hits: 3/6 (50%)
-Hit Wake-up Times
-Min: 9s,
-Max: 18s
-Average: 13.0s
-```
-</details>
+See [benchmark_legacy.md](benchmark_legacy.md) for documentation on the original `benchmark_base.py` tool.

inference_server/benchmark/benchmark_legacy.md
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# Legacy Benchmark Tooling
+
+The original benchmark implementation in this directory (`benchmark_base.py`, `scenarios.py`,
+`kube_ops.py`) predates the llm-d-benchmark integration. It measures L1 actuation metrics
+directly via Kubernetes API polling and supports three scenarios:
+
+| Scenario | What it measures | Matrix mapping (see [benchmark.md](benchmark.md)) |
+| -------------- | ---------------- | -------------------------------------------------- |
+| `baseline` | Cold start deploy-to-ready latency | Cold Start columns |
+| `scaling` | Scale up, down to 1, then up again with hit rate tracking | Wake from Sleep columns |
+| `new_variant` | Sequential deployment of multiple model variants | Introducing New Variant row |
+
+It operates in three modes: `kind` (local Kind cluster), `remote` (OpenShift/remote cluster),
+and `simulated` (mock mode, no real GPUs). This tooling remains functional for standalone
+FMA actuation measurements and serves as a reference for the metrics being integrated into
+llm-d-benchmark.
+
+## Baseline Startup Latency
+
+**Objective:**
+Measure the time from **deployment (server-request submission)** to **dual-pod readiness**.
+
+### Inputs
+
+| Parameter | Type | Required | Default | Description |
+| ------------------ | ------ | -------- | --------------------------------------- | ----------- |
+| `--namespace` | `str` | **Yes** | --- | OpenShift namespace to run the benchmark |
+| `--yaml` | `str` | **Yes** | --- | Path to the server-requesting YAML template file |
+| `--image` | `str` | **Yes*** | --- | Image repository for the requester pod. Required *only if* the `CONTAINER_IMG_REG` env var is **not** set |
+| `--tag` | `str` | **Yes*** | --- | Image tag for the requester pod. Required *only if* the `CONTAINER_IMG_VERSION` env var is **not** set |
+| `--cleanup` | `bool` | No | `True` | Whether to clean up created resources after the benchmark |
+| `--iterations` | `int` | No | `1` | Number of times to run each benchmark scenario |
+| `--cluster-domain` | `str` | No | `fmaas-platform-eval.fmaas.res.ibm.com` | Cluster domain for the Prometheus GPU metrics query |
+| `--model-path` | `str` | No | `None` | Path to a JSON file listing models (used only in the `new_variant` scenario) |
+| `--scenario` | `str` | No | `"scaling"` | Benchmark scenario to run: `baseline`, `scaling`, or `new_variant` |
+
+### Outputs
+
+| Output | Description |
+| ------------------- | -------------------------------------------------------------------------- |
+| `startup_time` | Total time from deployment to readiness |
+| `availability_mode` | Indicates whether the vLLM instance was started cold or resumed from sleep |
+
+**Example Usage**
+```bash
+python3 inference_server/benchmark/benchmark_base.py --namespace <str> --yaml <str> --cleanup <bool, default: True> --iterations <int, default: 1> --cluster-domain <str> --model-path <str> --scenario <str, default: scaling> --image <str> --tag <str>
+```
+
+<details>
+<summary>Output Example (Subject to Change)</summary>
+
+```
+2025-12-01 13:59:52,031 - INFO - scale-request-3-1764615426-4pztx-dual-lhv7s:scale-request-3-1764615426-v9jkh bound through a HIT.
+2025-12-01 13:59:52,053 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 13:59:52,496 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 13:59:53,930 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 13:59:53,962 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 13:59:53,972 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 13:59:55,900 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 14:00:03,738 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 14:00:33,850 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 14:01:03,904 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 14:01:22,404 - INFO - Checking readiness of Requester Pod:scale-request-3-1764615426-ktcqd
+2025-12-01 14:01:22,405 - INFO - Requester Pod:scale-request-3-1764615426-ktcqd ready after 108s on node:fmaas-vllm-d-wv25b-worker-h100-3-hmtwn using GPU:GPU-ca8ae222-50e0-b69e-16f2-e49dac1afe28
+2025-12-01 14:01:22,405 - INFO - scale-request-3-1764615426-ktcqd-dual-pptzq:scale-request-3-1764615426-ktcqd bound through a COLD START.
+2025-12-01 14:01:22,405 - INFO - All pods {'scale-request-3-1764615426-hvxjg', 'scale-request-3-1764615426-v9jkh', 'scale-request-3-1764615426-ktcqd'} Ready after 108.97s
+replicaset.apps "scale-request-3-1764615426" deleted
+pod "scale-request-3-1764615426-9hlb2-dual-dgcg2" deleted
+pod "scale-request-3-1764615426-hvxjg-dual-59hc8" deleted
+pod "scale-request-3-1764615426-4pztx-dual-lhv7s" deleted
+pod "scale-request-3-1764615426-ktcqd-dual-pptzq" deleted
+2025-12-01 14:01:32,868 - INFO - ---------------------------------------------------------------------
+
+Total Runs: 15
+Successful Runs: 15
+Failed Runs: 0
+Requester Pods
+Min: 9s,
+Max: 318s
+Average: 125.4s
+Median: 115s
+Hits: 3/6 (50%)
+Hit Wake-up Times
+Min: 9s,
+Max: 18s
+Average: 13.0s
+```
+</details>
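The "measures L1 actuation metrics directly via Kubernetes API polling" behavior described above reduces to a small poll-until-ready loop. The sketch below is illustrative, not code from `benchmark_base.py`: the `wait_for_ready` name and the injected `check_ready` callable are assumptions chosen so the helper runs without a cluster (in the real tool the callable would wrap a pod-status read through the Kubernetes client).

```python
import time

def wait_for_ready(check_ready, timeout_s=600.0, interval_s=2.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll check_ready() until it returns True or timeout_s elapses.

    Returns elapsed seconds on success; raises TimeoutError on timeout.
    clock and sleep are injectable so the loop is testable offline.
    """
    start = clock()
    while True:
        if check_ready():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"not ready after {timeout_s}s")
        sleep(interval_s)
```

Measuring `startup_time` is then just calling this helper right after submitting the server request and recording the returned elapsed value.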
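The summary block at the end of the sample output (min/max/average/median plus a hit rate) is straightforward to reproduce from per-run durations. A minimal sketch; `summarize` is a hypothetical helper for illustration, not the tool's actual function:

```python
from statistics import mean, median

def summarize(durations_s, hits, opportunities):
    """Collapse per-run readiness durations (seconds) into the summary
    shape the legacy tool prints: min/max/average/median plus hit rate."""
    return {
        "min_s": min(durations_s),
        "max_s": max(durations_s),
        "avg_s": round(mean(durations_s), 1),
        "median_s": median(durations_s),
        # Hit rate as printed, e.g. "3/6 (50%)"
        "hit_rate": f"{hits}/{opportunities} ({100 * hits // opportunities}%)",
    }
```

For example, `summarize([9, 115, 318], hits=3, opportunities=6)` yields min 9, max 318, average 147.3, median 115, and a hit rate of "3/6 (50%)".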
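The Inputs table's "required only if the env var is not set" rule for `--image`/`--tag` maps naturally onto `argparse`. A hedged reconstruction — the flag names and defaults match the table, but the parser itself is illustrative and not taken from `benchmark_base.py`:

```python
import argparse
import os

def build_parser(env=None):
    """Sketch of the legacy CLI. --image/--tag become optional whenever
    CONTAINER_IMG_REG / CONTAINER_IMG_VERSION are set in the environment."""
    env = os.environ if env is None else env
    p = argparse.ArgumentParser(description="Legacy FMA actuation benchmark (sketch)")
    p.add_argument("--namespace", required=True)
    p.add_argument("--yaml", required=True)
    # Env vars act as defaults; the flag is only mandatory when the var is absent.
    p.add_argument("--image", default=env.get("CONTAINER_IMG_REG"),
                   required="CONTAINER_IMG_REG" not in env)
    p.add_argument("--tag", default=env.get("CONTAINER_IMG_VERSION"),
                   required="CONTAINER_IMG_VERSION" not in env)
    p.add_argument("--cleanup", type=lambda v: v.lower() != "false", default=True)
    p.add_argument("--iterations", type=int, default=1)
    p.add_argument("--cluster-domain", default="fmaas-platform-eval.fmaas.res.ibm.com")
    p.add_argument("--model-path", default=None)
    p.add_argument("--scenario", choices=["baseline", "scaling", "new_variant"],
                   default="scaling")
    return p
```

Passing `env` explicitly keeps the parser deterministic under test; production code would simply call `build_parser()` and fall back to `os.environ`.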
