Commit 3ffdbc2

Trim packaged loadtest scenarios
1 parent 326f437 commit 3ffdbc2

9 files changed

Lines changed: 16 additions & 113 deletions

docs/loadtesting.md

Lines changed: 8 additions & 14 deletions
@@ -102,23 +102,17 @@ The model-ready health check waits up to 1000000 seconds by default. Override it
 
 Built-in scenarios are packaged under `src/swiss_ai_model_launch/assets/scenarios`.
 
-| Scenario | Pattern | Duration | Think time | Max tokens | Prompt labels | Use case |
-| ------------------ | ------------------------------- | -------- | ---------- | ---------- | ------------------------------------- | --------------------------------------------------------- |
-| `throughput` | 20 constant VUs | 15m | 2s | 2048 | all | Baseline sustained throughput. |
-| `ramp` | 0 -> 10 -> 25 -> 50 VUs | 16m | 2s | 2048 | all | Gradual capacity ramp with plateaus. |
-| `stress` | 0 -> 20 -> 50 -> 100 -> 150 VUs | 16m | 2s | 2048 | all | Push the service past normal operating load. |
-| `spike` | 10 -> 100 -> 10 VUs | 8m30s | 0s | 4096 | all | Sudden traffic surge and recovery behavior. |
-| `soak` | 20 constant VUs | 30m | 2s | 2048 | all | Longer stability run for drift, leaks, and tail latency. |
-| `decode` | 50 constant VUs | 15m | 0s | 4096 | `short`, `medium` | Decode-heavy run with shorter prompts and longer outputs. |
-| `kv_stress` | 0 -> 30 -> 0 VUs | 15m | 0s | 4096 | `long_input`, `xl_input`, `conv_long` | KV-cache pressure with long inputs and long outputs. |
-| `open_loop` | 20 arrivals/s | 15m | 0s | 2048 | all | Fixed request-rate latency test with EOS ignored. |
-| `open_loop_ramp` | 2 -> 30 arrivals/s | 15m | 0s | 2048 | all | Open-loop capacity sweep with EOS ignored. |
-| `open_loop_decode` | 2 -> 5 arrivals/s | 12m | 0s | 512 | `short`, `medium` | Open-loop decode-focused A/B benchmark. |
-| `realistic` | 20 constant VUs | 15m | 30s | 2048 | all | Lower-pressure interactive traffic shape. |
+| Scenario | Pattern | Duration | Think time | Max tokens | Prompt labels | Use case |
+| ------------------ | ------------------------- | -------- | ---------- | ---------- | ----------------- | --------------------------------------- |
+| `throughput` | 20 constant VUs | 15m | 2s | 2048 | all | Baseline sustained throughput. |
+| `ramp` | 0 -> 10 -> 25 -> 50 VUs | 16m | 2s | 2048 | all | Gradual capacity ramp with plateaus. |
+| `open_loop` | 20 arrivals/s | 15m | 0s | 2048 | all | Fixed request-rate latency test. |
+| `open_loop_ramp` | 2 -> 30 arrivals/s | 15m | 0s | 2048 | all | Open-loop capacity sweep. |
+| `open_loop_decode` | 2 -> 5 arrivals/s | 12m | 0s | 512 | `short`, `medium` | Open-loop decode-focused A/B benchmark. |
 
 Custom scenarios can be placed in `./scenarios/` where you run `sml`. Use YAML, YML, or JSON. A custom scenario with the same name overrides the built-in one.
 
-Prompt labels are tags inside the prompt corpus. Scenarios use them to select a subset of prompts, for example `decode` selects shorter prompts while `kv_stress` selects long-input prompts. Put label choices in scenario YAML rather than on the command line.
+Prompt labels are tags inside the prompt corpus. Scenarios use them to select a subset of prompts, for example `open_loop_decode` selects shorter prompts. Put label choices in scenario YAML rather than on the command line.
 
 The k6 script shuffles the selected prompt corpus with a deterministic seed, then cycles through that shuffled order by global iteration number. This keeps repeated runs comparable while avoiding artifacts from sorted prompt files. The default seed is `1`; override it with `--loadtest-prompt-seed`. For paired A/B runs, use the same seed for both configurations.
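The seeded shuffle described in the last documentation paragraph above can be sketched roughly as follows. This is a minimal illustration, not the code in `script.js`: the diff does not show which generator or shuffle the script uses, so `mulberry32`, `seededShuffle`, and `promptForIteration` are hypothetical names, and the Fisher-Yates shuffle is an assumed choice.

```javascript
// Hypothetical helper: a small seeded PRNG (mulberry32).
// The generator actually used by script.js is not shown in this diff.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Map the 32-bit result to a float in [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator.
// Returns a new array; the input is left untouched.
function seededShuffle(items, seed) {
  const rand = mulberry32(seed);
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Cycle through the shuffled order by global iteration number.
function promptForIteration(shuffled, iteration) {
  return shuffled[iteration % shuffled.length];
}
```

With the same seed, two runs visit prompts in the same order, which is what makes paired A/B runs comparable; changing the seed changes the order but not the set of prompts.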

src/swiss_ai_model_launch/assets/scenarios/decode.yaml

Lines changed: 0 additions & 9 deletions
This file was deleted.

src/swiss_ai_model_launch/assets/scenarios/kv_stress.yaml

Lines changed: 0 additions & 17 deletions
This file was deleted.

src/swiss_ai_model_launch/assets/scenarios/realistic.yaml

Lines changed: 0 additions & 6 deletions
This file was deleted.

src/swiss_ai_model_launch/assets/scenarios/soak.yaml

Lines changed: 0 additions & 6 deletions
This file was deleted.

src/swiss_ai_model_launch/assets/scenarios/spike.yaml

Lines changed: 0 additions & 19 deletions
This file was deleted.

src/swiss_ai_model_launch/assets/scenarios/stress.yaml

Lines changed: 0 additions & 17 deletions
This file was deleted.

src/swiss_ai_model_launch/loadtest/core.py

Lines changed: 0 additions & 1 deletion
@@ -55,6 +55,5 @@ def build_run_config(server: ServerConfig, bench: LoadtestConfig) -> dict[str, A
         "prompt_labels": bench.prompt_labels,
         "ignore_eos": bench.ignore_eos,
         "prompt_seed": bench.prompt_seed,
-        "realistic": None,
         "custom": None,
     }

src/swiss_ai_model_launch/loadtest/k6/script.js

Lines changed: 8 additions & 24 deletions
@@ -7,7 +7,6 @@
  *
  * This script is strict: RUN_CONFIG_JSON must contain one of:
  * - custom
- * - realistic
  * - scenario_definition
  * If none are present, initialization fails.
  *
@@ -36,10 +35,8 @@ const DEFAULT_REQUEST_TIMEOUT = "120s";
 const DEFAULT_PROMPT_SEED = 1;
 const DEFAULT_THINK_TIME = 2;
 const DEFAULT_MAX_VUS = 10;
-const DEFAULT_REALISTIC_USERS = 20;
 const DEFAULT_RAMP_DOWN = "30s";
 const DEFAULT_DURATION = "5m";
-const DEFAULT_REALISTIC_DURATION = "15m";
 const ESTIMATED_CHARS_PER_TOKEN = 4;
 const MS_PER_SECOND = 1000;
 const LATENCY_LABEL_PATTERN = /^e2e_latency_ms\{label:(.+)\}$/;
@@ -107,15 +104,18 @@ const IGNORE_EOS =
   (RUN_CFG.ignore_eos ?? RUN_CFG.scenario_definition?.ignore_eos ?? false) ===
   true;
 const PROMPT_SEED = parseInteger(
-  __ENV.PROMPT_SEED ?? RUN_CFG.prompt_seed ?? RUN_CFG.scenario_definition?.prompt_seed,
+  __ENV.PROMPT_SEED ??
+    RUN_CFG.prompt_seed ??
+    RUN_CFG.scenario_definition?.prompt_seed,
   DEFAULT_PROMPT_SEED,
 );
 // THINK_TIME: max seconds of sleep between requests per VU (uniform [0, THINK_TIME]).
 // Lower values → more in-flight requests → higher KV cache fill. 0 = no sleep.
 const THINK_TIME = parseNumber(RUN_CFG.think_time, DEFAULT_THINK_TIME);
 // MAX_TOKENS: when set, overrides the per-prompt max_tokens.
 // KV cache fill is driven by the decode phase — longer outputs hold KV blocks longer.
-// Use 1024–4096 with kv_stress to keep requests alive and fill the cache.
+// Use 1024–4096 in KV-heavy custom scenarios to keep requests alive and
+// fill the cache.
 const MAX_TOKENS = RUN_CFG.max_tokens
   ? Number.parseInt(RUN_CFG.max_tokens, 10)
   : null;
@@ -302,37 +302,21 @@ function buildCustomScenario(custom) {
   };
 }
 
-function buildRealisticScenario(realistic) {
-  if (!realistic) return null;
-  return {
-    executor: SCENARIO_CONSTANT_VUS,
-    vus: parsePositiveInteger(realistic.users, DEFAULT_REALISTIC_USERS),
-    duration: realistic.duration ?? DEFAULT_REALISTIC_DURATION,
-  };
-}
-
 const customScenario = RUN_CFG.custom
   ? buildCustomScenario(RUN_CFG.custom)
   : null;
-const realisticScenario = RUN_CFG.realistic
-  ? buildRealisticScenario(RUN_CFG.realistic)
-  : null;
 const definedScenario = scenarioToK6(RUN_CFG.scenario_definition);
-const scenarioCandidates = [
-  customScenario,
-  realisticScenario,
-  definedScenario,
-].filter(Boolean);
+const scenarioCandidates = [customScenario, definedScenario].filter(Boolean);
 
 if (scenarioCandidates.length === 0) {
   throw new Error(
-    "No scenario found in RUN_CONFIG_JSON. Expected one of: custom, realistic, scenario_definition",
+    "No scenario found in RUN_CONFIG_JSON. Expected one of: custom, scenario_definition",
   );
 }
 
 if (scenarioCandidates.length > 1) {
   throw new Error(
-    "Ambiguous RUN_CONFIG_JSON: provide only one of custom, realistic, scenario_definition",
+    "Ambiguous RUN_CONFIG_JSON: provide only one of custom, scenario_definition",
   );
 }
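To illustrate the effect of the last `script.js` hunk, here is a simplified, standalone sketch of the "exactly one scenario source" rule after this commit. It is an assumption-laden reduction, not the script itself: `selectScenarioSource` is a hypothetical name, the real code builds k6 scenario objects through `buildCustomScenario()` and `scenarioToK6()` rather than passing values through, and the config fields in the usage note are made up.

```javascript
// Simplified sketch: each source is treated as present when its value is truthy.
// The real script instead filters the built scenario objects with Boolean.
function selectScenarioSource(runCfg) {
  const candidates = [
    ["custom", runCfg.custom],
    ["scenario_definition", runCfg.scenario_definition],
  ].filter(([, value]) => Boolean(value));

  if (candidates.length === 0) {
    throw new Error(
      "No scenario found in RUN_CONFIG_JSON. Expected one of: custom, scenario_definition",
    );
  }
  if (candidates.length > 1) {
    throw new Error(
      "Ambiguous RUN_CONFIG_JSON: provide only one of custom, scenario_definition",
    );
  }
  // Return the name of the single source that was provided.
  return candidates[0][0];
}
```

Under this sketch, a config such as `{ custom: { name: "example" } }` selects `custom`, while a config that only carries the removed `realistic` key is no longer a candidate and falls into the "No scenario found" branch.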
