@@ -102,19 +102,19 @@ The model-ready health check waits up to 1000000 seconds by default. Override it
 
 Built-in scenarios are packaged under `src/swiss_ai_model_launch/assets/scenarios`.
 
-| Scenario | Pattern | Duration | Think time | Max tokens | Prompt labels | Use case |
-| ------------ | ------------------------------- | -------- | ---------- | ---------- | ------------------------------------- | --------------------------------------------------------- |
-| `throughput` | 20 constant VUs | 15m | 2s | 2048 | all | Baseline sustained throughput. |
-| `ramp` | 0 -> 10 -> 25 -> 50 VUs | 16m | 2s | 2048 | all | Gradual capacity ramp with plateaus. |
-| `stress` | 0 -> 20 -> 50 -> 100 -> 150 VUs | 16m | 2s | 2048 | all | Push the service past normal operating load. |
-| `spike` | 10 -> 100 -> 10 VUs | 8m30s | 0s | 4096 | all | Sudden traffic surge and recovery behavior. |
-| `soak` | 20 constant VUs | 30m | 2s | 2048 | all | Longer stability run for drift, leaks, and tail latency. |
-| `decode` | 50 constant VUs | 15m | 0s | 4096 | `short`, `medium` | Decode-heavy run with shorter prompts and longer outputs. |
-| `kv_stress` | 0 -> 30 -> 0 VUs | 15m | 0s | 4096 | `long_input`, `xl_input`, `conv_long` | KV-cache pressure with long inputs and long outputs. |
-| `open_loop` | 20 arrivals/s | 15m | 0s | 2048 | all | Fixed request-rate latency test with EOS ignored. |
-| `open_loop_ramp` | 2 -> 30 arrivals/s | 15m | 0s | 2048 | all | Open-loop capacity sweep with EOS ignored. |
-| `open_loop_decode` | 2 -> 5 arrivals/s | 12m | 0s | 512 | `short`, `medium` | Open-loop decode-focused A/B benchmark. |
-| `realistic` | 20 constant VUs | 15m | 30s | 2048 | all | Lower-pressure interactive traffic shape. |
+| Scenario | Pattern | Duration | Think time | Max tokens | Prompt labels | Use case |
+| ------------------ | ------------------------------- | -------- | ---------- | ---------- | ------------------------------------- | --------------------------------------------------------- |
+| `throughput` | 20 constant VUs | 15m | 2s | 2048 | all | Baseline sustained throughput. |
+| `ramp` | 0 -> 10 -> 25 -> 50 VUs | 16m | 2s | 2048 | all | Gradual capacity ramp with plateaus. |
+| `stress` | 0 -> 20 -> 50 -> 100 -> 150 VUs | 16m | 2s | 2048 | all | Push the service past normal operating load. |
+| `spike` | 10 -> 100 -> 10 VUs | 8m30s | 0s | 4096 | all | Sudden traffic surge and recovery behavior. |
+| `soak` | 20 constant VUs | 30m | 2s | 2048 | all | Longer stability run for drift, leaks, and tail latency. |
+| `decode` | 50 constant VUs | 15m | 0s | 4096 | `short`, `medium` | Decode-heavy run with shorter prompts and longer outputs. |
+| `kv_stress` | 0 -> 30 -> 0 VUs | 15m | 0s | 4096 | `long_input`, `xl_input`, `conv_long` | KV-cache pressure with long inputs and long outputs. |
+| `open_loop` | 20 arrivals/s | 15m | 0s | 2048 | all | Fixed request-rate latency test with EOS ignored. |
+| `open_loop_ramp` | 2 -> 30 arrivals/s | 15m | 0s | 2048 | all | Open-loop capacity sweep with EOS ignored. |
+| `open_loop_decode` | 2 -> 5 arrivals/s | 12m | 0s | 512 | `short`, `medium` | Open-loop decode-focused A/B benchmark. |
+| `realistic` | 20 constant VUs | 15m | 30s | 2048 | all | Lower-pressure interactive traffic shape. |
 
 Custom scenarios can be placed in `./scenarios/` where you run `sml`. Use YAML, YML, or JSON. A custom scenario with the same name overrides the built-in one.
 
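The override rule in the last context line (a custom scenario shadows a built-in one of the same name) can be illustrated with a minimal sketch. This is not the project's actual loader: `resolve_scenarios` is a hypothetical helper, and it assumes the scenario name is the file stem, which the README does not state.

```python
from pathlib import Path

# Extensions the README lists for custom scenarios (YAML, YML, JSON).
SCENARIO_SUFFIXES = {".yaml", ".yml", ".json"}


def resolve_scenarios(builtin_dir: Path, custom_dir: Path) -> dict[str, Path]:
    """Map scenario name -> file, letting custom files shadow built-ins.

    Assumption (not confirmed by the source): the scenario name is the
    file stem, e.g. ``soak.yaml`` defines the scenario ``soak``.
    """
    resolved: dict[str, Path] = {}
    # Built-ins are scanned first, so entries from custom_dir overwrite them.
    for directory in (builtin_dir, custom_dir):
        if not directory.is_dir():
            continue  # a missing ./scenarios/ simply means no overrides
        for path in sorted(directory.iterdir()):
            if path.is_file() and path.suffix.lower() in SCENARIO_SUFFIXES:
                resolved[path.stem] = path
    return resolved
```

Under these assumptions, calling `resolve_scenarios(Path("src/swiss_ai_model_launch/assets/scenarios"), Path("./scenarios"))` would return the custom file for any name defined in both places and the packaged file otherwise.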