Commit 000a7d1

Remove vLLM max-num-seqs=5 bottleneck and align WVA thresholds to defaults
The benchmark deployed vLLM with --max-num-seqs=5, which caps each pod at 5 concurrent sequences. This held KV cache utilization at 2-3% and throughput at ~1 RPS, instead of the expected 60-100% KV cache utilization and ~9 RPS. Removing the setting (VLLM_MAX_NUM_SEQS: "5" in the workflow) lets vLLM fall back to its default of 256, matching a colleague's benchmark configuration.

This also aligns the WVA saturation thresholds kvSpareTrigger and queueSpareTrigger (KV_SPARE_TRIGGER and QUEUE_SPARE_TRIGGER in the workflow, previously 0.5 and 4.5) with the chart defaults of 0.1 and 3, again to match the colleague's setup.

Made-with: Cursor
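As a rough sanity check on these numbers, Little's law (L = X * W: requests in flight = throughput * mean latency) shows why a concurrency cap of 5 pins throughput near 1 RPS. This is a back-of-the-envelope sketch, not part of the benchmark. It assumes a saturated pod keeps exactly `max_num_seqs` requests in flight, and it derives the ~5 s mean latency from the commit's own figures (5 sequences at ~1 RPS) rather than from a measurement. The helper names are hypothetical.

```python
"""Back-of-the-envelope throughput bound via Little's law (L = X * W).

Assumption (not from the commit): a pod saturated at its concurrency cap
keeps `max_num_seqs` requests in flight, so throughput X <= max_num_seqs / W,
where W is the mean end-to-end request latency in seconds.
"""


def implied_latency_s(in_flight: float, throughput_rps: float) -> float:
    """Mean latency W = L / X implied by an observed throughput."""
    if throughput_rps <= 0:
        raise ValueError("throughput must be positive")
    return in_flight / throughput_rps


def max_throughput_rps(max_num_seqs: int, latency_s: float) -> float:
    """Upper bound on per-pod throughput X = L / W when L is capped."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return max_num_seqs / latency_s


if __name__ == "__main__":
    # Figures from the commit message: cap of 5 sequences, ~1 RPS observed.
    w = implied_latency_s(in_flight=5, throughput_rps=1.0)
    print(f"implied mean latency: {w:.1f} s")
    # Hypothetically holding latency at w (batching more sequences
    # usually raises per-request latency), the cap alone would allow:
    for cap in (5, 256):
        print(f"max-num-seqs={cap}: <= {max_throughput_rps(cap, w):.1f} RPS")
```

Under these assumptions, a cap of 256 bounds throughput at roughly 51 RPS per pod, well above the expected ~9 RPS, so the concurrency limit should stop being the binding constraint. Because per-request latency grows as more sequences share a batch, the real ceiling will be lower than this estimate.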
1 parent e64fe7a commit 000a7d1

1 file changed: 8 additions, 9 deletions

.github/workflows/ci-benchmark.yaml

Lines changed: 8 additions & 9 deletions
@@ -287,8 +287,8 @@ jobs:
 E2E_TESTS_ENABLED: "true"
 IMG: ${{ steps.build-image.outputs.image }}
 SKIP_BUILD: "true"
-KV_SPARE_TRIGGER: "0.5"
-QUEUE_SPARE_TRIGGER: "4.5"
+KV_SPARE_TRIGGER: "0.1"
+QUEUE_SPARE_TRIGGER: "3"
 INSTALL_GRAFANA: "true"
 run: make deploy-e2e-infra
 
@@ -302,8 +302,8 @@ jobs:
 BENCHMARK_GRAFANA_SNAPSHOT_FILE: /tmp/benchmark-grafana-snapshot.txt
 BENCHMARK_GRAFANA_SNAPSHOT_JSON: /tmp/benchmark-grafana-snapshot.json
 BENCHMARK_GRAFANA_PANEL_DIR: /tmp/benchmark-panels
-KV_SPARE_TRIGGER: "0.5"
-QUEUE_SPARE_TRIGGER: "4.5"
+KV_SPARE_TRIGGER: "0.1"
+QUEUE_SPARE_TRIGGER: "3"
 run: make test-benchmark
 
 - name: Upload benchmark results
@@ -550,12 +550,11 @@ jobs:
 CONTROLLER_INSTANCE: ${{ env.WVA_NAMESPACE }}
 DEPLOY_VA: "false"
 DEPLOY_HPA: "false"
-VLLM_MAX_NUM_SEQS: "5"
 DECODE_REPLICAS: "1"
 MONITORING_NAMESPACE: openshift-user-workload-monitoring
 WVA_METRICS_SECURE: "false"
-KV_SPARE_TRIGGER: "0.5"
-QUEUE_SPARE_TRIGGER: "4.5"
+KV_SPARE_TRIGGER: "0.1"
+QUEUE_SPARE_TRIGGER: "3"
 VLLM_SVC_PORT: "8000"
 INSTALL_GRAFANA: "true"
 run: |
@@ -589,8 +588,8 @@ jobs:
 BENCHMARK_GRAFANA_SNAPSHOT_FILE: /tmp/benchmark-grafana-snapshot.txt
 BENCHMARK_GRAFANA_SNAPSHOT_JSON: /tmp/benchmark-grafana-snapshot.json
 BENCHMARK_GRAFANA_PANEL_DIR: /tmp/benchmark-panels
-KV_SPARE_TRIGGER: "0.5"
-QUEUE_SPARE_TRIGGER: "4.5"
+KV_SPARE_TRIGGER: "0.1"
+QUEUE_SPARE_TRIGGER: "3"
 run: |
 # Get token for Thanos querier
 export PROMETHEUS_TOKEN=$(kubectl create token prometheus-k8s -n openshift-monitoring --duration=24h 2>/dev/null || echo "")
