K6 Integration for Loadtesting by burakemirsezen · Pull Request #110 · swiss-ai/model-launch

burakemirsezen · 2026-04-29T11:15:41Z

This branch adds a cluster-native loadtesting workflow to SML.

It introduces sml loadtest commands for launching a model and running k6 against it, running k6 against an already-running model, running against an external OpenAI-compatible URL, and batch-testing multiple model configs from YAML. k6 runs as its own SLURM job inside a container on the cluster, so load generation does not happen locally.

robmsmt · 2026-04-29T12:52:25Z

Initially I was thinking something like:

set -euo pipefail

: "${LOADTEST_SERVER_URL:?export LOADTEST_SERVER_URL}"
: "${LOADTEST_API_KEY:?export LOADTEST_API_KEY}"
LOADTEST_PROMPTS_FILE="${LOADTEST_PROMPTS_FILE:-/capstor/store/cscs/swissai/infra01/loadtest/prompts.json}"
LOADTEST_SCENARIO="${LOADTEST_SCENARIO:-throughput}"

MODEL="/capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509"
ENV="src/swiss_ai_model_launch/assets/envs/sglang.toml"
TIME="04:00:00"
SUFFIX="$(whoami)"

SERVED_1="swiss-ai/Apertus-8B-Instruct-2509-tp4-${SUFFIX}"
SERVED_2="swiss-ai/Apertus-8B-Instruct-2509-tp1-dp4-${SUFFIX}"

# --- Experiment 1: TP=4 ---
OUT_1=$(sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --slurm-nodes 1 \
  --slurm-time "$TIME" \
  --serving-framework sglang \
  --slurm-environment "$ENV" \
  --framework-args "--model-path $MODEL \
    --served-model-name $SERVED_1 \
    --host 0.0.0.0 --port 8080 \
    --tp-size 4 \
    --enable-metrics")
echo "$OUT_1"
JOB_1=$(echo "$OUT_1" | grep "Job submitted:" | awk '{print $3}')

echo "Waiting for $SERVED_1 to be healthy..."
until curl -fsS -H "Authorization: Bearer $LOADTEST_API_KEY" "$LOADTEST_SERVER_URL/v1/models" | grep -q "$SERVED_1"; do
  sleep 30
done

sml loadtest run \
  --firecrest-system clariden \
  --partition normal \
  --loadtest-server-url "$LOADTEST_SERVER_URL" \
  --loadtest-api-key "$LOADTEST_API_KEY" \
  --loadtest-model "$SERVED_1" \
  --loadtest-scenario "$LOADTEST_SCENARIO" \
  --loadtest-prompts-file "$LOADTEST_PROMPTS_FILE" \
  --no-wait-until-healthy

scancel "$JOB_1"

# --- Experiment 2: TP=1, DP=4 ---
OUT_2=$(sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --slurm-nodes 1 \
  --slurm-time "$TIME" \
  --serving-framework sglang \
  --slurm-environment "$ENV" \
  --framework-args "--model-path $MODEL \
    --served-model-name $SERVED_2 \
    --host 0.0.0.0 --port 8080 \
    --tp-size 1 --dp-size 4 \
    --enable-metrics")
echo "$OUT_2"
JOB_2=$(echo "$OUT_2" | grep "Job submitted:" | awk '{print $3}')

echo "Waiting for $SERVED_2 to be healthy..."
until curl -fsS -H "Authorization: Bearer $LOADTEST_API_KEY" "$LOADTEST_SERVER_URL/v1/models" | grep -q "$SERVED_2"; do
  sleep 30
done

sml loadtest run \
  --firecrest-system clariden \
  --partition normal \
  --loadtest-server-url "$LOADTEST_SERVER_URL" \
  --loadtest-api-key "$LOADTEST_API_KEY" \
  --loadtest-model "$SERVED_2" \
  --loadtest-scenario "$LOADTEST_SCENARIO" \
  --loadtest-prompts-file "$LOADTEST_PROMPTS_FILE" \
  --no-wait-until-healthy

scancel "$JOB_2"

The sml loadtest advanced section has a clear advantage with --cancel-after-loadtest. What is not clear to me is whether the added complexity is worth the trade-off: we now have to maintain another set of flags in _add_loadtest_arguments. Could we not import the existing flags so we don't repeat them?
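The two experiment blocks in the script above differ only in the served model name and the parallelism flags, and under set -e the serving job would be left running if the loadtest step failed. A minimal sketch of one way to factor that out follows. The helper names (extract_job_id, retry_until, model_is_listed, run_experiment) are hypothetical and not part of sml, the sml flags are copied unchanged from the script above, and the 120-attempt budget is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of sml.
# Expects MODEL, ENV, TIME and the LOADTEST_* variables from the script above.
set -eu

# Extract the job id, assuming the "Job submitted: <id>" output line
# that the original script greps for.
extract_job_id() {
  printf '%s\n' "$1" | awk '/Job submitted:/ { print $3; exit }'
}

# Retry a command until it succeeds or the attempt budget is used up.
# usage: retry_until <max_attempts> <sleep_seconds> <command> [args...]
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
  done
}

# Succeeds once the served model name shows up in /v1/models.
model_is_listed() {
  curl -fsS -H "Authorization: Bearer $LOADTEST_API_KEY" \
    "$LOADTEST_SERVER_URL/v1/models" | grep -q "$1"
}

# Launch, wait, loadtest, and always cancel the serving job.
# $1 = served model name, $2 = parallelism flags such as "--tp-size 4".
# The body is a subshell, so the EXIT trap fires when the function ends.
run_experiment() (
  served=$1
  parallel_args=$2
  out=$(sml advanced \
    --firecrest-system clariden \
    --partition normal \
    --slurm-nodes 1 \
    --slurm-time "$TIME" \
    --serving-framework sglang \
    --slurm-environment "$ENV" \
    --framework-args "--model-path $MODEL \
      --served-model-name $served \
      --host 0.0.0.0 --port 8080 \
      $parallel_args \
      --enable-metrics")
  echo "$out"
  job=$(extract_job_id "$out")
  [ -n "$job" ] || { echo "could not parse job id" >&2; exit 1; }
  trap 'scancel "$job"' EXIT

  echo "Waiting for $served to be healthy..."
  retry_until 120 30 model_is_listed "$served"

  sml loadtest run \
    --firecrest-system clariden \
    --partition normal \
    --loadtest-server-url "$LOADTEST_SERVER_URL" \
    --loadtest-api-key "$LOADTEST_API_KEY" \
    --loadtest-model "$served" \
    --loadtest-scenario "$LOADTEST_SCENARIO" \
    --loadtest-prompts-file "$LOADTEST_PROMPTS_FILE" \
    --no-wait-until-healthy
)
```

With these helpers, each experiment would reduce to a single call such as run_experiment "$SERVED_1" "--tp-size 4" or run_experiment "$SERVED_2" "--tp-size 1 --dp-size 4". This still does not give the single-command convenience of --cancel-after-loadtest, so it illustrates the trade-off rather than settling it.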

burakemirsezen · 2026-04-29T14:11:19Z

Should I just get rid of the loadtest batch then?

burakemirsezen · 2026-04-29T18:13:17Z

I removed batch submission, made the loadtest parser reuse the advanced arguments underneath, and did some cleanup and a lot of debugging. I also pinned the k6 version. It is in a working state now; we can improve it later.

burakemirsezen · 2026-05-12T23:27:18Z

I added Prometheus remote write for k6 and added/removed some flags for loadtest.

This should be more or less good to go now. Let me know if there are any issues or things that we want to add.
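For readers unfamiliar with k6's Prometheus support, here is a rough sketch of how its built-in remote-write output is typically enabled. It is not necessarily how this PR wires it up: the function name, endpoint URL, script path, and tag value are placeholders. The sketch prints the command instead of running it, so it works without k6 installed. The --tag flag is shown only as one possible way to attach a job label, on the assumption that k6 tags are exported as metric labels.

```shell
#!/bin/sh
# Sketch only: the endpoint URL, script path, and tag value are placeholders,
# and this is not necessarily how the PR configures k6.
set -eu

# Print (rather than execute) a k6 invocation that streams metrics to a
# Prometheus remote-write endpoint. k6 selects this output with
# `-o experimental-prometheus-rw` and reads the endpoint from the
# K6_PROMETHEUS_RW_SERVER_URL environment variable.
# $1 = remote-write URL, $2 = k6 script, $3 = value for a "job" tag
k6_remote_write_command() {
  printf 'K6_PROMETHEUS_RW_SERVER_URL=%s k6 run -o experimental-prometheus-rw --tag job=%s %s\n' \
    "$1" "$3" "$2"
}

k6_remote_write_command "http://localhost:9090/api/v1/write" script.js sml-loadtest
```

Running the sketch prints one line: K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write k6 run -o experimental-prometheus-rw --tag job=sml-loadtest script.js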

sonarqubecloud · 2026-05-27T14:59:59Z

Quality Gate failed

Failed conditions
61.8% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

robmsmt

looks good, nearly there... just some docs updates to change?

robmsmt

LGTM

The k6 script.js is JavaScript, not part of the Python test surface, and was reporting 0% coverage on 174 new lines, dragging new-code coverage to 62% and failing the 80% quality gate. Excluding it brings new-code coverage to ~86%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-06-07T16:46:46Z

Quality Gate passed

Issues
7 New issues
0 Accepted issues

Measures
0 Security Hotspots
86.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

I can't see what is blocking; I suggest we move forward with this.

robmsmt reviewed Apr 29, 2026

Comment thread images/k6/Dockerfile Outdated

robmsmt reviewed Apr 29, 2026

Comment thread docs/loadtesting.md Outdated

robmsmt reviewed Apr 29, 2026

Comment thread docs/loadtesting.md Outdated

burakemirsezen force-pushed the loadtest branch 3 times, most recently from dc9856f to 4fe1ef1 Compare April 29, 2026 18:11

AryanAhadinia force-pushed the loadtest branch from 801a870 to cfdaf30 Compare April 30, 2026 05:32

AryanAhadinia previously requested changes Apr 30, 2026

AryanAhadinia force-pushed the loadtest branch from b06ee43 to ace0741 Compare April 30, 2026 07:52

AryanAhadinia assigned burakemirsezen May 20, 2026

AryanAhadinia added the enhancement New feature or request label May 20, 2026

burakemirsezen force-pushed the loadtest branch from cd5b79c to a08a110 Compare May 21, 2026 12:11

burakemirsezen requested a review from AryanAhadinia May 21, 2026 13:46

AryanAhadinia mentioned this pull request May 22, 2026

Re-add DCGM exporter #140

Merged

burakemirsezen force-pushed the loadtest branch from 9d18727 to c365ec5 Compare May 27, 2026 14:58

burakemirsezen force-pushed the loadtest branch from c365ec5 to 9530d52 Compare May 31, 2026 12:48

robmsmt reviewed Jun 4, 2026

Comment thread docs/loadtesting.md Outdated

robmsmt reviewed Jun 4, 2026

Comment thread docs/loadtesting.md Outdated

robmsmt requested changes Jun 4, 2026

robmsmt previously approved these changes Jun 5, 2026

burakemirsezen added 5 commits June 6, 2026 12:41

Add cluster loadtest runner

b7a8772

Document loadtest scenarios and pin k6

9eadefe

Format loadtest files

89c9998

Fix loadtest mypy errors

8e02e25

Clean up k6 loadtest script

481b812

burakemirsezen added 22 commits June 6, 2026 12:41

loadtest: add cluster metrics remote write

28fc612

loadtest: simplify CLI scenario controls

742f233

docs: document loadtest open-loop metrics

ecf082f

loadtest: remove low-level run knobs

299d67e

loadtest: remove custom k6 script override

8ac0c59

Loadtest job length setting

c9ffa5e

Markdown lint fix

0acaf70

Revert health checker changes

30f172b

Move loadtest CLI handling out of main

b8dee6d

Remove future import

eb5a6e8

Revert parser renaming

ba4a495

Keep reservation out of launch args

8456d3b

Deduplicate scenario file suffixes

99c6eb3

Suggested by SonarQube to avoid duplicating the .yaml literal across scenario loading code.

Trim packaged loadtest scenarios

d57dbe3

Prompt for preconfigured loadtest scenario

7917e28

Reformat loadtest.py

86d3208

Add job label to k6 prometheus export

275be1f

test: add loadtest unit and integration coverage

f17c1e5

Update the docs to use the new path

185b445

Update comment in sh file

075fc9c

Increase the default job time for loadtest and de-duplicate the default job time setting

5666649

Fix the launchers for the tests

2050a18

burakemirsezen dismissed robmsmt’s stale review via 2050a18 June 6, 2026 09:42

burakemirsezen force-pushed the loadtest branch from 91cb850 to 2050a18 Compare June 6, 2026 09:42

robmsmt approved these changes Jun 8, 2026

burakemirsezen merged commit 05ee56f into main Jun 8, 2026
16 checks passed

burakemirsezen deleted the loadtest branch June 8, 2026 18:46
