
K6 Integration for Loadtesting #110

Merged
burakemirsezen merged 41 commits into main from loadtest on Jun 8, 2026


@burakemirsezen

Contributor

This branch adds a cluster-native loadtesting workflow to SML.

It introduces sml loadtest commands for four workflows: launching a model and running k6 against it, running k6 against an already-running model, running k6 against an external OpenAI-compatible URL, and batch-testing multiple model configs from YAML. k6 runs as its own SLURM job inside a container on the cluster, so load generation does not happen locally.
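One way to picture "k6 runs as its own SLURM job inside a container" is a small generator for the batch script such a job might use. This is a hypothetical sketch, not sml's actual template: the function name render_k6_job, the job name, the BASE_URL variable and the EDF path are assumptions. srun --environment is how the CSCS Container Engine attaches an EDF .toml (which names the container image) to a job step, and k6 run -e sets a variable that a k6 script can read through __ENV.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print an sbatch script that runs k6 on a compute node
# inside a container, so load generation does not happen on the laptop.
# render_k6_job, the job name and BASE_URL are illustrative assumptions.
set -euo pipefail

# render_k6_job EDF_TOML K6_SCRIPT BASE_URL [WALLTIME]
render_k6_job() {
  local edf="$1" script="$2" base_url="$3" walltime="${4:-01:00:00}"
  cat <<EOF
#!/bin/bash
#SBATCH --job-name=k6-loadtest
#SBATCH --nodes=1
#SBATCH --time=${walltime}

# CSCS Container Engine: --environment names an EDF .toml that selects the image.
srun --environment="${edf}" k6 run -e BASE_URL="${base_url}" "${script}"
EOF
}
```

For example, render_k6_job k6.toml script.js http://host:8080 > k6.sbatch writes a script you could inspect or submit. Depending on the launcher, sml submits jobs through FirecREST or directly to SLURM; this sketch only renders the script.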

Comment thread images/k6/Dockerfile Outdated
Comment thread docs/loadtesting.md Outdated
Comment thread docs/loadtesting.md Outdated
@robmsmt

robmsmt commented Apr 29, 2026

Contributor

Initially I was thinking something like:

set -euo pipefail

: "${LOADTEST_SERVER_URL:?export LOADTEST_SERVER_URL}"
: "${LOADTEST_API_KEY:?export LOADTEST_API_KEY}"
LOADTEST_PROMPTS_FILE="${LOADTEST_PROMPTS_FILE:-/capstor/store/cscs/swissai/infra01/loadtest/prompts.json}"
LOADTEST_SCENARIO="${LOADTEST_SCENARIO:-throughput}"

MODEL="/capstor/store/cscs/swissai/infra01/hf_models/models/swiss-ai/Apertus-8B-Instruct-2509"
ENV="src/swiss_ai_model_launch/assets/envs/sglang.toml"
TIME="04:00:00"
SUFFIX="$(whoami)"

SERVED_1="swiss-ai/Apertus-8B-Instruct-2509-tp4-${SUFFIX}"
SERVED_2="swiss-ai/Apertus-8B-Instruct-2509-tp1-dp4-${SUFFIX}"

# --- Experiment 1: TP=4 ---
OUT_1=$(sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --slurm-nodes 1 \
  --slurm-time "$TIME" \
  --serving-framework sglang \
  --slurm-environment "$ENV" \
  --framework-args "--model-path $MODEL \
    --served-model-name $SERVED_1 \
    --host 0.0.0.0 --port 8080 \
    --tp-size 4 \
    --enable-metrics")
echo "$OUT_1"
JOB_1=$(echo "$OUT_1" | grep "Job submitted:" | awk '{print $3}')

echo "Waiting for $SERVED_1 to be healthy..."
until curl -fsS -H "Authorization: Bearer $LOADTEST_API_KEY" "$LOADTEST_SERVER_URL/v1/models" | grep -qF "$SERVED_1"; do
  sleep 30
done

sml loadtest run \
  --firecrest-system clariden \
  --partition normal \
  --loadtest-server-url "$LOADTEST_SERVER_URL" \
  --loadtest-api-key "$LOADTEST_API_KEY" \
  --loadtest-model "$SERVED_1" \
  --loadtest-scenario "$LOADTEST_SCENARIO" \
  --loadtest-prompts-file "$LOADTEST_PROMPTS_FILE" \
  --no-wait-until-healthy

scancel "$JOB_1"

# --- Experiment 2: TP=1, DP=4 ---
OUT_2=$(sml advanced \
  --firecrest-system clariden \
  --partition normal \
  --slurm-nodes 1 \
  --slurm-time "$TIME" \
  --serving-framework sglang \
  --slurm-environment "$ENV" \
  --framework-args "--model-path $MODEL \
    --served-model-name $SERVED_2 \
    --host 0.0.0.0 --port 8080 \
    --tp-size 1 --dp-size 4 \
    --enable-metrics")
echo "$OUT_2"
JOB_2=$(echo "$OUT_2" | grep "Job submitted:" | awk '{print $3}')

echo "Waiting for $SERVED_2 to be healthy..."
until curl -fsS -H "Authorization: Bearer $LOADTEST_API_KEY" "$LOADTEST_SERVER_URL/v1/models" | grep -qF "$SERVED_2"; do
  sleep 30
done

sml loadtest run \
  --firecrest-system clariden \
  --partition normal \
  --loadtest-server-url "$LOADTEST_SERVER_URL" \
  --loadtest-api-key "$LOADTEST_API_KEY" \
  --loadtest-model "$SERVED_2" \
  --loadtest-scenario "$LOADTEST_SCENARIO" \
  --loadtest-prompts-file "$LOADTEST_PROMPTS_FILE" \
  --no-wait-until-healthy

scancel "$JOB_2"

The sml loadtest advanced section has a clear advantage with --cancel-after-loadtest. What is not clear to me is whether it's worth the trade-off: we've added some complexity here and now have to maintain another set of flags in _add_loadtest_arguments. Could we not import these so we don't repeat the flags?
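The two experiment blocks in the script above differ only in the served model name and the parallelism flags, so within such a script the sml flags can live in one function per experiment. A minimal sketch, assuming the same variables as the script (MODEL, ENV, TIME and the LOADTEST_* settings); wait_for_model, run_experiment and POLL_SECONDS are hypothetical names. It also bounds the health wait (the inline until loop never gives up), fetches the model list before grepping it so grep -q exiting early cannot fail the pipeline under pipefail, and cancels the server job even when the wait or the loadtest fails, which the inline version skips once set -e aborts.

```bash
#!/usr/bin/env bash
# Sketch: one function per experiment instead of two copy-pasted blocks.
# Expects MODEL, ENV, TIME, LOADTEST_SERVER_URL, LOADTEST_API_KEY,
# LOADTEST_SCENARIO and LOADTEST_PROMPTS_FILE to be set as in the script above.
set -euo pipefail

# wait_for_model SERVED_NAME [MAX_TRIES]
# Polls /v1/models until SERVED_NAME is listed; gives up after MAX_TRIES polls.
wait_for_model() {
  local served="$1" max_tries="${2:-120}" i models
  for ((i = 1; i <= max_tries; i++)); do
    if models=$(curl -fsS -H "Authorization: Bearer $LOADTEST_API_KEY" \
      "$LOADTEST_SERVER_URL/v1/models") && grep -qF "$served" <<<"$models"; then
      return 0
    fi
    sleep "${POLL_SECONDS:-30}"
  done
  echo "Timed out waiting for $served" >&2
  return 1
}

# run_experiment SERVED_NAME PARALLELISM_ARGS
# Launches the server, runs the loadtest, and always cancels the server job.
run_experiment() {
  local served="$1" parallel_args="$2" out job status=0
  out=$(sml advanced \
    --firecrest-system clariden \
    --partition normal \
    --slurm-nodes 1 \
    --slurm-time "$TIME" \
    --serving-framework sglang \
    --slurm-environment "$ENV" \
    --framework-args "--model-path $MODEL --served-model-name $served --host 0.0.0.0 --port 8080 $parallel_args --enable-metrics")
  echo "$out"
  job=$(awk '/Job submitted:/ {print $3}' <<<"$out")
  [[ -n "$job" ]] || { echo "No job ID in sml output" >&2; return 1; }

  echo "Waiting for $served to be healthy..."
  wait_for_model "$served" && sml loadtest run \
    --firecrest-system clariden \
    --partition normal \
    --loadtest-server-url "$LOADTEST_SERVER_URL" \
    --loadtest-api-key "$LOADTEST_API_KEY" \
    --loadtest-model "$served" \
    --loadtest-scenario "$LOADTEST_SCENARIO" \
    --loadtest-prompts-file "$LOADTEST_PROMPTS_FILE" \
    --no-wait-until-healthy || status=$?

  scancel "$job"
  return "$status"
}
```

The two experiments then become run_experiment "$SERVED_1" "--tp-size 4" and run_experiment "$SERVED_2" "--tp-size 1 --dp-size 4". This only removes repetition inside a user script; it does not remove the duplicated flag definitions in the CLI.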

@burakemirsezen

Contributor Author

> the sml loadtest advanced section has a clear advantage with --cancel-after-loadtest ... what is not clear to me is that we've added some complexity here and now have to maintain another set of flags in (def _add_loadtest_arguments) and if it's worth that trade off? Could we not import these so don't repeat the flags?

Should I just get rid of the loadtest batch then?

@burakemirsezen burakemirsezen force-pushed the loadtest branch 3 times, most recently from dc9856f to 4fe1ef1 on April 29, 2026 18:11
@burakemirsezen

Contributor Author

I removed batch submission, made the parser use the advanced arguments underneath, and did some cleanup and a lot of debugging. I also pinned the k6 version. It is in a working state now; we can improve it later on.

Comment thread src/swiss_ai_model_launch/cli/healthcheck/checker.py
Comment thread src/swiss_ai_model_launch/launchers/firecrest_launcher.py Outdated
Comment thread src/swiss_ai_model_launch/launchers/slurm_launcher.py Outdated
Comment thread src/swiss_ai_model_launch/loadtest/k6/script.js
Comment thread src/swiss_ai_model_launch/loadtest/setup.py
Comment thread src/swiss_ai_model_launch/loadtest/models.py Outdated
Comment thread src/swiss_ai_model_launch/loadtest/core.py Outdated
Comment thread src/swiss_ai_model_launch/cli/main.py
@burakemirsezen

Contributor Author

I added Prometheus remote write for k6 and added/removed some loadtest flags.

This should be more or less good to go now. Lmk if there are any issues or things that we want to add.
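For context on the remote-write change: k6 ships a Prometheus remote-write output, selected with -o experimental-prometheus-rw and configured through K6_PROMETHEUS_RW_* environment variables. A minimal sketch of such an invocation; the function name k6_run_prom_rw, the DRY_RUN switch and the endpoint URL are hypothetical, and the exact settings sml passes are not shown in this thread. The receiving Prometheus must accept remote writes (for example, started with --web.enable-remote-write-receiver).

```bash
#!/usr/bin/env bash
# Sketch: run k6 with its Prometheus remote-write output enabled.
# k6_run_prom_rw and DRY_RUN are hypothetical; the -o name and the
# K6_PROMETHEUS_RW_* variables are k6's own configuration knobs.
set -euo pipefail

# k6_run_prom_rw K6_SCRIPT REMOTE_WRITE_URL
k6_run_prom_rw() {
  local script="$1" rw_url="$2"
  # TREND_STATS picks which aggregates k6 exports for trend metrics
  # such as http_req_duration.
  local -a cmd=(
    env
    "K6_PROMETHEUS_RW_SERVER_URL=${rw_url}"
    "K6_PROMETHEUS_RW_TREND_STATS=p(95),p(99),min,max"
    k6 run -o experimental-prometheus-rw "$script"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

With DRY_RUN=1 the function only prints the command line, which is handy for checking the wiring before a real run.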

@sonarqubecloud


Quality Gate failed

Failed conditions
61.8% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Comment thread docs/loadtesting.md Outdated
Comment thread docs/loadtesting.md Outdated

@robmsmt robmsmt left a comment

Contributor


looks good, nearly there... just some docs updates to change?

Comment thread docs/loadtesting.md Outdated
Comment thread docs/loadtesting.md Outdated
Comment thread docs/loadtesting.md Outdated
Comment thread docs/loadtesting.md
Comment thread src/swiss_ai_model_launch/loadtest/cluster.py Outdated
robmsmt previously approved these changes Jun 5, 2026

@robmsmt robmsmt left a comment

Contributor


LGTM

The k6 script.js is JavaScript, not part of the Python test surface,
and was reporting 0% coverage on 174 new lines — dragging new-code
coverage to 62% and failing the 80% quality gate. Excluding it brings
new-code coverage to ~86%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
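A back-of-the-envelope check of the figures in this commit message, treating coverage as covered lines divided by coverable lines. SonarQube actually combines line and condition coverage, and ~86% is rounded, so the counts below are rough implications of the quoted percentages, not measured values. With $N$ coverable Python lines in new code, $C$ of them covered, and the 174 uncovered script.js lines:

$$
\frac{C}{N + 174} \approx 0.618, \qquad \frac{C}{N} \approx 0.86
\quad\Longrightarrow\quad
174 \approx C\left(\frac{1}{0.618} - \frac{1}{0.86}\right) \approx 0.455\,C
\quad\Longrightarrow\quad
C \approx 382, \quad N \approx 444 .
$$

So a few hundred covered Python lines cannot absorb 174 lines at 0%, which is why excluding the JavaScript file moves the result across the 80% gate.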
@sonarqubecloud

sonarqubecloud Bot commented Jun 7, 2026


@robmsmt robmsmt dismissed AryanAhadinia’s stale review June 8, 2026 18:39

Can't see what is blocking; suggest we move forward with this.

@burakemirsezen burakemirsezen merged commit 05ee56f into main Jun 8, 2026
16 checks passed
@burakemirsezen burakemirsezen deleted the loadtest branch June 8, 2026 18:46

Labels

enhancement New feature or request


3 participants