Improvements to the launcher-based tests #319
```diff
@@ -17,7 +17,6 @@ spec:
     options: "--model HuggingFaceTB/SmolLM2-360M-Instruct"
     env_vars:
       VLLM_SERVER_DEV_MODE: "1"
-      VLLM_USE_V1: "1"
       VLLM_LOGGING_LEVEL: "DEBUG"
       VLLM_CPU_KVCACHE_SPACE: "1" # GiB
     labels:
```
```diff
@@ -38,7 +37,6 @@ spec:
     options: "--model Qwen/Qwen2.5-0.5B-Instruct"
     env_vars:
       VLLM_SERVER_DEV_MODE: "1"
-      VLLM_USE_V1: "1"
       VLLM_LOGGING_LEVEL: "DEBUG"
       VLLM_CPU_KVCACHE_SPACE: "1" # GiB
     labels:
```
```diff
@@ -59,7 +57,6 @@ spec:
     options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0"
     env_vars:
       VLLM_SERVER_DEV_MODE: "1"
-      VLLM_USE_V1: "1"
       VLLM_LOGGING_LEVEL: "DEBUG"
       VLLM_CPU_KVCACHE_SPACE: "1" # GiB
     labels:
```
```diff
@@ -278,12 +278,9 @@ expect '[ "$(kubectl get pod $reqlb3 -o jsonpath={.metadata.labels.dual-pods\\.l
 # Verify launcher is bound to new requester
 expect '[ "$(kubectl get pod $launcherlb -o jsonpath={.metadata.labels.dual-pods\\.llm-d\\.ai/dual})" == "$reqlb3" ]'

-# Verify the new requester is using isc2
-expect '[ "$(kubectl get pod $reqlb3 -o jsonpath={.metadata.annotations.dual-pods\\.llm-d\\.ai/inference-server-config})" == "'$isc2'" ]'
-
 # Wait for requester to be ready (launcher should already be ready)
 date
-kubectl wait --for condition=Ready pod/$reqlb3 --timeout=30s
+kubectl wait --for condition=Ready pod/$reqlb3 --timeout=120s
```
Suggested change:

```diff
 kubectl wait --for condition=Ready pod/$reqlb3 --timeout=120s
+# Verify requester is using the patched inference server config (isc2)
+expect '[ "$(kubectl get pod $reqlb3 -o jsonpath={.metadata.annotations.dual-pods\\.llm-d\\.ai/inference-server-config})" == "$isc2" ]'
```
Silly LLM, why didn't you suggest testing with an if statement?
I do not think that this script needs to check whether the ReplicaSet controller behaved properly.
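For illustration, the reviewer's "test with an if statement" alternative might look like the sketch below. It is hypothetical, not part of the PR: it assumes the script's `$reqlb3` and `$isc2` variables and the same `dual-pods.llm-d.ai/inference-server-config` annotation key, and replaces the `expect` helper with an explicit `if` that prints a diagnostic on mismatch.

```shell
# Sketch only (not in the PR): annotation check as an explicit if statement
# instead of the script's expect helper.
check_isc() {
  pod="$1"
  want="$2"
  # Read the inference-server-config annotation off the pod
  got=$(kubectl get pod "$pod" \
    -o "jsonpath={.metadata.annotations.dual-pods\.llm-d\.ai/inference-server-config}")
  if [ "$got" != "$want" ]; then
    echo "FAIL: pod $pod uses config '$got', expected '$want'" >&2
    return 1
  fi
  echo "OK: $pod uses $want"
}
# Usage in the e2e script would be: check_isc "$reqlb3" "$isc2"
```

One advantage over a bare assertion is that the failure message names the pod and both config values, which helps when reading CI logs.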
Copilot AI, Mar 6, 2026:
This scenario patches the ReplicaSet back to the original inference server config, but the test no longer asserts that the new requester pod actually has the expected dual-pods.llm-d.ai/inference-server-config annotation (or otherwise proves the first instance was re-selected). Without that, this can become a false positive if the patch doesn't apply or the wrong instance is used.
VLLM_USE_V1 was removed from the InferenceServerConfig env vars here, but it's still used in other repo examples (e.g., docs/e2e-recipe.md and .github/workflows/ci-e2e-openshift.yaml). This makes the launcher-based E2E objects diverge from documented/CI configurations and can make failures harder to reproduce. Consider keeping this env var (or updating the docs/workflows and adding a short note explaining why it's no longer needed).
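Acting on this comment would mean auditing the rest of the repo for lingering references. A hedged sketch (the docs/ and .github/workflows paths are the ones the comment names; adjust to taste):

```shell
# Sketch: list files that still reference VLLM_USE_V1 so docs and CI
# workflows can be updated together with this change.
refs=$(grep -rl 'VLLM_USE_V1' docs .github/workflows 2>/dev/null || true)
if [ -n "$refs" ]; then
  printf 'VLLM_USE_V1 still referenced in:\n%s\n' "$refs"
else
  echo "no remaining VLLM_USE_V1 references"
fi
```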