Merged
3 changes: 0 additions & 3 deletions test/e2e/mkobjs.sh
@@ -17,7 +17,6 @@ spec:
    options: "--model HuggingFaceTB/SmolLM2-360M-Instruct"
    env_vars:
      VLLM_SERVER_DEV_MODE: "1"
-     VLLM_USE_V1: "1"
      VLLM_LOGGING_LEVEL: "DEBUG"
      VLLM_CPU_KVCACHE_SPACE: "1" # GiB
Comment on lines 18 to 21
Copilot AI Mar 6, 2026

VLLM_USE_V1 was removed from the InferenceServerConfig env vars here, but it’s still used in other repo examples (e.g., docs/e2e-recipe.md and .github/workflows/ci-e2e-openshift.yaml). This makes the launcher-based E2E objects diverge from documented/CI configurations and can make failures harder to reproduce. Consider keeping this env var (or updating the docs/workflows and adding a short note explaining why it’s no longer needed).

labels:
@@ -38,7 +37,6 @@ spec:
    options: "--model Qwen/Qwen2.5-0.5B-Instruct"
    env_vars:
      VLLM_SERVER_DEV_MODE: "1"
-     VLLM_USE_V1: "1"
      VLLM_LOGGING_LEVEL: "DEBUG"
      VLLM_CPU_KVCACHE_SPACE: "1" # GiB
Comment on lines 38 to 41

Copilot AI Mar 6, 2026

Same as above: removing VLLM_USE_V1 here diverges from the repo’s documented/CI InferenceServerConfig examples. If the intent is to standardize on a default vLLM mode, consider updating the docs/workflows accordingly so users and CI run the same configuration.

labels:
@@ -59,7 +57,6 @@ spec:
    options: "--model TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    env_vars:
      VLLM_SERVER_DEV_MODE: "1"
-     VLLM_USE_V1: "1"
      VLLM_LOGGING_LEVEL: "DEBUG"
      VLLM_CPU_KVCACHE_SPACE: "1" # GiB
Comment on lines 58 to 61

Copilot AI Mar 6, 2026

Same as above: VLLM_USE_V1 removal changes the runtime configuration for this test InferenceServerConfig compared to other repo examples. Please either keep it for consistency or update the other references so launcher-based tests match the expected vLLM configuration.

labels:
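The three review comments above all raise the same consistency concern: other places in the repo may still set VLLM_USE_V1. A quick way to audit this is a repo-wide grep; the paths below are taken from the review comments, not verified against the repository.

```shell
# List any remaining VLLM_USE_V1 references in the locations the review
# comments mention. Some of these directories may not exist, so errors
# are suppressed; print a marker when nothing is found.
grep -rn "VLLM_USE_V1" docs/ .github/workflows/ test/ 2>/dev/null \
  || echo "no references found"
```

Run from the repository root, this either lists every file:line still setting the variable or confirms the removal is complete.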
10 changes: 2 additions & 8 deletions test/e2e/run-launcher-based.sh
@@ -278,12 +278,9 @@
 # Verify launcher is bound to new requester
 expect '[ "$(kubectl get pod $launcherlb -o jsonpath={.metadata.labels.dual-pods\\.llm-d\\.ai/dual})" == "$reqlb3" ]'
 
-# Verify the new requester is using isc2
-expect '[ "$(kubectl get pod $reqlb3 -o jsonpath={.metadata.annotations.dual-pods\\.llm-d\\.ai/inference-server-config})" == "'$isc2'" ]'
-
 # Wait for requester to be ready (launcher should already be ready)
 date
-kubectl wait --for condition=Ready pod/$reqlb3 --timeout=30s
+kubectl wait --for condition=Ready pod/$reqlb3 --timeout=120s

Copilot AI Mar 6, 2026

The test no longer verifies that the newly created requester pod is actually using the patched dual-pods.llm-d.ai/inference-server-config value (isc2). Without an assertion on the pod annotation (or an equivalent check proving a 2nd instance was created), this section can pass even if the ReplicaSet patch didn’t take effect, reducing the test’s ability to catch regressions.

Suggested change:
-kubectl wait --for condition=Ready pod/$reqlb3 --timeout=120s
+kubectl wait --for condition=Ready pod/$reqlb3 --timeout=120s
+# Verify requester is using the patched inference server config (isc2)
+expect '[ "$(kubectl get pod $reqlb3 -o jsonpath={.metadata.annotations.dual-pods\\.llm-d\\.ai/inference-server-config})" == "$isc2" ]'
Collaborator

Silly LLM, why didn't you suggest testing with an if statement?

I do not think that this script needs to check whether the ReplicaSet controller behaved properly.

kubectl wait --for condition=Ready pod/$launcherlb --timeout=5s

cheer Successful multiple instances sharing one launcher
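The `expect` assertions used throughout this script poll a shell condition rather than checking it once. The repo's actual helper is not shown in this diff, so the following is a minimal retry-based sketch of the same idea; the function body, the 60-second budget, and the 2-second poll interval are assumptions.

```shell
# Hypothetical sketch of an `expect`-style helper: re-evaluate a shell
# test until it passes or a deadline expires. Requires bash (SECONDS, ==).
expect() {
  local deadline=$((SECONDS + 60))   # assumed 60s budget
  until eval "$1"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "expect failed: $1" >&2
      return 1
    fi
    sleep 2
  done
}

# Trivial usage: the condition holds immediately, so no retries occur.
expect '[ "$((1 + 1))" == "2" ]' && echo "assertion passed"
# → assertion passed
```

Polling like this absorbs the propagation delay between a `kubectl patch` and the moment labels and annotations become visible on the pods.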
@@ -321,12 +318,9 @@
 # Verify launcher is bound to new requester
 expect '[ "$(kubectl get pod $launcherlb -o jsonpath={.metadata.labels.dual-pods\\.llm-d\\.ai/dual})" == "$reqlb4" ]'
 
-# Verify the new requester is using original isc
-expect '[ "$(kubectl get pod $reqlb4 -o jsonpath={.metadata.annotations.dual-pods\\.llm-d\\.ai/inference-server-config})" == "'$isc'" ]'
-
 # Wait for requester to be ready (launcher should already be ready)
 date
-kubectl wait --for condition=Ready pod/$reqlb4 --timeout=30s
+kubectl wait --for condition=Ready pod/$reqlb4 --timeout=120s
 kubectl wait --for condition=Ready pod/$launcherlb --timeout=5s
Comment on lines 321 to 324

Copilot AI Mar 6, 2026

This scenario patches the ReplicaSet back to the original inference server config, but the test no longer asserts that the new requester pod actually has the expected dual-pods.llm-d.ai/inference-server-config annotation (or otherwise proves the first instance was re-selected). Without that, this can become a false-positive if the patch doesn’t apply or the wrong instance is used.


cheer Successful switching instances in one launcher
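A note on the jsonpath expressions in these assertions: dots inside label and annotation keys such as dual-pods.llm-d.ai/dual must be escaped, or kubectl parses each dot as a nested-field separator. The script writes `\\.` because the expression sits inside a single-quoted `expect` argument that goes through one more round of shell evaluation. A plain-shell illustration of that quoting round (no cluster needed; `eval` stands in for the helper's re-evaluation):

```shell
# Inside the single-quoted expect string, \\. is two literal characters;
# the double-quote pass during eval collapses it to \. — which is
# jsonpath's escape for a literal dot in a key.
inner='{.metadata.labels.dual-pods\\.llm-d\\.ai/dual}'
eval "printf '%s\n' \"$inner\""
# → {.metadata.labels.dual-pods\.llm-d\.ai/dual}
```

The collapsed form is what kubectl ultimately receives, so it looks up the single key `dual-pods.llm-d.ai/dual` instead of descending through `dual-pods`, `llm-d`, and `ai/dual`.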