[Parallelism Support Matrix Tests] Replace flaky EP relative comparison with hardcoded absolute baseline#1984

Open
syhuang22 wants to merge 1 commit into vllm-project:main from syhuang22:fix/ep-test-absolute-baseline

syhuang22 (Collaborator) commented Mar 20, 2026

Description

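The previous tests compared EP inference time against a baseline measured at test time on a non-EP code path. This relative comparison was flaky: any upstream vLLM change could shift that baseline, causing spurious failures unrelated to EP performance.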

The tests now compare EP inference time against hardcoded baselines measured on TPU v7x-8 with 512 prompts (Fused: 3.40s, GMM: 2.07s). A test fails only if its inference time exceeds the baseline by more than 15%, which keeps the tests stable and independent of non-EP code paths.
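A minimal sketch of the pass/fail rule described above, assuming the 15% limit is applied as a relative margin over each hardcoded baseline. The names `BASELINES_S`, `MAX_REGRESSION`, `relative_change`, and `check_against_baseline` are hypothetical and are not taken from tests/e2e/test_expert_parallel.py.

```python
# Hypothetical sketch of an absolute-baseline regression check.
# Names are illustrative only; they are not the PR's actual code.

# Hardcoded baselines in seconds, as quoted in the PR (TPU v7x-8, 512 prompts).
BASELINES_S = {
    "fused": 3.40,
    "gmm": 2.07,
}

# Fail only if inference is more than 15% slower than the baseline.
MAX_REGRESSION = 0.15


def relative_change(elapsed_s: float, baseline_s: float) -> float:
    """Return the fractional change of elapsed_s versus baseline_s."""
    return (elapsed_s - baseline_s) / baseline_s


def check_against_baseline(kernel: str, elapsed_s: float) -> float:
    """Raise AssertionError if elapsed_s regresses past MAX_REGRESSION.

    Returns the fractional change so callers can log it.
    """
    baseline_s = BASELINES_S[kernel]
    change = relative_change(elapsed_s, baseline_s)
    assert change <= MAX_REGRESSION, (
        f"{kernel}: {elapsed_s:.2f}s is {change:+.1%} vs baseline "
        f"{baseline_s:.2f}s (limit +{MAX_REGRESSION:.0%})"
    )
    return change
```

Because the baselines are constants, an unrelated upstream vLLM change can only fail such a check if it actually slows the EP path past the margin; the trade-off is that the constants presumably need re-measuring whenever the hardware or prompt set changes.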

Tests

Both tests were verified on TPU v7x-8 by running:

python -m pytest tests/e2e/test_expert_parallel.py -v -s

  • test_ep_fused_performance: 3.35s (baseline 3.40s, -1.5%): PASSED
  • test_ep_gmm_performance: 2.21s (baseline 2.07s, +6.6%): PASSED

Checklist

Before submitting this PR, please make sure:
- I have performed a self-review of my code.
- I have necessary comments in my code, particularly in hard-to-understand areas.
- I have made or will make corresponding changes to any relevant documentation.

syhuang22 requested a review from vipannalla as a code owner on March 20, 2026 17:36
syhuang22 self-assigned this on Mar 20, 2026
syhuang22 added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Mar 20, 2026
Signed-off-by: Shy Huang <shyhuang@google.com>
syhuang22 force-pushed the fix/ep-test-absolute-baseline branch from e500d4b to e4f04cd on March 20, 2026 17:37