[Parallelism Support Matrix Tests] Replace flaky EP relative comparison with hardcoded absolute baseline by syhuang22 · Pull Request #1984 · vllm-project/tpu-inference

syhuang22 · 2026-03-20T17:36:48Z

Description

The previous relative comparison for the EP (expert parallelism) tests was
unreliable because any upstream vllm change could shift the baseline, causing
spurious failures unrelated to EP performance.

The tests now compare EP inference time against hardcoded baselines measured on
TPU v7x-8 with 512 prompts (Fused: 3.40s, GMM: 2.07s). A test fails only if the
measured time regresses by more than 15% relative to its baseline, which makes
the tests stable and independent of non-EP code paths.
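
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the names `BASELINES_S`, `MAX_REGRESSION`, `regression_ratio`, and `check_ep_time` are hypothetical, and only the baseline values and the 15% threshold come from the description.

```python
# Hypothetical sketch of an absolute-baseline check; names are illustrative
# and are not taken from tests/e2e/test_expert_parallel.py.

# Baselines from the PR description (TPU v7x-8, 512 prompts), in seconds.
BASELINES_S = {"fused": 3.40, "gmm": 2.07}

# Allowed slowdown relative to the baseline (15%).
MAX_REGRESSION = 0.15


def regression_ratio(measured_s: float, baseline_s: float) -> float:
    """Return the fractional slowdown vs. the baseline (negative means faster)."""
    return (measured_s - baseline_s) / baseline_s


def check_ep_time(kind: str, measured_s: float) -> None:
    """Raise AssertionError if the measured time regresses by more than 15%."""
    baseline_s = BASELINES_S[kind]
    ratio = regression_ratio(measured_s, baseline_s)
    assert ratio <= MAX_REGRESSION, (
        f"EP {kind} regressed by {ratio:+.1%}: "
        f"{measured_s:.2f}s vs baseline {baseline_s:.2f}s"
    )
```

With these assumed helpers, `check_ep_time("gmm", 2.21)` passes because 2.21s is below the 2.07s × 1.15 limit, while a value such as 2.50s would raise. One trade-off of hardcoded baselines is that they have to be re-measured whenever the hardware or the prompt count changes.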

Tests

Both tests verified on TPU v7x-8:

- test_ep_fused_performance: 3.35s (baseline 3.40s, -1.5%), PASSED
- test_ep_gmm_performance: 2.21s (baseline 2.07s, +6.6%), PASSED

python -m pytest tests/e2e/test_expert_parallel.py -v -s

# Checklist

Before submitting this PR, please make sure:
- I have performed a self-review of my code.
- I have necessary comments in my code, particularly in hard-to-understand areas.
- I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: Shy Huang <shyhuang@google.com>

syhuang22 requested a review from vipannalla as a code owner March 20, 2026 17:36

syhuang22 self-assigned this Mar 20, 2026

syhuang22 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Mar 20, 2026

Replace flaky EP relative comparison with hardcoded absolute baseline

e4f04cd

Signed-off-by: Shy Huang <shyhuang@google.com>

syhuang22 force-pushed the fix/ep-test-absolute-baseline branch from e500d4b to e4f04cd March 20, 2026 17:37

syhuang22 wants to merge 1 commit into vllm-project:main from syhuang22:fix/ep-test-absolute-baseline
