[WIP] Add: Acceptance length tests for speculators by rahul-tuli · Pull Request #136 · neuralmagic/vllm

rahul-tuli · 2026-01-08T15:23:26Z

Adds parameterized pytest tests to detect acceptance length regressions in EAGLE3 speculative decoding. These tests ensure that new commits do not degrade speculative decoding performance.

Changes

tests/v1/spec_decode/test_acceptance_length.py: New test file with parameterized tests for EAGLE3 model pairs

Models Tested

Verifier	Drafter
`meta-llama/Llama-3.1-8B-Instruct`	`RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3`
`Qwen/Qwen3-8B`	`RedHatAI/Qwen3-8B-speculator.eagle3`
`openai/gpt-oss-20b`	`RedHatAI/gpt-oss-20b-speculator.eagle3`

Test Design

Uses philschmid/mt-bench dataset (80 prompts)
Runs inference with 3 speculative tokens
Extracts acceptance length via llm.get_metrics()
Asserts within 2% relative tolerance of expected baseline
Tests are parameterized for easy addition of new model configurations
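
The tolerance check described above can be sketched as follows. This is a minimal illustration and not the code in the PR: the helper names are hypothetical, and it assumes that mean acceptance length is computed as one verifier token plus the average number of accepted draft tokens per draft step, using counters read from `llm.get_metrics()`.

```python
def mean_acceptance_length(num_drafts: int, num_accepted_tokens: int) -> float:
    """Mean tokens produced per verification step.

    Assumption: each step yields 1 token from the verifier plus however
    many draft tokens were accepted, so AL = 1 + accepted / drafts.
    """
    if num_drafts <= 0:
        raise ValueError("num_drafts must be positive")
    return 1.0 + num_accepted_tokens / num_drafts


def within_relative_tolerance(
    actual: float, expected: float, rel_tol: float = 0.02
) -> bool:
    """True when `actual` deviates from `expected` by at most rel_tol (2% by default)."""
    return abs(actual - expected) <= rel_tol * abs(expected)


if __name__ == "__main__":
    # Illustrative counter values only, not measured results.
    al = mean_acceptance_length(num_drafts=1000, num_accepted_tokens=1600)
    print(f"acceptance length: {al:.2f}")
    print("within tolerance:", within_relative_tolerance(al, expected=2.60))
```

A relative tolerance scales with the baseline, so the same 2% threshold can be reused across model pairs with different expected acceptance lengths.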

Test Plan

Run baseline commands to determine expected acceptance lengths
Update EAGLE3_MODEL_CONFIGS with baseline values
Run tests: CUDA_VISIBLE_DEVICES=3 pytest tests/v1/spec_decode/test_acceptance_length.py -v -s
Verify all tests pass within tolerance
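
One possible shape for `EAGLE3_MODEL_CONFIGS` is sketched below. The dataclass name, its field names, and the `qwen3` and `gpt-oss` ids are assumptions made for illustration; only the model names, the `llama3` keyword, and the AL values quoted in this PR's first commit message come from the PR itself, and later commits update the baselines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eagle3ModelConfig:  # hypothetical name, not taken from the PR
    verifier: str
    drafter: str
    expected_acceptance_length: float
    id: str


# Baselines as quoted in the first commit message; treat them as placeholders.
EAGLE3_MODEL_CONFIGS = [
    Eagle3ModelConfig(
        verifier="meta-llama/Llama-3.1-8B-Instruct",
        drafter="RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3",
        expected_acceptance_length=2.60,
        id="llama3",
    ),
    Eagle3ModelConfig(
        verifier="Qwen/Qwen3-8B",
        drafter="RedHatAI/Qwen3-8B-speculator.eagle3",
        expected_acceptance_length=2.26,
        id="qwen3",  # assumed id
    ),
    Eagle3ModelConfig(
        verifier="openai/gpt-oss-20b",
        drafter="RedHatAI/gpt-oss-20b-speculator.eagle3",
        expected_acceptance_length=2.56,
        id="gpt-oss",  # assumed id
    ),
]


def select_configs(keyword: str) -> list[Eagle3ModelConfig]:
    """Rough stand-in for `pytest -k`: keep configs whose id contains the keyword."""
    return [cfg for cfg in EAGLE3_MODEL_CONFIGS if keyword in cfg.id]


if __name__ == "__main__":
    for cfg in select_configs("llama3"):
        print(cfg.verifier, "->", cfg.drafter, cfg.expected_acceptance_length)
```

In a pytest file, a list like this would typically be passed to `pytest.mark.parametrize` with `ids=[cfg.id for cfg in EAGLE3_MODEL_CONFIGS]`, so that adding a model pair only requires appending one entry.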

Usage

# Run all acceptance length tests
CUDA_VISIBLE_DEVICES=3 pytest tests/v1/spec_decode/test_acceptance_length.py -v -s

# Run specific model
CUDA_VISIBLE_DEVICES=3 pytest tests/v1/spec_decode/test_acceptance_length.py -v -s -k "llama3"

…n validation

Add parameterized pytest tests to detect acceptance length regressions in EAGLE3 speculative decoding. Tests run inference on the MT-Bench dataset (80 prompts) and assert that both mean and per-position acceptance lengths are within 2% tolerance of baseline.

Models tested:
- Llama-3.1-8B-Instruct (AL: 2.60)
- Qwen3-8B (AL: 2.26)
- GPT-OSS-20B (AL: 2.56)

Signed-off-by: rahul-tuli <rtuli@redhat.com>
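
The per-position check mentioned in this commit message could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's implementation: the function names are hypothetical, and it assumes the per-position value is the fraction of draft steps in which the draft token at that position was accepted.

```python
def per_position_rates(num_drafts: int, accepted_per_pos: list[int]) -> list[float]:
    """Acceptance rate at each speculative position (assumed: accepted count / drafts)."""
    if num_drafts <= 0:
        raise ValueError("num_drafts must be positive")
    return [count / num_drafts for count in accepted_per_pos]


def find_regressions(
    actual: list[float], expected: list[float], rel_tol: float = 0.02
) -> list[tuple[int, float, float]]:
    """Return (position, actual, expected) for every position outside the tolerance."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must cover the same positions")
    return [
        (pos, a, e)
        for pos, (a, e) in enumerate(zip(actual, expected))
        if abs(a - e) > rel_tol * abs(e)
    ]


if __name__ == "__main__":
    # Illustrative counters for 3 speculative tokens, not measured results.
    rates = per_position_rates(num_drafts=1000, accepted_per_pos=[700, 500, 300])
    failures = find_regressions(rates, expected=[0.70, 0.50, 0.30])
    assert not failures, f"per-position acceptance regressed: {failures}"
```

Checking each position separately can surface a shift that the mean alone would hide, for example when a gain at position 0 offsets a loss at position 2.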


- Use VllmRunner context manager instead of direct LLM instantiation
- Use monkeypatch.context() for proper env var scoping
- Use AcceptanceMetrics TypedDict in return statement
- Remove docstrings from TypedDict and dataclass definitions
- Remove inline comments from constants
- Remove prototyping skip condition (all configs have baselines)
- Fix gpt-oss-20b expected position 2 value (0.3220 -> 0.3337)

Signed-off-by: rahul-tuli <rtuli@redhat.com>


rahul-tuli force-pushed the add-acceptance-length-tests branch 3 times, most recently from 1f8b8e0 to d425f1c Compare January 9, 2026 14:42

rahul-tuli force-pushed the add-acceptance-length-tests branch 2 times, most recently from 977d295 to 2e12d39 Compare January 19, 2026 14:56

rahul-tuli added 7 commits January 19, 2026 15:00

Update: AL values

f91a4d5

Signed-off-by: rahul-tuli <rtuli@redhat.com>

Some more cleanups

2c5d505

Signed-off-by: rahul-tuli <rtuli@redhat.com>

Review comments

85f67f4

Signed-off-by: rahul-tuli <rtuli@redhat.com>

Added: multiple tp and attention backends

ec01d6c

Signed-off-by: rahul-tuli <rtuli@redhat.com>

Cleanups

5db9a9a

Signed-off-by: rahul-tuli <rtuli@redhat.com>

rahul-tuli force-pushed the add-acceptance-length-tests branch from 2e12d39 to 5db9a9a Compare January 19, 2026 15:00

rahul-tuli and others added 6 commits January 19, 2026 15:01

Cleanups

9803cc7

Signed-off-by: rahul-tuli <rtuli@redhat.com>

Merge branch 'main' into add-acceptance-length-tests

fcbe63d

[XPU]Support AgRsAll2AllManager on XPU device (vllm-project#32654)

13f6630

Signed-off-by: yisheng <yi.sheng@intel.com>

[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (vllm-project#32603)

7901109

Signed-off-by: linhaifeng <1371675203@qq.com>

[PluggableLayer][1/N] Define PluggableLayer (vllm-project#32331)

4ca62a0

Signed-off-by: whx-sjtu <2952154980@qq.com>

Merge branch 'main' into add-acceptance-length-tests

9e92adb

rahul-tuli wants to merge 13 commits into main from add-acceptance-length-tests


4 participants
