[Spec Decoding] Add DFlash e2e tests and Buildkite CI#1870

Open
aaronzhfeng wants to merge 1 commit into vllm-project:main from aaronzhfeng:pr_dflash_1c

Description

Add e2e tests and Buildkite CI for DFlash block-diffusion speculative decoding. The DFlash model and proposer were added in #1868, and the pipeline integration in #1869. This PR adds the test coverage and CI.

Verified on both TPU v4 and v5p across 9 datasets (math, code, chat) with a Qwen3-4B target and the z-lab/Qwen3-4B-DFlash-b16 draft, achieving a 3x average speedup.

Files:

  • tests/e2e/test_speculative_decoding.py -- add test_dflash_correctness (Qwen3-4B + DFlash draft, output correctness) and test_dflash_performance (1.5x speedup threshold)
  • .buildkite/features/Speculative_Decoding-_DFlash.yml -- Buildkite CI pipeline for DFlash correctness and performance, modeled after Eagle3's Speculative_Decoding-_Eagle3.yml
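For reviewers who want the shape of the two checks without opening the test file, here is a minimal sketch of the kind of assertions the correctness and performance tests could make. It is not the code in tests/e2e/test_speculative_decoding.py: the helper names (`match_rate`, `speedup`, `meets_threshold`) are hypothetical, and comparing outputs by exact string match is an assumption. Only the 1.5x threshold comes from this PR.

```python
from typing import Sequence


def match_rate(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Fraction of prompts whose outputs are identical (hypothetical helper).

    Assumption: correctness is judged by exact string equality between the
    baseline outputs and the speculative-decoding outputs.
    """
    if len(reference) != len(candidate):
        raise ValueError("output lists must have the same length")
    if not reference:
        raise ValueError("need at least one output to compare")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Wall-clock speedup of the candidate run over the baseline run."""
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("timings must be positive")
    return baseline_s / candidate_s


def meets_threshold(baseline_s: float, candidate_s: float,
                    threshold: float = 1.5) -> bool:
    """True if the measured speedup reaches the threshold (1.5x in this PR)."""
    return speedup(baseline_s, candidate_s) >= threshold
```

With timings in hand, a performance test would reduce to something like `assert meets_threshold(baseline_s, dflash_s)`, where both arguments are hypothetical names for the measured wall-clock durations of the two runs.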

Tests

pytest tests/e2e/test_speculative_decoding.py::test_dflash_correctness
pytest tests/e2e/test_speculative_decoding.py::test_dflash_performance

Checklist

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: aaronzhfeng <fzx333578@gmail.com>