Releases: vllm-project/speculators
Speculators v0.2.0
Speculators v0.2.0 Release Notes
This release introduces the following new features and enhancements:
- Support for Draft Models with Multiple Decoder Layers: Previously, only draft models with a single decoder layer were supported. The Eagle3 converter now sets the num_hidden_layers from the config instead of always assuming one layer.
- Added Support for eagle_aux_hidden_state_layer_ids Argument: This new argument lets users specify which verifier layers' hidden states are fetched at inference time. It enables converting Llama4 Maverick draft models to the Speculators format and running the converted model in vLLM.
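The idea behind the new argument can be shown with a short sketch. The helper below is hypothetical (not the speculators API); it only illustrates that an Eagle3-style drafter conditions on hidden states taken from a chosen subset of verifier layers:

```python
# Illustrative sketch: gather_aux_hidden_states is a hypothetical helper,
# not part of the speculators API. eagle_aux_hidden_state_layer_ids names
# which verifier layers' hidden states the draft model consumes.

def gather_aux_hidden_states(all_hidden_states, layer_ids):
    """Pick the per-layer hidden states the draft model conditions on."""
    out_of_range = [i for i in layer_ids if not 0 <= i < len(all_hidden_states)]
    if out_of_range:
        raise ValueError(f"layer ids out of range: {out_of_range}")
    return [all_hidden_states[i] for i in layer_ids]

# Stand-in for the per-layer outputs of a 48-layer verifier:
states = [f"h{i}" for i in range(48)]
print(gather_aux_hidden_states(states, [1, 23, 44]))  # → ['h1', 'h23', 'h44']
```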
 
Updates and Deprecations:
- Python 3.9 Support Removed: Support for Python 3.9 has been dropped; Python 3.10+ is required going forward.
- Default Number of Speculative Tokens Changed: The default number of speculative tokens has been changed from 5 to 3 for all Eagle and Eagle3 models.
- Override tie_weights() in Eagle3Speculator: This override prevents vocabulary corruption and supports Transformers 4.54.1.
- Updated head_dim Calculation in Eagle3 Converter: The head_dim value is now taken from the config if provided; otherwise, it is calculated as hidden_size // num_heads.
- Eagle3 Draft Models Retain Original Dtype: Eagle3 draft models now keep their original dtype after conversion to the Speculators format. Previously, all converted draft models were cast to FP32.
- Extended Logic for target_vocab_size: The converter defaults to the "t2d" length; if that is not available, it recursively searches the verifier model's config for vocab_size.
- Full End-to-End vLLM Smoke Testing: Extended and added full end-to-end vLLM smoke tests for both converted and unconverted models.
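The head_dim and target_vocab_size fallback rules above can be sketched in a few lines. This is illustrative only: `config` is a plain dict standing in for a model config, and the function names are hypothetical, not the speculators API:

```python
# Sketch of the two fallback rules described above (illustrative only).

def resolve_head_dim(config):
    # Use head_dim from the config when present; otherwise derive it
    # as hidden_size // num_heads.
    if config.get("head_dim") is not None:
        return config["head_dim"]
    return config["hidden_size"] // config["num_heads"]

def resolve_target_vocab_size(t2d, verifier_config):
    # Prefer the "t2d" length; otherwise recursively search the
    # (possibly nested) verifier config for a vocab_size entry.
    if t2d is not None:
        return len(t2d)

    def search(node):
        if isinstance(node, dict):
            if "vocab_size" in node:
                return node["vocab_size"]
            for value in node.values():
                found = search(value)
                if found is not None:
                    return found
        return None

    return search(verifier_config)
```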
 
Full Change Log
- Update README install commands now that Speculators is live on PyPi by @markurtz in #89
 - override transformer tie_weights to prevent shape mismatch by @shanjiaz in #74
 - [Testing][vLLM] Add vLLM Eagle3 Test Cases by @dsikka in #91
 - Adding .readthedocs.yaml by @aireilly in #92
 - [Tests][Eagle3] Extend vLLM test cases with conversion step by @dsikka in #93
 - Model architectures by @anmarques in #90
 - Fix type annotation override in SpeculatorModel.generate method by @rahul-tuli in #111
 - Update mkdocs by @aireilly in #115
 - Update README.md with badges by @dsikka in #108
 - Update ReadME feature content by @dsikka in #109
 - Fix broken links by @aireilly in #125
 - Update README with new models and their links by @eldarkurtic in #135
 - Fix for Eagle attention arch when head_dim is given in config.json by @eldarkurtic in #134
 - Fix for draft models always being in fp32 datatype by @eldarkurtic in #136
 - Fix install command for dev by @eldarkurtic in #137
 - Fix 'test_download_with_cache_dir' by @dbarbuzzi in #141
 - Update link checker so that it comments on existing issue by @fynnsu in #129
 - Prevent forced casting to fp16 dtype by @eldarkurtic in #145
 - Set default num of spec tokens to 3 by @eldarkurtic in #146
 - Update speculator config & converter to support hidden states indexing by @shanjiaz in #142
 - add num_hidden_layers by @shanjiaz in #147
 - Update CI Testing by @dsikka in #150
 - added loading util for specific layers by @shanjiaz in #144
 - Remove PyPI publishing steps from nightly workflow by @dsikka in #151
 - Refactor e2e tests to support external vLLM by @dbarbuzzi in #153
 - Remove remaining python 3.9 usages by @fynnsu in #152
 - Fix a typo in docs by @eldarkurtic in #107
 - Added loading util tests by @shanjiaz in #155
 - Extend E2E Tests for EAGLE3 Models by @rahul-tuli in #156
 - Remove nightly in favour of testing repo by @dsikka in #159
 - Remove nightly tests badge from README by @fynnsu in #163
 - add back link-checks by @dhuangnm in #162
 - Fix dev link checker workflow to comment directly on PRs by @markurtz in #164
 - Only load Verifier model if attachment_mode is 'full' by @fynnsu in #154
 - Fix EAGLE3 vLLM tests by disabling torch compile cache by @rahul-tuli in #166
 - bump up version for last release by @dhuangnm in #167
 
New Contributors
- @aireilly made their first contribution in #92
 - @anmarques made their first contribution in #90
 - @eldarkurtic made their first contribution in #135
 - @dbarbuzzi made their first contribution in #141
 - @dhuangnm made their first contribution in #162
 
Full Changelog: v0.1.0...v0.2.0
Speculators v0.1.0 -- First Public Release
Overview
This first public release publishes the complete initial codebase for Speculators — a unified library for building, evaluating, converting, and serving speculative decoding algorithms for LLMs. It delivers the core framework, CI/CD and developer workflow, model/config implementations (EAGLE v1/HASS/EAGLE‑3), converter CLIs from external research repos, a Hugging Face–compatible model format with vLLM serving support, and prototype training code.
What’s New (Highlights)
- Unified, extensible framework for speculator models (build, evaluate, convert, store)
 - Hugging Face–compatible speculator format with serving support landed in vLLM
 - Models/configs for EAGLE v1 (HASS-style), HASS, and EAGLE‑3 (multi-layer types)
 - Checkpoint converter CLIs (Eagle, Eagle‑3) from external research repositories
 - Prototype training code and scripts (EAGLE‑1-style drafter, HASS) + requirements
 - Production readiness: CI/CD, tests, style, docs, examples, and benchmarks
 
Use Cases Enabled
- Register and configure new speculator algorithms via a standardized configuration and registry system
 - Convert external checkpoints (EAGLE/EAGLE‑3/HASS variants) into the Speculators format with CLI tools
 - Serve Speculators models directly in vLLM for low‑latency inference
 - Evaluate and benchmark speculators (e.g., with GuideLLM), including quantized verifier swaps
 - Prototype‑train drafters using provided research code and scripts
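As one illustration of the configuration-and-registry idea above, a minimal decorator-based registry might look like the following. All names here are hypothetical and not the actual speculators API; this only sketches the pattern of registering algorithm configs under string keys:

```python
# Hypothetical sketch of a config/registry pattern; not the speculators API.

_REGISTRY: dict[str, type] = {}

def register(name):
    """Class decorator that records a speculator config under a string key."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap

@register("eagle")
class EagleConfig:
    def __init__(self, speculative_tokens=5):
        self.speculative_tokens = speculative_tokens

def build(name, **kwargs):
    """Look up a registered config class by name and instantiate it."""
    return _REGISTRY[name](**kwargs)
```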
 
Getting Started
- Install (Python 3.9–3.13 on Linux or macOS):

  ```bash
  pip install git+https://github.com/neuralmagic/speculators.git
  ```

- Serve with vLLM (requires the v1 API):

  ```bash
  VLLM_USE_V1=1 vllm serve RedHatAI/Qwen3-8B-speculator.eagle3
  ```

- Explore examples and research: `examples/`, `research/eagle3/`, `research/hass/`
Compatibility Notes
- Python: 3.9–3.13
- OS: Linux and macOS
- Transformers pinned to avoid mypy regressions (PR #73)
- vLLM v1 API required for serving (set `VLLM_USE_V1=1`)
Full Changelog (v0.1.0)
First public release of Speculators. This release publishes the complete initial codebase and enables the first set of core use cases for speculative decoding with LLMs.
Added
- Base configuration and registry system with tests: Speculator, Token Proposal, and Model Speculator configs; `EagleSpeculatorConfig` for EAGLE v1/HASS; config serialization/loading (PRs #26, #27, #28, #29, #34, #36)
- Eagle speculator model and support for multiple transformer layer types (PRs #37, #49)
 - Eagle‑3 speculator model and Qwen support (PRs #50, #55)
 - Checkpoint converter CLIs: Eagle and Eagle‑3; standardized converter interface (PRs #39, #53, #72)
 - vLLM serving documentation and Qwen benchmark assets (PRs #77, #78, #82, #83)
 - Examples directory and README for getting started (PR #81)
 - Branding assets (icons, logos, user‑flow diagrams) (PR #87)
 
Changed
- Standardized converter CLI UX and flags (PR #72)
 - Documentation/readme formatting and content updates (PRs #70, #75, #83, #85)
 
Fixed
- Missing embeddings in converted checkpoints/workflows (PR #65)
- CLI flags and `norm_before_residual` toggle (PRs #57, #58)
- Compatibility: pin `transformers` to resolve mypy/typing regressions (PR #73)
CI/CD and Tooling
- GitHub Actions: migrated link checks to lychee and updated workflows (PRs #3, #45)
 - PR comment behavior refinements (PR #47)
 
Research and Training
- Training code for EAGLE‑1‑style drafter with multi‑step training (PR #35)
 - HASS/EAGLE‑3 research updates, requirements, and DeepSpeed dependency (PRs #64, #67, #69)
 
Documentation
- vLLM serving instructions, Qwen benchmark results, examples README, and research readmes (PRs #64, #70, #77, #78, #81, #83, #85)
 
New Contributors
- @fynnsu made their first contribution in PR #47
 - @shanjiaz made their first contribution in PR #53
 - @MeganEFlynn made their first contribution in PR #55
 
Thanks also to continuing contributors: @markurtz, @rahul-tuli, @dsikka
Links
- Compare changes: v0.0.1...v0.1.0