Conversation


@dependabot dependabot bot commented on behalf of github Feb 5, 2026

Bumps vllm from 0.11.2 to 0.15.1.

Release notes

Sourced from vllm's releases.

v0.15.1 is a patch release with security fixes, RTX Blackwell GPU support fixes, and other bug fixes.

Security

  • Patched Protobuf for CVE-2026-0994 (#33619)
  • Patched aiohttp for CVE-2025-69223 (#33621)
Highlights

Bugfix Hardware Support

  • RTX Blackwell (SM120): Fixed NVFP4 MoE kernel support for RTX Blackwell workstation GPUs. Previously, NVFP4 MoE models would fail to load on these GPUs (#33417)
  • FP8 kernel selection: Fixed FP8 CUTLASS group GEMM to properly fall back to Triton kernels on SM120 GPUs (#33285)

Model Support

  • Step-3.5-Flash: New model support (#33523)

Bugfix Model Support

  • Qwen3-VL-Reranker: Fixed model loading (#33298)
  • Whisper: Fixed FlashAttention2 with full CUDA graphs (#33360)

Performance

  • torch.compile cold-start: Fixed regression that increased cold-start compilation time (Llama3-70B: ~88s → ~22s) (#33441)
  • MoE forward pass: Optimized by caching layer name computation (#33184)

Bug Fixes

  • Fixed prefix cache hit rate of 0% with GPT-OSS style hybrid attention models (#33524)
  • Enabled Triton MoE backend for FP8 per-tensor dynamic quantization (#33300)
  • Disabled unsupported Renormalize routing methods for TRTLLM per-tensor FP8 MoE (#33620)
  • Fixed speculative decoding metrics crash when no tokens generated (#33729)
  • Disabled fast MoE cold start optimization with speculative decoding (#33624)
  • Fixed ROCm skinny GEMM dispatch logic (#33366)

Dependencies

  • Pinned LMCache >= v0.3.9 for API compatibility (#33440)

New Contributors 🎉

Full Changelog: vllm-project/vllm@v0.15.0...v0.15.1

v0.15.0

Highlights

This release features 335 commits from 158 contributors (39 new)!

Model Support

  • New architectures: Kimi-K2.5 (#33131), Molmo2 (#30997), Step3vl 10B (#32329), Step1 (#32511), GLM-Lite (#31386), Eagle2.5-8B VLM (#32456).
  • LoRA expansion: Nemotron-H (#30802), InternVL2 (#32397), MiniMax M2 (#32763).
  • Speculative decoding: EAGLE3 for Pixtral/LlavaForConditionalGeneration (#32542), Qwen3 VL MoE (#32048), draft model support (#24322).
  • Embeddings: BGE-M3 sparse embeddings and ColBERT embeddings (#14526).

... (truncated)

Commits
  • 1892993 [BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729)
  • 7d98f09 cherry pick
  • daa2784 [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-...
  • e4bf6ed [torch.compile] Don't do the fast moe cold start optimization if there is spe...
  • 611b187 [torch.compile] Speed up MOE handling in forward_context (#33184)
  • eec3546 [Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189)
  • 7c023ba Patch Protobuf for CVE-2026-0994 (#33619)
  • 099a787 Patch aiohttp for CVE-2025-69223 (#33621)
  • 31a64c6 [Release] Fix format and cherry-pick (#33618)
  • 57eae2f [Release] patch step3p5 attention class in v0.15.1 release (#33602)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
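
The ignore commands above can also be expressed declaratively in the repository's Dependabot configuration. A hypothetical `.github/dependabot.yml` fragment (illustrative only; assumes a pip ecosystem at the repository root) that mirrors `@dependabot ignore this minor version` for `vllm`:

```yaml
# Illustrative config; mirrors "@dependabot ignore this minor version" for vllm.
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      - dependency-name: "vllm"
        update-types: ["version-update:semver-minor"]
```

Unlike the PR comment commands, this config applies to all future update runs, not just this PR.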

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.11.2 to 0.15.1.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Changelog](https://github.com/vllm-project/vllm/blob/main/RELEASE.md)
- [Commits](vllm-project/vllm@v0.11.2...v0.15.1)

---
updated-dependencies:
- dependency-name: vllm
  dependency-version: 0.15.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
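
The `update-type: version-update:semver-minor` classification above follows from comparing the two version strings component by component: 0.11.2 → 0.15.1 leaves the major component unchanged and raises the minor one. A minimal sketch (plain tuple comparison, not a full PEP 440 parser):

```python
# Minimal sketch: classify the bump 0.11.2 -> 0.15.1 by comparing
# dotted version components numerically (not a full PEP 440 parser).
def parse(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))

old, new = parse("0.11.2"), parse("0.15.1")
if new[0] > old[0]:
    update_type = "semver-major"
elif new[1] > old[1]:
    update_type = "semver-minor"
else:
    update_type = "semver-patch"

print(update_type)  # semver-minor
```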
@dependabot dependabot bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code labels Feb 5, 2026