
v0.12.0rc1

Pre-release

Released by @david6666666 on 05 Jan 11:17 · e7eeb54

vLLM-Omni v0.12.0rc1 Pre-Release Notes

Highlights

This release features 187 commits from 45 contributors (34 new contributors)!

vLLM-Omni v0.12.0rc1 is a major RC milestone focused on maturing the diffusion stack, strengthening OpenAI-compatible serving, expanding omni-model coverage, and improving stability across platforms (GPU/NPU/ROCm). It also rebases on vLLM v0.12.0 for better alignment with upstream (#335).

Breaking / Notable Changes

  • Unified diffusion stage naming & structure: cleaned up legacy Diffusion* paths and aligned on Generation*-style stages to reduce duplication (#211, #163).
  • Safer serialization: switched OmniSerializer from pickle to MsgPack (#310). Unlike pickle, MsgPack cannot execute embedded code during deserialization; see the sketch after this list.
  • Dependency & packaging updates: e.g., bumped diffusers to 0.36.0 (#313) and refreshed Python/formatting baselines for the v0.12 release (#126).
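
As a minimal illustration of why this matters: MsgPack round-trips plain data without ever executing code on load, whereas pickle.loads can run arbitrary callables embedded in the byte stream. The snippet below is generic msgpack usage, not the actual OmniSerializer implementation from #310.

```python
import msgpack  # pip install msgpack

payload = {"request_id": "req-42", "prompt": "a red fox", "steps": 30}

# packb/unpackb move plain data structures across processes without
# executing any code on deserialization, unlike pickle.loads.
packed = msgpack.packb(payload)
restored = msgpack.unpackb(packed)
assert restored == payload
```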

Diffusion Engine: Architecture + Performance Upgrades

  • Core refactors for extensibility: diffusion model registry refactored to reuse vLLM’s ModelRegistry (#200), improved diffusion weight loading and stage abstraction (#157, #391).

  • Acceleration & parallelism features:

    • Cache-DiT with a unified cache backend interface (#250)
    • TeaCache integration and registry refactors (#179, #304, #416)
    • New/extended attention & parallelism options: Sage Attention (#243), Ulysses Sequence Parallelism (#189), Ring Attention (#273)
    • torch.compile optimizations for DiT and RoPE kernels (#317); see the sketch below
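
As referenced above, a hedged sketch of the torch.compile pattern: the toy block below is a stand-in for a DiT layer, purely for illustration, and is not the actual kernel work from #317.

```python
import torch
import torch.nn as nn

class ToyDiTBlock(nn.Module):
    # Stand-in for a DiT transformer block (illustration only).
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))

# torch.compile traces the forward pass and fuses elementwise ops,
# cutting per-step Python overhead inside the denoising loop.
block = torch.compile(ToyDiTBlock())
out = block(torch.randn(2, 128, 256))
```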

Serving: Stronger OpenAI Compatibility & Online Readiness

  • DALL·E-compatible image generation endpoint (/v1/images/generations) (#292), plus online serving fixes for image generation (#499); see the usage sketch after this list.
  • Added OpenAI create speech endpoint (#305).
  • Per-request modality control (output modality selection) (#298) with API usage examples (#411).
  • Early support for streaming output (#367), request abort (#486), and request-id propagation in responses (#301).
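
A hedged usage sketch against a local server, using the official openai Python SDK. The base_url, API key, model ids, and voice name below are assumptions for illustration; only the endpoints themselves come from this release (#292, #305).

```python
from openai import OpenAI

# Assumed local server address; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# DALL·E-compatible image generation (#292).
image = client.images.generate(
    model="Qwen/Qwen-Image",  # assumed model id
    prompt="a watercolor lighthouse at dusk",
    n=1,
    size="1024x1024",
)
print(image.data[0].url or "received base64-encoded image")

# OpenAI create-speech endpoint (#305).
speech = client.audio.speech.create(
    model="Qwen/Qwen3-Omni",  # assumed model id
    voice="alloy",            # assumed voice name
    input="Hello from vLLM-Omni.",
)
with open("speech.mp3", "wb") as f:
    f.write(speech.read())
```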

Omni Pipeline: Multi-stage Orchestration & Observability

  • Improved inter-stage plumbing: customizable processing between stages and reduced coupling on request_ids in model forward paths (#458).
  • Better observability and debugging: torch profiler across omni stages (#553; see the profiling sketch below), improved traceback reporting from background workers (#385), and logging refactors (#466).
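
As a generic illustration of the kind of per-stage tracing #553 enables (the actual omni-stage integration differs), standard torch.profiler usage looks like this:

```python
import torch
from torch.profiler import ProfilerActivity, profile

def run_stage(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for a single omni stage's forward pass.
    return torch.nn.functional.gelu(x @ x.T)

with profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on GPU
    record_shapes=True,
) as prof:
    run_stage(torch.randn(512, 512))

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```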

Expanded Model Support (Selected)

  • Qwen-Omni / Qwen-Image family:

    • Qwen-Omni offline inference with local files (#167)
    • Qwen-Image-2512 support (#547)
    • Qwen-Image-Edit support, including multi-image input variants and newer releases (Qwen-Image-Edit, Qwen-Image-Edit-2509, Qwen-Image-Edit-2511) (#196, #330, #321)
    • Qwen-Image-Layered model support (#381)
    • Multiple fixes for Qwen2.5/Qwen3-Omni batching, examples, and OpenAI sampling parameter compatibility (#451, #450, #249)
  • Diffusion / video ecosystem:

    • Z-Image support and kernel fusions (#149, #226)
    • Stable Diffusion 3 support (#439)
    • Wan2.2 T2V plus I2V/TI2V pipelines (#202, #329)
    • LongCat-Image and LongCat-Image-Edit support (#291, #392)
    • Ovis Image model addition (#263)
    • Bagel (diffusion-only) and image-edit support (#319, #588)

Platform & CI Coverage

  • ROCm / AMD: documented ROCm setup (#144) and added ROCm Dockerfile + AMD CI (#280).
  • NPU: added NPU CI workflow (#231) and expanded NPU support for key Omni models (e.g., Qwen3-Omni, Qwen-Image series) (#484, #463, #485), with ongoing cleanup of NPU-specific paths (#597).
  • CI and packaging improvements: diffusion CI, wheel compilation, and broader UT/E2E coverage (#174, #288, #216, #168).

What's Changed

New Contributors

Full Changelog: v0.11.0rc1...v0.12.0rc1