v0.12.0rc1
Pre-release
vLLM-Omni v0.12.0rc1 Pre-Release Notes

Highlights
This release features 187 commits from 45 contributors (34 new contributors)!
vLLM-Omni v0.12.0rc1 is a major RC milestone focused on maturing the diffusion stack, strengthening OpenAI-compatible serving, expanding omni-model coverage, and improving stability across platforms (GPU/NPU/ROCm). It also rebases on vLLM v0.12.0 for better alignment with upstream (#335).
Breaking / Notable Changes
- Unified diffusion stage naming & structure: cleaned up legacy `Diffusion*` paths and aligned on `Generation*`-style stages to reduce duplication (#211, #163).
- Safer serialization: switched `OmniSerializer` from `pickle` to MsgPack (#310).
- Dependency & packaging updates: e.g., bumped `diffusers` to 0.36.0 (#313) and refreshed Python/formatting baselines for the v0.12 release (#126).
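The move away from `pickle` (#310) closes a classic deserialization hole: unpickling attacker-controlled bytes can run arbitrary code, whereas MsgPack only reconstructs plain data. A minimal sketch of the difference (the `Evil` class and payload are illustrative only, not vLLM-Omni code):

```python
import pickle

# pickle will serialize objects that execute code when loaded:
class Evil:
    def __reduce__(self):
        # Any callable + args can be smuggled into the byte stream;
        # print() stands in for something far worse (os.system, etc.)
        return (print, ("arbitrary code ran during unpickling",))

payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # the smuggled print() fires here

# MsgPack (like JSON) round-trips only plain data -- dicts, lists,
# strings, numbers, bytes -- so a hostile payload cannot inject callables:
#   import msgpack
#   msgpack.unpackb(msgpack.packb({"stage": "diffusion", "steps": 30}))
```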
Diffusion Engine: Architecture + Performance Upgrades
- Core refactors for extensibility: diffusion model registry refactored to reuse vLLM's `ModelRegistry` (#200), improved diffusion weight loading and stage abstraction (#157, #391).
- Acceleration & parallelism features:
  - Cache-DiT with a unified cache backend interface (#250)
  - TeaCache integration and registry refactors (#179, #304, #416)
  - New/extended attention & parallelism options: Sage Attention (#243), Ulysses Sequence Parallelism (#189), Ring Attention (#273)
  - torch.compile optimizations for DiT and RoPE kernels (#317)
Serving: Stronger OpenAI Compatibility & Online Readiness
- DALL·E-compatible image generation endpoint (`/v1/images/generations`) (#292), plus online serving fixes for image generation (#499).
- Added OpenAI create speech endpoint (#305).
- Per-request modality control (output modality selection) (#298) with API usage examples (#411).
- Early support for streaming output (#367), request abort (#486), and request-id propagation in responses (#301).
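As a rough illustration, the new image endpoint can be exercised like the OpenAI Images API. The endpoint path comes from these notes; the model name and request fields below follow the OpenAI schema and are assumptions, not confirmed vLLM-Omni parameters:

```python
import json
import urllib.request

# Request body mirrors the OpenAI Images API; exact field support in
# vLLM-Omni may differ (treat these names as assumptions).
payload = {
    "model": "Qwen/Qwen-Image",  # hypothetical served model name
    "prompt": "a watercolor fox in a snowy forest",
    "n": 1,
    "size": "1024x1024",
}
req = urllib.request.Request(
    "http://localhost:8000/v1/images/generations",  # endpoint from #292
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment against a running vllm-omni API server:
# with urllib.request.urlopen(req) as resp:
#     images = json.load(resp)["data"]  # per the OpenAI response schema
```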
Omni Pipeline: Multi-stage Orchestration & Observability
- Improved inter-stage plumbing: customizable processing between stages and reduced coupling on `request_ids` in model forward paths (#458).
- Better observability and debugging: torch profiler across omni stages (#553), improved traceback reporting from background workers (#385), and logging refactors (#466).
Expanded Model Support (Selected)
- Qwen-Omni / Qwen-Image family:
  - Qwen-Omni offline inference with local files (#167)
  - Qwen-Image-2512 support (#547)
  - Qwen-Image-Edit support, including multi-image input variants and newer releases (Qwen-Image-Edit, Qwen-Image-Edit-2509, Qwen-Image-Edit-2511) (#196, #330, #321)
  - Qwen-Image-Layered model support (#381)
  - Multiple fixes for Qwen2.5/Qwen3-Omni batching, examples, and OpenAI sampling parameter compatibility (#451, #450, #249)
- Diffusion / video ecosystem:
  - Wan2.2 text-to-video, I2V, and TI2V pipelines (#202, #329)
  - Z-Image (#149), Stable Diffusion 3 (#439), Ovis (#263)
  - LongCat-Image and LongCat-Image-Edit (#291, #392)
  - Bagel (diffusion-only) support and image editing (#319, #588)
Platform & CI Coverage
- ROCm / AMD: documented ROCm setup (#144) and added ROCm Dockerfile + AMD CI (#280).
- NPU: added NPU CI workflow (#231) and expanded NPU support for key Omni models (e.g., Qwen3-Omni, Qwen-Image series) (#484, #463, #485), with ongoing cleanup of NPU-specific paths (#597).
- CI and packaging improvements: diffusion CI, wheel compilation, and broader UT/E2E coverage (#174, #288, #216, #168).
What's Changed
- [Misc] Update link in issue template by @ywang96 in #155
- [Misc] Qwen-Omni support offline inference with local files by @SamitHuang in #167
- [diffusion] z-image support by @ZJY0516 in #149
- [Doc] Fix wrong examples URLs by @wjcwjc77 in #166
- [Doc] Update Security Advisory link by @DarkLight1337 in #173
- [Doc] change `vllm_omni` to `vllm-omni` by @princepride in #177
- [Docs] Supplement volunteers and faq docs by @Gaohan123 in #182
- [Bugfix] Init early torch cuda by @knlnguyen1802 in #185
- [Docs] remove Ascend word to make docs general by @gcanlin in #190
- [Doc] Add installation part for pre built docker. by @congw729 in #141
- [CI] add diffusion ci by @ZJY0516 in #174
- [Misc] Add stage config for Qwen3-Omni-30B-A3B-Thinking by @linyueqian in #172
- [Doc]Fixed some spelling errors by @princepride in #199
- [Chore]: Refactor diffusion model registry to reuse vLLM's ModelRegistry by @Isotr0py in #200
- [FixBug]online serving fails for high-resolution videos by @princepride in #198
- [Engine] Remove Diffusion_XX which duplicates with Generation_XX by @tzhouam in #163
- [bugfix] qwen2.5 omni does not support chunked prefill now by @fake0fan in #193
- [NPU][Refactor] Rename Diffusion* to Generation* by @gcanlin in #211
- [Diffusion] Init Attention Backends and Selector for Diffusion by @ZJY0516 in #115
- [E2E] Add Qwen2.5-Omni model test with OmniRunner by @gcanlin in #168
- [Docs]Fix doc wrong link by @princepride in #223
- [Diffusion] Refactor diffusion models weights loading by @Isotr0py in #157
- Fix: Safe handling for multimodal_config to avoid 'NoneType' object h… by @qibaoyuan in #227
- [Bugfix] Fix ci bug for qwen2.5-omni by @Gaohan123 in #230
- [Core] add clean up method for diffusion engine by @ZJY0516 in #219
- [BugFix] Fix qwen3omni thinker batching. by @yinpeiqi in #207
- [Bugfix] Support passing vllm cli args to online serving in vLLM-Omni by @Gaohan123 in #206
- [Docs] Add basic usage examples for diffusion by @SamitHuang in #222
- [Model] Add Qwen-Image-Edit by @SamitHuang in #196
- update docs/readme.md and design folder by @hsliuustc0106 in #234
- [CI] Add Qwen3-omni offline UT by @R2-Y in #216
- [typo] fix doc readme by @hsliuustc0106 in #242
- [Model] Fuse Z-Image's `qkv_proj` and `gate_up_proj` by @Isotr0py in #226
- [bugfix] Fix QwenImageEditPipeline transformer init by @dougbtv in #245
- [Bugfix] Qwen2.5-omni Qwen3-omni online gradio.py example fix by @david6666666 in #249
- [Bugfix] fix issue251, qwen3 omni does not support chunked prefill now by @david6666666 in #256
- [Bugfix]multi-GPU tp scenarios, devices: "0,1" uses physical IDs instead of logical IDs by @david6666666 in #253
- [Bugfix] Remove debug code in AsyncOmni.del to fix resource leak by @princepride in #260
- update arch overview by @hsliuustc0106 in #258
- [Feature] Omni Connector + ray supported by @natureofnature in #215
- [Misc] fix stage config describe and yaml format by @david6666666 in #265
- update design docs by @hsliuustc0106 in #269
- [Model] Add Wan2.2 text-to-video support by @linyueqian in #202
- [Doc] [ROCm]: Document the steps to run vLLM Omni on ROCm by @tjtanaa in #144
- [Entrypoints] Minor optimization in the orchestrator's final stage determination logic by @RuixiangMa in #275
- [Doc] update offline inference doc and offline_inference examples by @david6666666 in #274
- [Feature] teacache integration by @LawJarp-A in #179
- [CI] Qwen3-Omni online test by @R2-Y in #257
- [Doc] fix docs Feature Design and Module Design by @hsliuustc0106 in #283
- [CI] Test ready label by @ywang96 in #299
- [Doc] fix offline inference and online serving describe by @david6666666 in #285
- [CI] Adjust folder by @congw729 in #300
- [Diffusion][Attention] sage attention backend by @ZJY0516 in #243
- [BugFix]Remove duplicate code by @princepride in #309
- [Misc] move omni_diffusion.py to vllm_omni.entrypoints by @ZJY0516 in #312
- [CI] Test file requirements (dir structure & coding style). by @congw729 in #270
- [Diffusion] Add cache-dit and unify diffusion cache backend interface by @SamitHuang in #250
- [BugFix] Failed to print DEBUG level log by @Bounty-hunter in #307
- [Misc] Bump `diffusers` to 0.36.0 by @ZJY0516 in #313
- [Misc] Fix docs and update image edit example with cache-dit by @SamitHuang in #318
- [Misc] fix yaml config by @RuixiangMa in #320
- [Chore] Bump Python version and apply formatting changes for v0.12 release by @DarkLight1337 in #126
- [Diffusion] fix warning in diffusion stage by @RuixiangMa in #332
- [Model] Support Qwen-Image-Edit 2511 by @SamitHuang in #321
- [Feature] Send response with request id by @Bounty-hunter in #301
- [Bugfix] [Example] Add fallback to `worker_backend` and `ray_address` in `AsyncOmni` to fix gradio demos by @tjtanaa in #337
- [Diffusion]: Diffusion Ulysses-Sequence-Parallelism support by @wtomin in #189
- [NPU][CI] Add the CI workflow on NPU by @gcanlin in #231
- [Feature] Support output modalities control per request by @Gaohan123 in #298
- [Entrypoints] Support Online Serving for Diffusion-only Models by @fake0fan in #259
- [Misc] Fix left comments for PR298 by @Gaohan123 in #346
- [Benchmark] Benchmark Running Samples for Qwen3 Omni and Dataset Preparation by @tzhouam in #212
- [Chore]: Clean up `pyproject.toml` and cache-dit backend by @Isotr0py in #351
- [Feature] Enable TeaCache in QwenImageEditPipeline by @yuanheng-zhao in #304
- [CI]Wheel package compilation by @princepride in #288
- [Model] Support Qwen-Image-Edit 2509 ( multi-image input edit) by @SamitHuang in #330
- [Model] Add LongCat-Image support by @e1ijah1 in #291
- [Doc] Rewrite checklist & update docs. by @congw729 in #360
- [Perf] torch compile for dit and rope kernel by @ZJY0516 in #317
- [Entrypoints] online serving support usp args by @david6666666 in #366
- [Rebase] Rebase to vllm 0.12.0 by @tzhouam in #335
- [Entrypoints] Support Qwen-Image-Edit 2509 online serving & update doc by @david6666666 in #368
- [Misc] Update qwen-omni gradio demo to use api server by @SamitHuang in #378
- [Bugfix] fix GPU VRAM calculation problem by @R2-Y in #328
- [Revert] revert diffusion warmup by @ZJY0516 in #382
- [Model] Ovis Image Model Addition by @divyanshsinghvi in #263
- [Bugfix] Fix image size logging bug for multiple image input by @SamitHuang in #384
- [New model] Support model qwen image layered by @Bounty-hunter in #381
- [ROCm] [CI] Add ROCm Dockerfile and AMD CI (vllm v0.12.0) by @tjtanaa in #280
- [Bugfix] Enable teacache in QwenImageEditPlusPipeline by @yuanheng-zhao in #379
- [Doc] fix wrong file name in Qwen-Image-Layered demo script by @tjtanaa in #396
- [refactor] teacache refactor by @ZJY0516 in #395
- [Security][Feature] Use Msgpack in OmniSerializer instead of pickle by @gcanlin in #310
- [Feature]: Adding traceback for error in background worker by @divyanshsinghvi in #385
- Bump version to 0.12.0rc1 by @congw729 in #399
- bug resolution for omni models registration not present in vllm by @divyanshsinghvi in #397
- [Diffusion][Bug-Fix]: Fix Ulysses SP Accuracy Error by @wtomin in #377
- [CI] Skip build wheel CI test for specific type files. by @congw729 in #401
- [Doc] Refine the document. by @congw729 in #403
- [Model] Add LongCat-Image-Edit support by @e1ijah1 in #392
- [BugFix] raise error when request.stream=True. by @yinpeiqi in #350
- DALL-E compatible image generation endpoint by @dougbtv in #292
- [Doc] update diffusion_acceleration.md & FAQ memory troubleshooting by @david6666666 in #406
- [Doc] fix diagrams in Architecture Overview page by @fhfuih in #430
- [RFC] Clean and refactor TeaCache registry by @yuanheng-zhao in #416
- [Docs] Pin the installation on NPU as v0.11.0rc1 by @gcanlin in #434
- [Misc] Add Qwen-Image-Edit-2511 running script by @SamitHuang in #440
- [Docs] how to add a new multi stage model document by @R2-Y in #417
- [Docs] fix doc image loading error by @R2-Y in #446
- [Docs] Update installation method. by @congw729 in #448
- [Docs] Add API usage examples for modality control in online serving by @wonjerry in #411
- [BugFix] Qwen3-Omni / Qwen2.5-Omni thinker batching with audio input by @yinpeiqi in #451
- [Feature] Support standard OpenAI API sampling parameters for thinker stage. (max_tokens...) by @wonjerry in #450
- [Core] customize process between stages, remove the need of "request_ids" in model forward, rebase to 0.12.0 by @yinpeiqi in #458
- [BugFix] Use weakref.finalize to cleanup DiffusionEngine by @iwzbi in #461
- [Model] Add Wan2.2 I2V and TI2V pipeline support by @linyueqian in #329
- [Feature] Support Qwen Omni online batch inference by @ZeldaHuang in #438
- RPC support for OmniDiffusion by @knlnguyen1802 in #371
- [Clean] Remove the redundant decoding payloads logic by @gcanlin in #404
- [Benchmark] DiT models Performance benchmark (T2I/I2I/T2V/TI2V) by @david6666666 in #362
- [Perf] Optimise tensor concat perf in output processor by @wuhang2014 in #467
- [Feature] logger refactor by @Bounty-hunter in #466
- [Refactor][Diffusion] Make NPUWorker inherit GPUWorker by @gcanlin in #483
- [BugFix] initial sampling_params_list when offline inference by @LJH-LBJ in #468
- [NPU][Upgrade] Upgrade to v0.12.0 by @gcanlin in #447
- upload collect_env.py by @tukwila in #429
- [Misc] Fix type interface mismatch by @divyanshsinghvi in #478
- [NPU][Model] Support Qwen3-Omni on NPU by @gcanlin in #484
- fix ci by @ZJY0516 in #489
- [Core] Supports stage abstraction in the diffusion model by @fake0fan in #391
- [BugFix] Make max_batch_size work in Omni online serving by @ZeldaHuang in #487
- [NPU][Model] Use cos and sin cache for Z-Image to support NPU by @gcanlin in #485
- fix amd ci by @ZJY0516 in #490
- [Model] Support Qwen-Image series models on NPU by @muziyuhui666 in #463
- add openai create speech endpoint by @Bhanu068 in #305
- [Model] Support stable diffusion3 by @iwzbi in #439
- model forward call logic fixes by @divyanshsinghvi in #495
- [Bugfix][NPU] Add _model_forward for ModelRunner by @gcanlin in #505
- [Core] remove deprecated code from PR391 by @yinpeiqi in #502
- [Docs:] Update Environment requirements for developer guide by @lishunyang12 in #522
- [Refactor] let the AsyncOmni class inherit from Omni class by @yinpeiqi in #511
- [Doc] Adding diffusion model by @Bounty-hunter in #524
- [Debug] Debug qwen3 mix modality output empty string by @tzhouam in #431
- [Bugfix] t2i online fix cache configuration failure and errors in `/v1/images/generations` endpoint by @david6666666 in #499
- Support Qwen3 Omni thinker cuda graph by @ZeldaHuang in #523
- update docs by @hsliuustc0106 in #536
- [Feature] Control the stage init timeout threshold by --stage-init-timeout by @tzhouam in #393
- [Bugfix] fix process close problem for omni class by @Gaohan123 in #554
- [BugFix] del exit don't join the spawned process by @yinpeiqi in #555
- [Feature] Add tqdm in Omni pipeline to align with Vllm by @lishunyang12 in #552
- [Doc] fix quickstart by @david6666666 in #548
- update qwen-omni docs by @hsliuustc0106 in #559
- [docs] update adding models docs by @hsliuustc0106 in #563
- [New Model]Bagel model(Diffusion Only) by @princepride in #319
- [Misc] Add Qwen-Image-2512 by @david6666666 in #547
- [Doc] clarify that batch image generation is currently unsupported by @fhfuih in #564
- [Feature] Support Omni serving abort request by @ZeldaHuang in #486
- [NPU] Support mixed modalities for Qwen3-Omni by @gcanlin in #537
- [BugFix] Fix video frame extraction in text-to-video and image-to-video examples by @faaany in #565
- [Doc] complete the clarification on batch image generation by @fhfuih in #567
- [Feature]Basic version of supporting streaming output by @Gaohan123 in #367
- [Diffusion]: Diffusion Ring Attention support by @mxuax in #273
- [Metric] Fix the computation of e2e_total_tokens by @gcanlin in #519
- [Bugfix]Fix Qwen 2.5 omni hardcoded max_mel_frames to resolve shape mismatch by @sniper35 in #543
- [Core] omni refactor: optimize stage initialization polling and troubleshooting logs by @hsliuustc0106 in #575
- [Misc] Fix Qwen-Omni multiple prompt running scripts by @SamitHuang in #579
- [Model] Replace diffusers apply_rotary_emb with omni RotaryEmbedding by @iwzbi in #496
- [Feat] Enable cache-dit for stable diffusion3.5 by @iwzbi in #584
- [BugFix] Fix repeated warnings from get_current_omni_diffusion_config outside init phase by @MineQihang in #593
- [Bugfix] Fix on_finalize_request method arguments number mismatch by @iwzbi in #583
- [Bugfix] Removed the NPU-specific code path in _run_local_attention by @gcanlin in #597
- [Docs] Fix diffusion acceleration docs by @SamitHuang in #591
- [Doc] Updated user guide for omni examples to avoid any inconvenience by @lishunyang12 in #581
- [Misc] fix ring att log by @david6666666 in #610
- [Doc] add image-to-video example readme by @faaany in #568
- [Doc] fix i2i online serving doc by @david6666666 in #608
- [Bugfix] Fix the collect_env.py link by @Jeffwan in #604
- [Bugfix] Support both list and generator mode & fix unnecessary error log by @Gaohan123 in #599
- [Doc] fix quickstart by @david6666666 in #620
- [Docs] Add docs for AutoRegressive module design by @Gaohan123 in #589
- [Bagel]Add image edit by @princepride in #588
- Dev/add i2i bash by @tzhouam in #623
- [Docs] Add diffusion module design doc by @SamitHuang in #592
- [BugFix] AttributeError: 'AsyncOmniDiffusion' object has no attribute 'abort' by @ZeldaHuang in #624
- bugfix: fix longcat-image cache dispatch by @DefTruth in #638
- [Feature] Support torch profiler across omni stages by @gcanlin in #553
- [BugFix]Fix text_to_image bagel command error by @princepride in #644
New Contributors
- @wjcwjc77 made their first contribution in #166
- @princepride made their first contribution in #177
- @knlnguyen1802 made their first contribution in #185
- @linyueqian made their first contribution in #172
- @Isotr0py made their first contribution in #200
- @fake0fan made their first contribution in #193
- @yinpeiqi made their first contribution in #207
- @dougbtv made their first contribution in #245
- @david6666666 made their first contribution in #249
- @natureofnature made their first contribution in #215
- @tjtanaa made their first contribution in #144
- @RuixiangMa made their first contribution in #275
- @LawJarp-A made their first contribution in #179
- @Bounty-hunter made their first contribution in #307
- @wtomin made their first contribution in #189
- @yuanheng-zhao made their first contribution in #304
- @e1ijah1 made their first contribution in #291
- @divyanshsinghvi made their first contribution in #263
- @fhfuih made their first contribution in #430
- @wonjerry made their first contribution in #411
- @iwzbi made their first contribution in #461
- @ZeldaHuang made their first contribution in #438
- @wuhang2014 made their first contribution in #467
- @LJH-LBJ made their first contribution in #468
- @tukwila made their first contribution in #429
- @muziyuhui666 made their first contribution in #463
- @Bhanu068 made their first contribution in #305
- @lishunyang12 made their first contribution in #522
- @faaany made their first contribution in #565
- @mxuax made their first contribution in #273
- @sniper35 made their first contribution in #543
- @MineQihang made their first contribution in #593
- @Jeffwan made their first contribution in #604
- @DefTruth made their first contribution in #638
Full Changelog: v0.11.0rc1...v0.12.0rc1