Motivation.
This RFC tracks the status of existing diffusion-model support features, as well as other features planned for the future (under discussion).
Some items overlap with the tasks in [RFC]: vLLM-Omni 2026 Q1 Roadmap, which are more urgent.
An ongoing diffusion-model acceleration support plan is tracked in #1217. Help wanted! 🙋
Proposed Change.
P0: 🙋
- Single-card acceleration feature
- graph compilation
- torch.compile on repeated_blocks : [core] add torch compile for diffusion #684
- advanced attention
- sage attention: [Diffusion][Attention] sage attention backend #243
- Flash Attention 2 and 3: [diffusion] use fa3 by default when device supports it #783 ; [Feature] Flash Attention to Support Attention Mask #760
- quantization:
- diffusion distillation
- cache acceleration
- Cache-DiT: [Diffusion] Add cache-dit and unify diffusion cache backend interface #250 , [Feat] Enable cache-dit for stable diffusion3.5 #584
- TeaCache Refactor: [RFC]: TeaCache Refactoring and Bagel Support #833 [Diffusion][Acceleration] Support TeaCache for Z-Image #817 [Bagel] Support TeaCache #848 [TeaCache]: Add Coefficient Estimation #940
- Support CPU Offloading
- Standard module-wise offload (text encoder / DiT / VAE) [feature] cpu offloading support for diffusion #497
- Layerwise offload (LayerwiseOffloadManager) @yuanheng-zhao [RFC]: Layerwise CPU Offloading Support #754 [Feature] Support DiT Layerwise (Blockwise) CPU Offloading #858
- Scheduling & Serving
- Implement a Batch Scheduler. [Frontend][Model] Support batch request with refined OmniDiffusionReq… #797 [RFC]: Support batch request of diffusion models #427 [RFC]: Data class design & refactor—batched multimodal diffusion request #701 @fhfuih @asukaqaq-s
- ❗important❗Support ComfyUI web serving integration. [ComfyUI]: ComfyUI integration #1113 [RFC]: ComfyUI Integration Design #900 @fhfuih
- Static and Dynamic LoRA support. Add diffusion LoRA request path and worker cache #657 [Feature] Diffusion LoRA Adapter Support (PEFT compatible) for vLLM alignment #758
- avoid cpu sync
- [Perf] avoid cpu op in QwenImageCrossAttention #942
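The layerwise (blockwise) CPU offload item above (#754, #858) boils down to a sliding window over DiT blocks: only a few blocks are resident on the device at a time, and block i+1 is prefetched while block i runs. A minimal pure-Python sketch of that residency policy (the class name, window policy, and toy compute are illustrative assumptions, not the actual implementation, which overlaps H2D copies with compute on a side stream):

```python
class Layer:
    """Toy stand-in for a DiT block; tracks where its weights live."""
    def __init__(self, idx):
        self.idx = idx
        self.on_device = False

    def forward(self, x):
        assert self.on_device, f"layer {self.idx} must be resident before compute"
        return x + self.idx  # placeholder compute


class LayerwiseOffloadManager:
    """Keep at most `window` layers resident; prefetch the next one eagerly.

    Hypothetical sketch of the #754/#858 idea: model weight residency only,
    no real CUDA streams or async copies.
    """
    def __init__(self, layers, window=2):
        self.layers = layers
        self.window = window
        self.resident = []  # FIFO of on-device layers

    def _fetch(self, layer):
        if layer.on_device:
            return
        while len(self.resident) >= self.window:
            evicted = self.resident.pop(0)
            evicted.on_device = False  # "weights go back to CPU"
        layer.on_device = True  # "weights copied to GPU"
        self.resident.append(layer)

    def run(self, x):
        for i, layer in enumerate(self.layers):
            self._fetch(layer)
            if i + 1 < len(self.layers):
                self._fetch(self.layers[i + 1])  # prefetch the next block
            x = layer.forward(x)
        return x


layers = [Layer(i) for i in range(6)]
mgr = LayerwiseOffloadManager(layers, window=2)
out = mgr.run(0)
print(out)                               # 0+1+2+3+4+5 = 15
print(sum(l.on_device for l in layers))  # only `window` layers stay resident
```

Peak device memory is bounded by `window` blocks instead of the full stack, at the cost of the copy traffic the real design hides behind compute.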
- Multi-card acceleration feature
- CFG parallel:
- QwenImage: [Diffusion][Feature] CFG parallel support for Qwen-Image #444
- CFG Parallel refactor: allow minimally intrusive edits to the existing single-card pipeline: [RFC]: CFG Parallelism Abstraction #850, [Perf]: CFG parallel abstraction #851
- Sequence Parallel:
- QwenImage series models support Ulysses SP and Ring Attention: [Diffusion]: Diffusion Ulysses-Sequence-Parallelism support #189; [Diffusion]: Diffusion Ring Attention support #273;
- LongCatImage model supports Ulysses SP and Ring Attention: [Diffusion][Feature] Implement SP support in LongCatImageTransformer #721;
- stable diffusion3.5 supports Ulysses SP and Ring Attention: [Feat] Enable sequence parallel for stable diffusion3.5 #654
- [Diffusion] Non-Intrusive Sequence Parallelism (SP) Model Support Abstraction for vLLM-Omni Framework #779
- Patch VAE Parallel:
- Patch VAE tiling in distributed ranks (lossy) : [Feat]: support VAE patch parallelism #756
- ❗important❗A unified interface to support Patch VAE Parallelism methods for multiple models
- Tensor Parallel:
- Z-Image [Feat] Enable DiT tensor parallel for Diffusion Pipeline(Z-Image) #735
- Qwen-Image [diffusion] add tp support for qwen-image and refactor some tests #830
- HunyuanImage 3.0: [New model] Support HY-Image3.0 DiT #794
- LongCat-Image [Feature] add Tensor Parallelism to LongCat-Image(-Edit) #926
- Ovis-Image
- Stable-Diffusion-3 @ZANMANGLOOPYE
- Expert Parallelism (EP)
- @Semmer2 (waiting for the Hunyuan Image model to be merged first)
- Compile and Parallel [Bug]: The speed of compile mode does not perform well during parallel inference #819
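The CFG parallel items above exploit the fact that the conditional and unconditional denoising passes are independent: two ranks can run them concurrently, and only the final guidance combine needs the peer's prediction. A toy single-process sketch of the split and combine (the rank assignment and the toy `denoise` are illustrative assumptions; a real implementation exchanges predictions with a collective):

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: eps = uncond + g * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


def denoise(latent, conditioned):
    """Toy stand-in for one DiT forward pass."""
    return [(x * 2 if conditioned else x) for x in latent]


latent = [0.5, -1.0, 2.0]

# Single-GPU baseline: one (batched) forward covers both branches.
baseline = cfg_combine(denoise(latent, False), denoise(latent, True), 4.0)

# CFG parallel: rank 0 runs the unconditional branch, rank 1 the conditional
# branch; the exchange of the two predictions (an all-gather in practice)
# is simulated with a dict here.
rank_outputs = {0: denoise(latent, False), 1: denoise(latent, True)}
parallel = cfg_combine(rank_outputs[0], rank_outputs[1], 4.0)

assert parallel == baseline
print(parallel)
```

Because each rank's forward pass is unchanged, this is what makes the minimally intrusive abstraction in #850/#851 feasible: only the combine step needs to know about the second rank.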
- Model E2E performance acceleration:
- Z-Image family
- Qwen-Image family @GG-li
- Stable-Diffusion family @ZANMANGLOOPYE
- Flux family
- WAN family
- Online serving:
- i2i /v1/images/edit [RFC]: OpenAI Image Edit API Interface for ComfyUI #510 [FEATURE] /v1/images/edit interface #1101
- ❗important❗t2v, i2v [Bug]: Diffusion chat completion failed: 'numpy.ndarray' object has no attribute 'save' #793 [Feature] Support Wan2.2 T2V and I2V Online Serving with OpenAI /v1/videos API #1073
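The /v1/images/edit item above follows the OpenAI-style images API: the client posts a multipart form with the source image, a prompt, and generation parameters. A sketch of assembling such a request payload (field names mirror the OpenAI Image Edit API; whether vLLM-Omni accepts every field, and the model name used here, are assumptions of this sketch):

```python
def build_image_edit_request(image_path, prompt, model, size="1024x1024", n=1):
    """Assemble multipart fields for a POST to /v1/images/edit.

    Returns (data, files) in the shape `requests.post(url, data=..., files=...)`
    expects; the image bytes are a placeholder here.
    """
    data = {
        "prompt": prompt,
        "model": model,
        "n": str(n),
        "size": size,
        "response_format": "b64_json",
    }
    files = {"image": (image_path, b"<image bytes>", "image/png")}
    return data, files


data, files = build_image_edit_request(
    "cat.png", "make the cat wear a red hat", model="Qwen/Qwen-Image-Edit"
)
print(data["prompt"])
print(sorted(data))
```

Keeping the field names OpenAI-compatible means existing OpenAI client SDKs can target the vLLM-Omni endpoint unchanged, which is the point of #510/#1101.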
P1: 🙋
- Single-card acceleration
- graph compilation
- torch.compile key arguments optimization, e.g., fullgraph=True
- advanced attention
- torch_sdpa advanced attention backends, such as torch.backends.cuda.enable_mem_efficient_sdp().
- Video Sparse Attention: from FastVideo
- SageAttention with quantization support, e.g., sageattn_qk_int8_pv_fp16_cuda
- SpargeAttn Sparse Attention Backend: [RFC]: Add SpargeAttn Sparse Attention Backend #765
- diffusion distillation
- cache acceleration
- TeaCache supports LongCat-Image and LongCat-Image-Edit
- TeaCache supports Z-Image: [Diffusion][Acceleration] Support TeaCache for Z-Image #817
- TeaCache supports Stable-Diffusion3.5
- TeaCache supports Wan2.2
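The TeaCache items above all rest on the same skip decision: accumulate the relative change of the (timestep-modulated) transformer input across steps, and reuse the cached residual until the accumulated change crosses a threshold; the coefficient-estimation work in #940 fits the rescaling of that distance. A toy sketch of the schedule (the threshold, the scalar inputs, and the plain L1 distance are illustrative assumptions; real TeaCache rescales the distance with fitted polynomial coefficients):

```python
def teacache_schedule(inputs, threshold):
    """Decide per-step whether to recompute the DiT or reuse the cached residual.

    TeaCache-style policy sketch: accumulate relative input change across
    skipped steps and force a recompute once it exceeds `threshold`.
    """
    decisions, acc, prev = [], 0.0, None
    for x in inputs:
        if prev is None:
            decisions.append("compute")  # first step always computes
        else:
            acc += abs(x - prev) / max(abs(prev), 1e-8)  # relative change
            if acc >= threshold:
                decisions.append("compute")
                acc = 0.0  # reset after a fresh forward pass
            else:
                decisions.append("reuse")  # reuse cached residual
        prev = x
    return decisions


steps = [1.0, 1.01, 1.02, 1.5, 1.51, 2.0]
print(teacache_schedule(steps, threshold=0.1))
```

Small consecutive changes are skipped even when they add up slowly, while any large jump (here 1.02 → 1.5) immediately triggers a recompute, which is what keeps the quality loss bounded.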
- Multi-card acceleration
- CFG parallel:
- LongCatImage, Ovis-Image, Stable-Diffusion3.5, Wan2.2 [Perf]: CFG parallel abstraction #851
- Z-Image
- Sequence Parallel
- Ovis-Image
- Z-Image
- Wan2.2 [Diffusion][Feature] Non-Intrusive Sequence Parallelism (SP) Support for Wan2.2 #966
- Patch VAE Parallel
- Wan2.2
- Pipeline Parallelism
- PipeFusion RFC: [RFC]: PipeFusion Implementation in vLLM-Omni #647
- Data Parallelism
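The Ulysses SP items in both priority tiers share one layout trick: outside attention each rank holds a contiguous sequence shard for all heads, and an all-to-all before attention re-shards by head, so each rank sees the full sequence for its subset of heads. A pure-Python layout sketch (no torch.distributed; the all-to-all is simulated with a list transpose, and the 2-rank / 4-head sizes are illustrative):

```python
def all_to_all(per_rank_chunks):
    """Simulated all-to-all: rank r sends chunk c to rank c, receives chunk r.

    per_rank_chunks[r][c] is the piece rank r holds destined for rank c.
    """
    world = len(per_rank_chunks)
    return [[per_rank_chunks[src][dst] for src in range(world)]
            for dst in range(world)]


world_size = 2
seq = [f"t{i}" for i in range(8)]  # full token sequence, 4 heads total

# Outside attention: each rank holds a contiguous sequence shard, all heads.
seq_shards = [seq[r * 4:(r + 1) * 4] for r in range(world_size)]

# Before attention: split each rank's shard into head groups and all-to-all,
# so each rank ends up with ALL tokens for its half of the heads.
send = [[(f"h{g * 2}-{g * 2 + 1}", shard) for g in range(world_size)]
        for shard in seq_shards]
recv = all_to_all(send)

# Rank 0 now owns heads 0-1 over the full sequence:
rank0_heads = {head for head, _ in recv[0]}
rank0_tokens = [t for _, chunk in recv[0] for t in chunk]
print(rank0_heads, rank0_tokens)
```

Because attention is exact over the full sequence per head, Ulysses SP is lossless; a mirror all-to-all after attention restores the sequence sharding for the MLP, which is what the non-intrusive abstraction in #779 wraps around existing models.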
Feedback Period.
No response
CC List.
@hsliuustc0106 @ZJY0516 @SamitHuang @david6666666 @mxuax @lishunyang12 @xiaolin8 @gcanlin @dongbo910220
Any Other Things.
No response