Checklist
Motivation
Earlier this year, LLaDA released the first diffusion LLM (dLLM), immediately capturing significant attention from both the academic and industrial communities. But there was no production-ready dLLM serving engine.
We plan to implement the most performant, production-ready dLLM framework in SGLang and make dLLM serving robust!
Features
[DLLM] Add documentation for diffusion LLMs #14358
[DLLM] Add CI for diffusion LLMs #14723
For RL
VL-dLLM
More supported models
More Hardware
More Parallelism
Kernel Optimization for dLLM
dLLM inference makes the fused-MoE kernel a bottleneck, so we need to optimize the fused-MoE implementation for dLLM scenarios.
Optimize block-wise causal attention for dLLM prefill.
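For context, a minimal sketch of the block-wise causal attention pattern used in dLLM prefill: tokens attend bidirectionally within their own block and causally to all earlier blocks. This is only an illustrative mask construction (function name and shapes are ours); the actual SGLang kernels fuse this logic rather than materializing a dense mask.

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Build a boolean attention mask for block-wise causal attention.

    mask[i, j] is True iff query token i may attend to key token j:
    - tokens in the same block attend to each other bidirectionally,
    - tokens attend causally to every earlier block,
    - tokens never attend to later blocks.
    """
    # Block index of each position, e.g. block_size=4 -> [0,0,0,0,1,1,...]
    blocks = np.arange(seq_len) // block_size
    # Allowed iff the key's block is not later than the query's block.
    return blocks[None, :] <= blocks[:, None]

mask = block_causal_mask(seq_len=8, block_size=4)
```

Unlike strict token-level causal masking, this keeps full bidirectional attention inside each diffusion block, which is what makes the prefill attention pattern different from standard autoregressive LLMs and worth a dedicated kernel.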
More disaggregation
PD (prefill-decode) disaggregation is not suitable for dLLM, but AFD (attention-FFN disaggregation) might be a viable option.
More Tests
Better streaming output
RFC
#12766
Related resources