Summary table

| Repository | Stars | Relationship type | Evidence | Key files | PRs | Assessment |
| --- | --- | --- | --- | --- | --- | --- |
| QwenLM/Qwen-Image | 7792 | Official model docs | The official README gives an SGLang `generate` example and mentions SGLang-Diffusion day-0 support / ecosystem support | README | Documentation-based | Medium; endorsed on the model side |
| ai-dynamo/dynamo | 6596 | Serving backend | The SGLang backend supports image diffusion / video generation workers, integrates SGLang `DiffGenerator`, and exposes `/v1/images/generations` and `/v1/videos` | `sglang-diffusion.md`, `image_diffusion.sh`, `text-to-video-diffusion.sh` | Merged: #5609, #5793, #7870, #8035; open: #8332 | Strong; adopted at the platform-backend level |
| NVlabs/Sana | 5092 | Official model docs | The official docs state that the Sana series is natively supported in SGLang, covering `generate`, `DiffGenerator`, `serve`, offload, and LoRA | `docs/sglang.md` | Documentation-based | Medium; endorsed on the model side |
| gpustack/gpustack | 4864 | Platform deployment | SGLang is used as the image/diffusion backend; the code contains a diffusion version guard, a `sglang serve` command builder, and an attention backend fallback | `sglang.py`, `model_meta.py`, backend docs | Merged: #3268, #3527, #3562, #3976; open: #4757 | Strong; reaches the model platform's scheduling/deployment layer |
| vllm-project/vllm-omni | 4417 | Feature migration | No direct `sgl_kernel` calls found on current main, but `LayerwiseOffloadHook` is explicitly based on SGLang v0.5.8. It implements pinned CPU flat weights, asynchronous H2D prefetch, block placeholders, and Cache-DiT skip compatibility; `data.py` / `request.py` are marked as adapted from sglang/fastvideo | `layerwise_backend.py`, `data.py`, `request.py` | Merged: #858, #1223, #1486, #2018, #2339; open: #2734, #2909, #2533, #2427, #2724 | Strong; clear runtime-feature migration |
| hao-ai-lab/FastVideo | 3405 | Kernel infra / API / benchmark | The `fastvideo-kernel` README states that its package/build structure is based on `sgl-kernel`; the benchmark serving script is adapted from the SGLang multimodal benchmark; the LoRA linear layer is adapted from SGLang. Note that the relationship between FastVideo and SGLang diffusion also runs in the other direction, so influence cannot be attributed one way only | `fastvideo-kernel` README, `bench_serving.py`, `linear.py` | Merged: #1109, #916, #966 | Strong; two-way ecosystem convergence |
| NVIDIA/Model-Optimizer | 2516 | Quantization / export adaptation | README and CHANGELOG list SGLang as a deployment target for FP8/NVFP4 diffusion checkpoints; the export logic includes diffusion QKV fusion support | README, CHANGELOG, `unified_export_hf.py`, `diffusers_utils.py` | Upstream docs/code support | Strong; affects the SGLang diffusion quantized-deployment pipeline |
| ModelTC/LightX2V | 2193 | Direct kernel calls | Depends directly on `sgl-kernel`, calling `fp8_scaled_mm`, `int8_scaled_mm`, and `rmsnorm`; also imports dynamic INT8 quantization from `sglang.srt.layers.quantization.int8_kernel`; the ROCm path provides an SGL-kernel-compatible shim | `requirements.txt`, `mm_weight.py`, `rms_norm_weight.py`, `amd_rocm.py` | Merged: #890, #847, #842, #829, #661 | Very strong; directly reuses the SGLang kernel package |
| vipshop/cache-dit | 1146 | Ecosystem integration | The docs state that Cache-DiT is fully integrated into SGLang Diffusion and vLLM-Omni; the serving docs recommend going through SGLang / vLLM-Omni | README, `COMMUNITY.md`, `SERVING.md` | Merged: #536, #764, #858, #863, #933, #940 | Strong; shows SGLang diffusion acting as an ecosystem entry point |
| intel/llm-scaler | 264 | XPU platform / kernel | The Dockerfile builds SGLang Diffusion plus `sgl-kernel-xpu` directly; a patch adds an XPU communicator, platform, and attention backend; ComfyUI nodes are also wired to SGLang diffusion | Dockerfile, XPU patch, SGLang guide, ComfyUI guide | Merged: #194, #237 | Strong; extends support to Intel XPU |
| ai-dynamo/aiperf | 229 | Benchmark tutorial | A dedicated tutorial for benchmarking the SGLang video generation endpoint, using `sglang[diffusion]`, `sglang serve`, and `/v1/videos` | `sglang-video-generation.md` | Documentation-based | Medium |
| Introspective-Diffusion/I-DLM | 113 | Vendored copy | Ships an SGLang diffusion / ComfyUI_SGLDiffusion code tree under `inference/sglang`, which looks more like a vendored inference dependency | `inference/sglang`, ComfyUI_SGLDiffusion README | No PRs found | Medium; code-copy signal |
| KE-AI-ENG/FastDM | 59 | Kernel code adaptation | The FP8 GEMM CUDA files state they are adapted from `sgl-kernel/csrc/gemm/fp8_gemm_kernel.cu`; the CUTLASS extension keeps SGLang's copyright header | README, `ada_w8a8_fp8.cu`, `hopper_w8a8_fp8.cu`, `gemm_with_epilogue_visitor.h` | No related PRs found | Very strong; kernel source directly derived |
| zhaochenyang20/sglang-diffusion-routing | 17 | Router / RL | A load-balancing router built specifically for SGLang diffusion workers; supports `/v1/images/generations`, `/v1/diffusion/generate`, `/v1/videos`, weight updates, and memory release/resume | `diffusion_router.py`, `launcher/local.py`, e2e test, README | Merged: #2, #18, #17, #35, #37; open: #4, #34, #42 | Strong; builds an upper-layer system around SGLang diffusion |
Background
This note compiles public GitHub information on how diffusion-related projects use, adapt, and integrate with SGLang Diffusion, `sgl-kernel`, and SGLang runtime features.
Selection rule: low-star repositories were excluded. They include aws-samples/sample-qwen-on-aws, Happy-Boat/sglang-diffusion-latent-parallel, cloud-zby/Appro-SGLang, faradawn/sglang-diffusion-frontend, endman100/ComfyUI_SGLDiffusion_Fix, chpe0312/sglang-diffusion, eliotwang/sglang_diffusion, and others.
Open PRs worth tracking
- vllm-omni#2734
- vllm-omni#2909
- vllm-omni#1994 (runs SGLang `generate` with `lmsysorg/sglang`)
- analytics-zoo/sglang-diffusion#1
- ai-dynamo/dynamo#8332
- gpustack/gpustack#4757
- zhaochenyang20/sglang-diffusion-routing#4, #34, #42

Conclusion
The external influence of SGLang Diffusion falls into four layers:
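1. Kernel level: LightX2V and FastDM provide the hardest evidence. They directly call or adapt `sgl-kernel`'s FP8/INT8 GEMM, RMSNorm, timestep embedding, and CUTLASS extension.
2. Runtime features: vLLM-Omni's layerwise offload is the key example; its source states explicitly that it is based on SGLang v0.5.8.
3. Deployment backend: Dynamo, GPUStack, Intel llm-scaler, sglang-diffusion-routing, and similar projects treat SGLang Diffusion as a deployable backend or as the foundation for upper-layer systems.
4. Official and community documentation: Qwen-Image, Sana, Cache-DiT, AIPerf, and others write SGLang Diffusion into their official or community workflows. This shows it is no longer a single-repository feature but one of the foundations of the diffusion serving ecosystem.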
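The kernel-level reuse described for LightX2V and FastDM is easiest to reason about in reference form. The sketch below is a pure-Python illustration of the general technique behind such INT8 paths:

- dynamic per-token (per-row) symmetric quantization of activations;
- per-output-channel quantization of weights;
- an integer-accumulating GEMM that is rescaled once by `scale_a[i] * scale_b[j]`.

All function names are hypothetical. This is not `sgl-kernel`'s API, and the assumption that `int8_scaled_mm` uses exactly this scaling layout is not confirmed here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def per_token_quant_int8(x: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Dynamic symmetric INT8 quantization with one scale per row (token)."""
    q_rows, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0  # avoid division by zero
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def per_channel_quant_int8(w: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Quantize a K x N weight with one scale per output column."""
    cols = [list(c) for c in zip(*w)]
    q_cols, scales = per_token_quant_int8(cols)
    # Transpose back to K x N.
    return [list(r) for r in zip(*q_cols)], scales


def int8_scaled_mm_ref(
    a_q: List[List[int]], b_q: List[List[int]], scale_a: List[float], scale_b: List[float]
) -> Matrix:
    """out[i][j] = (sum_k a_q[i][k] * b_q[k][j]) * scale_a[i] * scale_b[j]."""
    n = len(b_q[0])
    out = []
    for i, a_row in enumerate(a_q):
        acc = [0] * n  # integer accumulator, as an INT8 GEMM would use
        for k, a_val in enumerate(a_row):
            for j in range(n):
                acc[j] += a_val * b_q[k][j]
        # Apply both scales once, after accumulation.
        out.append([acc[j] * scale_a[i] * scale_b[j] for j in range(n)])
    return out
```

Accumulating in integers and applying both scales only at the end is what lets such kernels run the inner product on INT8 hardware. "Dynamic" refers to the activation scales being recomputed from each new input at runtime, while weight scales can be computed once ahead of time.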