SGLang Diffusion External Impact Survey: Kernel, Feature, and Platform Adoption (#14)

## Background

This note compiles public GitHub evidence of how diffusion-related projects use, adapt, and integrate with SGLang Diffusion, `sgl-kernel`, and SGLang runtime features.

Filtering rules:

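- Only repositories that currently have at least 10 GitHub stars are kept.
- Repositories with fewer than 10 stars are excluded from the main table.
- Evidence is classified into five kinds: direct kernel calls, kernel source adaptation, runtime feature migration, platform backend integration, and official model documentation / ecosystem endorsement.
- Star counts and PR status were checked against public GitHub information as of 2026-04-20.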
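The filtering rules above reduce to two steps: apply a star threshold, then order the remaining repositories. The summary table is ordered from most to fewest stars. Below is a minimal sketch of that selection step. It assumes each candidate is recorded as a (name, stars, evidence type) triple. The `Repo` type and the `main_table` helper are hypothetical and are not part of any tool used in this survey.

```python
from dataclasses import dataclass

MIN_STARS = 10  # inclusion threshold from the filtering rules


@dataclass(frozen=True)
class Repo:
    name: str
    stars: int      # GitHub stars at the check date
    evidence: str   # free-form evidence type, as written in the summary table


def main_table(repos):
    """Drop repos below MIN_STARS and sort the rest by stars, highest first."""
    kept = [r for r in repos if r.stars >= MIN_STARS]
    return sorted(kept, key=lambda r: r.stars, reverse=True)


if __name__ == "__main__":
    candidates = [
        Repo("KE-AI-ENG/FastDM", 59, "kernel code adaptation"),
        Repo("ModelTC/LightX2V", 2193, "direct kernel call"),
        Repo("example/low-star-fork", 3, "vendor/copy"),  # hypothetical, below threshold
        Repo("zhaochenyang20/sglang-diffusion-routing", 17, "router/RL"),
    ]
    for r in main_table(candidates):
        print(f"{r.stars:>5}  {r.name}")
```

Run as a script, this prints LightX2V, FastDM, and sglang-diffusion-routing in that order. The hypothetical 3-star repository is dropped.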

Excluded low-star repositories include `aws-samples/sample-qwen-on-aws`, `Happy-Boat/sglang-diffusion-latent-parallel`, `cloud-zby/Appro-SGLang`, `faradawn/sglang-diffusion-frontend`, `endman100/ComfyUI_SGLDiffusion_Fix`, `chpe0312/sglang-diffusion`, and `eliotwang/sglang_diffusion`, among others.

## Summary Table

| Project | Stars | Evidence type | SGLang Diffusion influence / usage | Code-level evidence | PR / status | Impact assessment |
|---|---:|---|---|---|---|---|
| `QwenLM/Qwen-Image` | 7792 | Official model docs | The official README gives an SGLang generate example and mentions SGLang-Diffusion day-0 support / ecosystem support | [README](https://github.com/QwenLM/Qwen-Image/blob/main/README.md) | Docs only | Medium; endorsement from the model side |
| `ai-dynamo/dynamo` | 6596 | Serving backend | The SGLang backend supports image-diffusion and video-generation workers, integrates SGLang's `DiffGenerator`, and exposes `/v1/images/generations` and `/v1/videos` | [sglang-diffusion.md](https://github.com/ai-dynamo/dynamo/blob/main/docs/backends/sglang/sglang-diffusion.md), [image_diffusion.sh](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/sglang/launch/image_diffusion.sh), [text-to-video-diffusion.sh](https://github.com/ai-dynamo/dynamo/blob/main/examples/backends/sglang/launch/text-to-video-diffusion.sh) | merged: [#5609](https://github.com/ai-dynamo/dynamo/pull/5609), [#5793](https://github.com/ai-dynamo/dynamo/pull/5793), [#7870](https://github.com/ai-dynamo/dynamo/pull/7870), [#8035](https://github.com/ai-dynamo/dynamo/pull/8035); open: [#8332](https://github.com/ai-dynamo/dynamo/pull/8332) | Strong; adoption at the platform-backend level |
| `NVlabs/Sana` | 5092 | Official model docs | Official docs state that the Sana series is natively supported in SGLang, covering generate, `DiffGenerator`, serve, offload, and LoRA | [docs/sglang.md](https://github.com/NVlabs/Sana/blob/main/docs/sglang.md) | Docs only | Medium; endorsement from the model side |
| `gpustack/gpustack` | 4864 | Platform deployment | SGLang is used as an image/diffusion backend; the code includes a diffusion version guard, a `sglang serve` command builder, and an attention-backend fallback | [sglang.py](https://github.com/gpustack/gpustack/blob/main/gpustack/worker/backends/sglang.py), [model_meta.py](https://github.com/gpustack/gpustack/blob/main/gpustack/worker/model_meta.py), [backend docs](https://github.com/gpustack/gpustack/blob/main/docs/user-guide/built-in-inference-backends.md) | merged: [#3268](https://github.com/gpustack/gpustack/pull/3268), [#3527](https://github.com/gpustack/gpustack/pull/3527), [#3562](https://github.com/gpustack/gpustack/pull/3562), [#3976](https://github.com/gpustack/gpustack/pull/3976); open: [#4757](https://github.com/gpustack/gpustack/pull/4757) | Strong; reaches the platform's model scheduling/deployment layer |
| `vllm-project/vllm-omni` | 4417 | Feature migration | No direct `sgl_kernel` calls were found on current main, but `LayerwiseOffloadHook` is explicitly based on SGLang v0.5.8. It implements pinned CPU flat weights, asynchronous H2D prefetch, block placeholders, and compatibility with Cache-DiT skipping. `data.py`/`request.py` are marked as adapted from sglang/fastvideo | [layerwise_backend.py](https://github.com/vllm-project/vllm-omni/blob/main/vllm_omni/diffusion/offloader/layerwise_backend.py), [data.py](https://github.com/vllm-project/vllm-omni/blob/main/vllm_omni/diffusion/data.py), [request.py](https://github.com/vllm-project/vllm-omni/blob/main/vllm_omni/diffusion/request.py) | merged: [#858](https://github.com/vllm-project/vllm-omni/pull/858), [#1223](https://github.com/vllm-project/vllm-omni/pull/1223), [#1486](https://github.com/vllm-project/vllm-omni/pull/1486), [#2018](https://github.com/vllm-project/vllm-omni/pull/2018), [#2339](https://github.com/vllm-project/vllm-omni/pull/2339); open: [#2734](https://github.com/vllm-project/vllm-omni/pull/2734), [#2909](https://github.com/vllm-project/vllm-omni/pull/2909), [#2533](https://github.com/vllm-project/vllm-omni/pull/2533), [#2427](https://github.com/vllm-project/vllm-omni/pull/2427), [#2724](https://github.com/vllm-project/vllm-omni/pull/2724) | Strong; clear runtime-feature migration |
| `hao-ai-lab/FastVideo` | 3405 | Kernel infra / API benchmark | The `fastvideo-kernel` README states that its package/build structure is based on `sgl-kernel`. The serving benchmark script is adapted from SGLang's multimodal benchmark, and the LoRA linear layer is adapted from SGLang. Note that influence also runs in the reverse direction between FastVideo and SGLang diffusion, so it cannot be attributed one way only | [fastvideo-kernel README](https://github.com/hao-ai-lab/FastVideo/blob/main/fastvideo-kernel/README.md), [bench_serving.py](https://github.com/hao-ai-lab/FastVideo/blob/main/fastvideo/entrypoints/cli/bench_serving.py), [linear.py](https://github.com/hao-ai-lab/FastVideo/blob/main/fastvideo/layers/lora/linear.py) | merged: [#1109](https://github.com/hao-ai-lab/FastVideo/pull/1109), [#916](https://github.com/hao-ai-lab/FastVideo/pull/916), [#966](https://github.com/hao-ai-lab/FastVideo/pull/966) | Strong; two-way ecosystem convergence |
| `NVIDIA/Model-Optimizer` | 2516 | Quantization / export adaptation | README/CHANGELOG list SGLang as a deployment target for FP8/NVFP4 diffusion checkpoints; the export logic supports QKV fusion for diffusion models | [README](https://github.com/NVIDIA/Model-Optimizer/blob/main/README.md), [CHANGELOG](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst), [unified_export_hf.py](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/export/unified_export_hf.py), [diffusers_utils.py](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/export/diffusers_utils.py) | Upstream docs/code support | Strong; shapes the SGLang diffusion quantized-deployment pipeline |
| `ModelTC/LightX2V` | 2193 | Direct kernel calls | Depends directly on `sgl-kernel` and calls `fp8_scaled_mm`, `int8_scaled_mm`, and `rmsnorm`. It also imports dynamic int8 quantization from `sglang.srt.layers.quantization.int8_kernel`, and the ROCm path provides an SGL-kernel-compatible shim | [requirements.txt](https://github.com/ModelTC/LightX2V/blob/main/requirements.txt), [mm_weight.py](https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/mm/mm_weight.py), [rms_norm_weight.py](https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/norm/rms_norm_weight.py), [amd_rocm.py](https://github.com/ModelTC/LightX2V/blob/main/lightx2v_platform/base/amd_rocm.py) | merged: [#890](https://github.com/ModelTC/LightX2V/pull/890), [#847](https://github.com/ModelTC/LightX2V/pull/847), [#842](https://github.com/ModelTC/LightX2V/pull/842), [#829](https://github.com/ModelTC/LightX2V/pull/829), [#661](https://github.com/ModelTC/LightX2V/pull/661) | Very strong; directly reuses SGLang kernels/package |
| `vipshop/cache-dit` | 1146 | Ecosystem integration | Docs state that Cache-DiT is fully integrated into SGLang Diffusion and vLLM-Omni; the serving docs recommend going through SGLang or vLLM-Omni | [README](https://github.com/vipshop/cache-dit/blob/main/README.md), [COMMUNITY.md](https://github.com/vipshop/cache-dit/blob/main/docs/COMMUNITY.md), [SERVING.md](https://github.com/vipshop/cache-dit/blob/main/docs/user_guide/SERVING.md) | merged: [#536](https://github.com/vipshop/cache-dit/pull/536), [#764](https://github.com/vipshop/cache-dit/pull/764), [#858](https://github.com/vipshop/cache-dit/pull/858), [#863](https://github.com/vipshop/cache-dit/pull/863), [#933](https://github.com/vipshop/cache-dit/pull/933), [#940](https://github.com/vipshop/cache-dit/pull/940) | Strong; shows SGLang diffusion acting as an ecosystem entry point |
| `intel/llm-scaler` | 264 | XPU platform / kernel | The Dockerfile builds SGLang Diffusion plus `sgl-kernel-xpu` directly. A patch adds an XPU communicator, platform, and attention backend, and the ComfyUI nodes are also wired to SGLang diffusion | [Dockerfile](https://github.com/intel/llm-scaler/blob/main/omni/docker/Dockerfile), [xpu patch](https://github.com/intel/llm-scaler/blob/main/omni/patches/sglang_diffusion_for_multi_arc.patch), [SGLang guide](https://github.com/intel/llm-scaler/blob/main/omni/docs/SGLang_Diffusion_Guide.md), [ComfyUI guide](https://github.com/intel/llm-scaler/blob/main/omni/docs/SGLang_Diffusion_ComfyUI_Guide.md) | merged: [#194](https://github.com/intel/llm-scaler/pull/194), [#237](https://github.com/intel/llm-scaler/pull/237) | Strong; extends SGLang diffusion to Intel XPU |
| `ai-dynamo/aiperf` | 229 | Benchmark tutorial | A dedicated tutorial benchmarks the SGLang video-generation endpoint using `sglang[diffusion]`, `sglang serve`, and `/v1/videos` | [sglang-video-generation.md](https://github.com/ai-dynamo/aiperf/blob/main/docs/tutorials/sglang-video-generation.md) | Docs only | Medium |
| `Introspective-Diffusion/I-DLM` | 113 | Vendored copy | Ships an SGLang diffusion / ComfyUI_SGLDiffusion source tree under `inference/sglang`; this looks more like a vendored inference dependency | [inference/sglang](https://github.com/Introspective-Diffusion/I-DLM/tree/main/inference/sglang), [ComfyUI_SGLDiffusion README](https://github.com/Introspective-Diffusion/I-DLM/blob/main/inference/sglang/sglang/multimodal_gen/apps/ComfyUI_SGLDiffusion/README.md) | No PR found | Medium; a code-copying signal |
| `KE-AI-ENG/FastDM` | 59 | Kernel code adaptation | The FP8 GEMM CUDA files are explicitly adapted from `sgl-kernel/csrc/gemm/fp8_gemm_kernel.cu`, and the CUTLASS extension retains the SGLang copyright header | [README](https://github.com/KE-AI-ENG/FastDM/blob/main/README.md), [ada_w8a8_fp8.cu](https://github.com/KE-AI-ENG/FastDM/blob/main/csrc/gemm/ada_w8a8_fp8.cu), [hopper_w8a8_fp8.cu](https://github.com/KE-AI-ENG/FastDM/blob/main/csrc/gemm/hopper_w8a8_fp8.cu), [gemm_with_epilogue_visitor.h](https://github.com/KE-AI-ENG/FastDM/blob/main/csrc/include/cutlass_extensions/gemm/gemm_with_epilogue_visitor.h) | No related PR found | Very strong; kernel source directly derived |
| `zhaochenyang20/sglang-diffusion-routing` | 17 | Router / RL | A load-balancing router built specifically for SGLang diffusion workers. It supports `/v1/images/generations`, `/v1/diffusion/generate`, `/v1/videos`, weight updates, and release/resume memory | [diffusion_router.py](https://github.com/zhaochenyang20/sglang-diffusion-routing/blob/main/src/sglang_diffusion_routing/router/diffusion_router.py), [launcher/local.py](https://github.com/zhaochenyang20/sglang-diffusion-routing/blob/main/src/sglang_diffusion_routing/launcher/local.py), [e2e test](https://github.com/zhaochenyang20/sglang-diffusion-routing/blob/main/tests/e2e/test_e2e_sglang.py), [README](https://github.com/zhaochenyang20/sglang-diffusion-routing/blob/main/README.md) | merged: [#2](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/2), [#18](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/18), [#17](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/17), [#35](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/35), [#37](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/37); open: [#4](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/4), [#34](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/34), [#42](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/42) | Strong; builds a higher-level system around SGLang diffusion |
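The `sglang-diffusion-routing` row describes a load-balancing router that sits in front of several SGLang diffusion workers. The same router can also release and resume worker memory. Below is a sketch of one common selection policy such a router could use: least outstanding requests, skipping any worker whose memory has been released. The policy, the class, and the method names are assumptions made for illustration. They are not taken from `diffusion_router.py`.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    url: str
    in_flight: int = 0      # requests currently being served by this worker
    released: bool = False  # memory released; not routable until resumed


class LeastLoadedRouter:
    """Hypothetical least-outstanding-requests router over diffusion workers."""

    def __init__(self, urls):
        self.workers = [Worker(u) for u in urls]

    def route(self) -> Worker:
        """Pick the routable worker with the fewest in-flight requests."""
        candidates = [w for w in self.workers if not w.released]
        if not candidates:
            raise RuntimeError("no routable diffusion workers")
        # min() returns the first minimum, so ties go to the earliest-listed worker
        worker = min(candidates, key=lambda w: w.in_flight)
        worker.in_flight += 1
        return worker

    def finish(self, worker: Worker) -> None:
        """Record that one request on `worker` has completed."""
        worker.in_flight -= 1

    def release_memory(self, url: str) -> None:
        # A real router would likely drain in-flight requests first; omitted here.
        self._get(url).released = True

    def resume_memory(self, url: str) -> None:
        self._get(url).released = False

    def _get(self, url: str) -> Worker:
        for w in self.workers:
            if w.url == url:
                return w
        raise KeyError(url)
```

With two idle workers, the first two `route()` calls land on different workers. After that, each request goes to whichever worker has fewer requests in flight. A worker is skipped after `release_memory()` until `resume_memory()` is called.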

## Open PRs Worth Tracking

| PR | Status | Why it matters |
|---|---|---|
| `vllm-omni` [#2734](https://github.com/vllm-project/vllm-omni/pull/2734) | open | Bagel layerwise offload; further extends the SGLang-derived offload pattern |
| `vllm-omni` [#2909](https://github.com/vllm-project/vllm-omni/pull/2909) | open | Stable-Audio CPU/layerwise offload |
| `vllm-omni` [#1994](https://github.com/vllm-project/vllm-omni/pull/1994) | open | Benchmark/CI comparison against other frameworks; its Dockerfile uses `lmsysorg/sglang` to run SGLang generate |
| `analytics-zoo/sglang-diffusion` [#1](https://github.com/analytics-zoo/sglang-diffusion/pull/1) | open | Initial XPU support for SGLang diffusion, covering the XPU platform, communicator, attention backend, and JIT/kernel compatibility |
| `ai-dynamo/dynamo` [#8332](https://github.com/ai-dynamo/dynamo/pull/8332) | open | Tracing-argument stub for the SGLang diffusion worker |
| `gpustack/gpustack` [#4757](https://github.com/gpustack/gpustack/pull/4757) | open | Fix in the container execution layer that affects deployment of the SGLang diffusion image backend |
| `zhaochenyang20/sglang-diffusion-routing` [#4](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/4), [#34](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/34), [#42](https://github.com/zhaochenyang20/sglang-diffusion-routing/pull/42) | open | Routing scripts, T2V routing, health-check jitter |
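One of the open routing PRs concerns health-check jitter. Jitter is usually added so that periodic probes against many workers do not fire in lockstep. Below is a minimal sketch of multiplicative jitter, assuming a base interval and a jitter fraction. Both parameters and the helper names are hypothetical and are not taken from that PR.

```python
import random


def jittered_interval(base_s: float, jitter: float, rng: random.Random) -> float:
    """Scale base_s by a uniform random factor in [1 - jitter, 1 + jitter]."""
    if base_s <= 0:
        raise ValueError("base_s must be positive")
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    return base_s * rng.uniform(1.0 - jitter, 1.0 + jitter)


def first_probe_times(n_workers: int, base_s: float, jitter: float, seed: int = 0):
    """Delay before each worker's first health check; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return [jittered_interval(base_s, jitter, rng) for _ in range(n_workers)]


if __name__ == "__main__":
    # 4 workers, nominal 10 s interval, +/-10% jitter: values fall between 9 and 11 s
    print(first_probe_times(4, 10.0, 0.1))
```

Multiplicative jitter keeps the average probe rate close to the nominal interval while spreading probes out. Setting `jitter=0.0` recovers a fixed schedule.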

## Conclusions

SGLang Diffusion's external influence falls into four layers:

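1. **Direct kernel reuse**: `LightX2V` and `FastDM` provide the hardest evidence. They directly call or adapt `sgl-kernel`'s FP8/INT8 GEMM, RMSNorm, timestep embedding, and CUTLASS extensions.
2. **Runtime feature migration**: `vLLM-Omni`'s layerwise offload is the key example; its source explicitly states that it is based on SGLang v0.5.8.
3. **Platform / serving integration**: `Dynamo`, `GPUStack`, `Intel llm-scaler`, `sglang-diffusion-routing`, and others treat SGLang Diffusion as a deployable backend or as a foundation for higher-level systems.
4. **Model and ecosystem endorsement**: `Qwen-Image`, `Sana`, `Cache-DiT`, `AIPerf`, and others write SGLang Diffusion into their official or community paths. This suggests it is no longer a single-repository feature but has become one of the foundations of the diffusion serving ecosystem.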
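The runtime-feature layer rests mainly on layerwise offload, as described for vLLM-Omni's `LayerwiseOffloadHook`. In that scheme, block weights stay in CPU memory, and each block is copied to the device just before it runs while the next block is prefetched. The pure-Python simulation below models only the resulting schedule, with a configurable prefetch depth: which blocks are loaded, computed, and evicted, in what order. It does not model pinned memory, CUDA streams, flat-weight packing, or Cache-DiT skipping. All names are hypothetical.

```python
from collections import deque


def layerwise_offload_schedule(num_blocks: int, prefetch: int = 1):
    """Simulate the order of H2D loads, compute, and evictions for layerwise offload.

    All block weights are assumed to live in CPU memory. Before block i runs it
    must be on the device; blocks up to i + prefetch are requested early so that,
    in a real implementation, their copies could overlap block i's compute on a
    separate stream. Block i is evicted right after it runs.
    Returns events: ("load", j), ("compute", i, resident_blocks), ("evict", i).
    """
    if num_blocks < 0 or prefetch < 0:
        raise ValueError("num_blocks and prefetch must be non-negative")
    resident = deque()  # block ids currently on the device, in load order
    events = []
    next_to_load = 0
    for i in range(num_blocks):
        # Issue loads for the current block (if not prefetched) and the lookahead window.
        while next_to_load <= min(i + prefetch, num_blocks - 1):
            resident.append(next_to_load)
            events.append(("load", next_to_load))
            next_to_load += 1
        events.append(("compute", i, tuple(resident)))
        # Loads and evictions both follow block order, so the oldest resident block is i.
        resident.popleft()
        events.append(("evict", i))
    return events


if __name__ == "__main__":
    for event in layerwise_offload_schedule(num_blocks=3, prefetch=1):
        print(event)
```

With `prefetch=1`, at most two blocks are resident while any block computes. Raising `prefetch` leaves more room to hide copy latency, at the cost of more device memory. This sketch only illustrates that trade-off; it is not a description of how vLLM-Omni or SGLang tunes it.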
