[Diffusion] Add SANA-WM with streaming support by AgainstEntropy · Pull Request #27531 · sgl-project/sglang

AgainstEntropy · 2026-06-08T05:50:50Z

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): ❌ Run #27150943637
Latest PR Test (Extra): ❌ Run #27150941127

NVIDIA SANA-WM (camera-controlled TI2V world model, NVlabs/Sana), runnable end-to-end in the BIDIRECTIONAL (dense) mode:
- Model: SanaWMTransformer3DModel (20 GDN/softmax hybrid blocks, UCPE camera conditioning, packed Plucker raymaps, chunk-causal forward_long with a per-block 10-slot streaming KV cache), the frame-wise GDN Triton kernels, the LTX-2 refiner DiT model class, and the model/refiner configs.
- Pipeline: SanaWMPipeline / SanaWMTwoStagePipeline dense path (one-shot bidirectional denoise -> dense whole-clip LTX-2 refiner -> causal-VAE decode), the WASD/IJKL camera-control action DSL + Plucker raymap conditioning, the pipeline/sampling configs, and the registry + model-overlay wiring.
- The env-gated parity-probe harness and CPU unit tests (forward_long scans and cache contract, cached-vs-dense reduction, dense pipeline/sampling config).

Runnable: dense bidirectional TI2V generation.

Stacked PR 1/4 of the SANA-WM enablement (bidirectional -> batch streaming -> realtime -> rest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
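For readers unfamiliar with the Plucker raymap conditioning mentioned above, here is a minimal sketch of the general technique: each pixel gets a 6-vector made of the ray moment (origin x direction) and the unit ray direction. This is not the SANA-WM implementation. The function name, the pinhole parameters (`fx`, `fy`, `cx`, `cy`), the (moment, direction) channel order, and the unpacked list layout are all assumptions for illustration; the PR's "packed" layout is not described in the text above.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plucker_raymap(fx, fy, cx, cy, rotation, origin, height, width):
    """Return a height x width grid of 6-tuples (moment, direction).

    rotation is a 3x3 camera-to-world matrix (nested lists); origin is the
    camera center in world space. Hypothetical helper, not an sglang API.
    """
    rows = []
    for v in range(height):
        row = []
        for u in range(width):
            # Back-project the pixel center to a camera-space direction.
            d_cam = ((u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0)
            # Rotate into world space and normalize to unit length.
            d = tuple(
                sum(rotation[i][j] * d_cam[j] for j in range(3)) for i in range(3)
            )
            norm = math.sqrt(sum(x * x for x in d))
            d = tuple(x / norm for x in d)
            # The moment o x d encodes where the ray sits in space.
            row.append(cross(origin, d) + d)
        rows.append(row)
    return rows


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
raymap = plucker_raymap(1.0, 1.0, 1.0, 1.0, identity, (1.0, 2.0, 3.0), 2, 2)
```

By construction the moment is orthogonal to the direction, which is a handy sanity check when comparing against another implementation.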

Layers BATCH STREAMING onto the dense pipeline from PR 1/4:
- SanaWMStreamingDenoisingStage: autoregressive self-forcing streaming denoise (subclasses CausalDMDDenoisingStage; runs forward_long chunk-by-chunk over a shared per-chunk core -- the LingBot-family shape) plus the self-forcing sampler utilities (chunk grid / KV window / explicit distilled sigma list -- not a diffusers scheduler, so it lives next to its consumer).
- SanaWMStreamingRefinerStage: chunked sink/current LTX-2 refiner; and the streaming causal-VAE decode (decode_chunk + per-conv cache).
- Wires the streaming dispatch into SanaWMTwoStagePipeline: streaming=True selects the streaming denoise / refiner / decode stages.

Runnable: batch streaming -- low-latency causal generation over a full action sequence.

Stacked PR 2/4 of the SANA-WM enablement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
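The chunk grid and KV window described above can be illustrated with a small, framework-free sketch. It assumes a fixed chunk size and a first-in-first-out window with a fixed number of slots; the commit message does not state the eviction policy, and the names `chunk_grid`, `stream_chunks`, and `denoise_chunk` are hypothetical, not the PR's API.

```python
from collections import deque


def chunk_grid(num_frames, chunk_size):
    """Split frame indices [0, num_frames) into consecutive (start, end) chunks."""
    return [
        (start, min(start + chunk_size, num_frames))
        for start in range(0, num_frames, chunk_size)
    ]


def stream_chunks(num_frames, chunk_size, num_slots, denoise_chunk):
    """Run denoise_chunk over each chunk with a bounded window of past context.

    denoise_chunk(start, end, context) must return (output, cache_entry).
    Only the most recent num_slots cache entries are kept (assumed FIFO).
    """
    window = deque(maxlen=num_slots)  # the oldest entry is dropped automatically
    outputs = []
    for start, end in chunk_grid(num_frames, chunk_size):
        output, entry = denoise_chunk(start, end, list(window))
        window.append(entry)
        outputs.append(output)
    return outputs


# Stand-in for the model call: records how much context each chunk saw.
context_sizes = []


def fake_denoise(start, end, context):
    context_sizes.append(len(context))
    return (start, end), f"kv[{start}:{end}]"


outputs = stream_chunks(
    num_frames=10, chunk_size=3, num_slots=2, denoise_chunk=fake_denoise
)
```

In this toy run the context grows to the slot limit and then stays there, which is the property that keeps per-chunk cost bounded over long sequences.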

…mework

Live WASD/IJKL world-model serving over /v1/realtime_video as a LingBot-shaped stage CHAIN (each stage owning a small BaseRealtimeState blob):
- cond-frame encode
- -> latent-prep (chunk plan + seeded noise: front-loaded within a fixed horizon, uniform past it, multi-chunk plans only when the refiner grid lags)
- -> camera conditioning
- -> the denoise stage's session path (this PR adds the per-chunk session path + SanaWMStreamCacheState onto the PR 2/4 class)
- -> chunked LTX-2 refiner (complete blocks only; sessions stream seamlessly past any horizon)
- -> causal-VAE decode following the refined frontier

Plus the SanaWMRealtimeAdapter (segment-aware camera action sampling, transport-field forwarding, open-ended sessions when num_frames is omitted), the session-path unit tests, and the chain unit tests.

Realtime output is BITWISE-identical to the batch streaming pipeline at the stage-1 latent level (and refiner-probe-identical through stage 2), as verified by the env-gated parity harness shipped in PR 1/4.

Stacked PR 3/4 of the SANA-WM enablement (model -> batch pipeline -> realtime -> CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: mickqian <30898949+mickqian@users.noreply.github.com>

…emote suite

SANA_WM_TI2V CI case (reduced-res 384x640), accuracy-harness component skips + transformer hook compat for the custom GDN DiT, the offline batch client (manifest build/run), and the remote verification suite script.

Stacked PR 4/4 of the SANA-WM enablement (model -> batch pipeline -> realtime -> CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist · 2026-06-08T05:50:53Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

The SANA-WM GPU CI case pulls the full multi-GB checkpoint and runs inference, which is too heavy for CI. Remove it, its companions, and the unreferenced standalone verification scripts:
- gpu_cases.py: drop the sana_wm_ti2v DiffusionTestCase + its imports + the no-sequence-parallelism 2-GPU note.
- testcase_configs.py: drop SANA_WM_TI2V_CI_sampling_params.
- accuracy_config.py: drop the sana_wm_ti2v component-skip entry.
- accuracy_hooks.py: drop the SANA-WM transformer-hook-compat branch and revert the VAE wan-video-latent condition to `"wan" in model_path` (the added ltx/sana-wm tokens changed pre-existing LTX behavior, which is out of scope; restored to main's baseline).
- test/scripts/: remove run_sana_wm_remote_suite.sh + sana_wm_batch_client.py (standalone tooling, referenced by no CI/workflow/code).

Kept DEFAULT_SANA_WM_MODEL_NAME_FOR_TEST (still used by the SANA-WM unit test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A realtime pipeline establishes per-session state over the WebSocket, so the synthetic server-warmup request (which has no session) fails in the realtime stage ("requires a realtime session") and aborts startup. Detect a registered realtime adapter in _run_server_warmup_after_http_ready and skip server warmup, so `serve --pipeline-class-name SanaWMRealtimePipeline` starts out of the box.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
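The guard described in this commit can be pictured roughly as follows. This is a sketch of the idea only: the `ServerState` registry, `should_run_server_warmup`, and `run_startup` are hypothetical names invented for illustration, and the real check lives inside `_run_server_warmup_after_http_ready`, whose signature is not shown in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ServerState:
    # Hypothetical registry: pipeline class name -> realtime adapter object.
    realtime_adapters: dict = field(default_factory=dict)


def should_run_server_warmup(state, pipeline_class_name):
    """Skip the sessionless warmup request for realtime pipelines."""
    return pipeline_class_name not in state.realtime_adapters


def run_startup(state, pipeline_class_name, warmup):
    """Call warmup() unless the pipeline needs a realtime session."""
    if not should_run_server_warmup(state, pipeline_class_name):
        return "skipped"
    warmup()
    return "warmed"


calls = []
state = ServerState(realtime_adapters={"SanaWMRealtimePipeline": object()})
realtime_result = run_startup(
    state, "SanaWMRealtimePipeline", lambda: calls.append("warmup")
)
batch_result = run_startup(
    state, "SanaWMTwoStagePipeline", lambda: calls.append("warmup")
)
```

Keying the decision on adapter registration rather than on a pipeline name list means any future realtime pipeline gets the same behavior without another special case.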

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mickqian · 2026-06-08T15:10:40Z

/tag-and-rerun-ci

sjmshsh · 2026-06-09T03:16:24Z

I noticed that you have already completed the integration and optimization of SANA-WM, but I found that TP is not supported yet. Do you think SANA-WM needs to support a TP strategy? @AgainstEntropy

…t#27531) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>

AgainstEntropy · 2026-06-09T06:28:45Z

> I noticed that you have already completed the integration and optimization of SANA-WM, but I found that TP is not supported yet. Do you think SANA-WM needs to support a TP strategy? @AgainstEntropy

Hey @sjmshsh, it was a bit urgent to add SANA-WM with streaming & realtime support, so this PR was built on some previous changes in #26153 (co-authored with you). And yes, I think we need TP here. Also, could you please port some of your later optimizations? Thank you!

sjmshsh · 2026-06-10T02:03:04Z

> I noticed that you have already completed the integration and optimization of SANA-WM, but I found that TP is not supported yet. Do you think SANA-WM needs to support a TP strategy? @AgainstEntropy
>
> Hey @sjmshsh, it was a bit urgent to add SANA-WM with streaming & realtime support, so this PR was built on some previous changes in #26153 (co-authored with you). And yes I think we need TP here. Also could you please port some of your later optimizations? Thank you!

OK

sjmshsh · 2026-06-27T14:40:25Z

> I noticed that you have already completed the integration and optimization of SANA-WM, but I found that TP is not supported yet. Do you think SANA-WM needs to support a TP strategy? @AgainstEntropy
>
> Hey @sjmshsh, it was a bit urgent to add SANA-WM with streaming & realtime support, so this PR was built on some previous changes in #26153 (co-authored with you). And yes I think we need TP here. Also could you please port some of your later optimizations? Thank you!
>
> OK

#29513

@mickqian @AgainstEntropy

AgainstEntropy and others added 5 commits June 5, 2026 07:25

fix lint

405466a

AgainstEntropy requested review from BBuf, HaiShaw, mickqian, ping1jing2, yhyang201, yichiche and yingluosanqian as code owners June 8, 2026 05:50

github-actions Bot added diffusion SGLang Diffusion jit-kernel labels Jun 8, 2026

AgainstEntropy mentioned this pull request Jun 8, 2026

[Feat] Support SANA-WM streaming #27331

Closed

5 tasks

AgainstEntropy and others added 11 commits June 8, 2026 06:43

style: black 26.1.0 formatting for ltx_2_vae.py (CI lint)

4d1473f

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add SANA-WM realtime consistency coverage

9cda8d3

Fix SANA-WM realtime requested size propagation

ae9cf42

Fix realtime WebP preview fallback

d2a461d

Split realtime encoded preview messages

b4bfdf2

Fix realtime preview first-frame display

61e0627

Avoid local imports in SANA-WM runtime

b68e232

Remove unused SANA-WM helpers

a878858

Refactor SANA-WM realtime stage foundation

29735e7

github-actions Bot added the run-ci label Jun 8, 2026

Clean up SANA-WM realtime imports

5989645

mickqian added 4 commits June 8, 2026 23:25

Merge origin/main into SANA-WM PR

f7eeadc

Clean up SANA-WM utility staging

8609836

Fix SANA-WM realtime adapter imports

62aef4e

Apply pre-commit formatting

8db8105

mickqian merged commit 32bedbf into sgl-project:main Jun 8, 2026
128 of 161 checks passed

mickqian merged 21 commits into sgl-project:main from AgainstEntropy:faet/sana-wm


3 participants
