
[Diffusion] Add SANA-WM with streaming support #27531

Merged
mickqian merged 21 commits into sgl-project:main from AgainstEntropy:faet/sana-wm on Jun 8, 2026


@AgainstEntropy (Collaborator) commented Jun 8, 2026

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): ❌ Run #27150943637
Latest PR Test (Extra): ❌ Run #27150941127

AgainstEntropy and others added 5 commits June 5, 2026 07:25
NVIDIA SANA-WM (camera-controlled TI2V world model, NVlabs/Sana), runnable
end-to-end in the BIDIRECTIONAL (dense) mode:

- Model: SanaWMTransformer3DModel (20 GDN/softmax hybrid blocks, UCPE camera
  conditioning, packed Plucker raymaps, chunk-causal forward_long with a
  per-block 10-slot streaming KV cache), the frame-wise GDN Triton kernels, the
  LTX-2 refiner DiT model class, and the model/refiner configs.
- Pipeline: SanaWMPipeline / SanaWMTwoStagePipeline dense path (one-shot
  bidirectional denoise -> dense whole-clip LTX-2 refiner -> causal-VAE decode),
  the WASD/IJKL camera-control action DSL + Plucker raymap conditioning, the
  pipeline/sampling configs, and the registry + model-overlay wiring.
- The env-gated parity-probe harness and CPU unit tests (forward_long scans and
  cache contract, cached-vs-dense reduction, dense pipeline/sampling config).

Runnable: dense bidirectional TI2V generation. Stacked PR 1/4 of the SANA-WM
enablement (bidirectional -> batch streaming -> realtime -> rest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
Layers BATCH STREAMING onto the dense pipeline from PR 1/4:

- SanaWMStreamingDenoisingStage: autoregressive self-forcing streaming denoise
  (subclasses CausalDMDDenoisingStage; runs forward_long chunk-by-chunk over a
  shared per-chunk core -- the LingBot-family shape) plus the self-forcing
  sampler utilities (chunk grid / KV window / explicit distilled sigma list --
  not a diffusers scheduler, so it lives next to its consumer).
- SanaWMStreamingRefinerStage: chunked sink/current LTX-2 refiner, plus the
  streaming causal-VAE decode (decode_chunk + per-conv cache).
- Wires the streaming dispatch into SanaWMTwoStagePipeline: streaming=True
  selects the streaming denoise / refiner / decode stages.
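The self-forcing sampler utilities are described only as a chunk grid plus a KV window. The following is a minimal sketch of that idea, assuming the window is counted in whole chunks and that each finished chunk's keys/values are committed to a fixed-size rolling cache (compare the per-block 10-slot streaming KV cache mentioned in PR 1/4). `chunk_grid`, `RollingKVCache`, and `stream` are hypothetical names, and cache entries here are frame ranges rather than tensors.

```python
from collections import deque


def chunk_grid(num_frames: int, chunk_size: int) -> list:
    """Split latent frames [0, num_frames) into consecutive (start, end) chunks."""
    if num_frames <= 0 or chunk_size <= 0:
        raise ValueError("num_frames and chunk_size must be positive")
    return [(s, min(s + chunk_size, num_frames)) for s in range(0, num_frames, chunk_size)]


class RollingKVCache:
    """Fixed-slot cache holding entries for the most recent `slots` chunks.

    Hypothetical stand-in: a real cache stores K/V tensors per transformer
    block; here each entry is just the chunk's (start, end) frame range.
    """

    def __init__(self, slots: int):
        self._entries = deque(maxlen=slots)  # oldest entry is evicted automatically

    def visible(self) -> list:
        return list(self._entries)

    def append(self, entry) -> None:
        self._entries.append(entry)


def stream(num_frames: int, chunk_size: int, kv_window: int) -> list:
    """Chunk-causal schedule: each chunk attends to itself plus the cached window."""
    cache = RollingKVCache(kv_window)
    schedule = []
    for chunk in chunk_grid(num_frames, chunk_size):
        schedule.append((chunk, cache.visible()))  # context seen while denoising chunk
        cache.append(chunk)  # commit the finished chunk for later chunks
    return schedule
```

For example, `stream(10, 3, 2)` yields four chunks, and the last chunk `(9, 10)` sees only `(3, 6)` and `(6, 9)` because the oldest entry has been evicted. A `deque` with `maxlen` gives eviction for free; a tensor cache would more likely overwrite a fixed slot index in place.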

Runnable: batch streaming -- low-latency causal generation over a full action
sequence. Stacked PR 2/4 of the SANA-WM enablement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
…mework

Live WASD/IJKL world-model serving over /v1/realtime_video as a LingBot-shaped
stage CHAIN, in which each stage owns a small BaseRealtimeState blob:

- cond-frame encode
- latent-prep (chunk plan + seeded noise: front-loaded within a fixed horizon,
  uniform past it; multi-chunk plans only when the refiner grid lags)
- camera conditioning
- the denoise stage's session path (this PR adds the per-chunk session path and
  SanaWMStreamCacheState to the PR 2/4 class)
- chunked LTX-2 refiner (complete blocks only; sessions stream seamlessly past
  any horizon)
- causal-VAE decode following the refined frontier

Plus the SanaWMRealtimeAdapter (segment-aware camera action sampling,
transport-field forwarding, open-ended sessions when num_frames is omitted),
the session-path unit tests, and the chain unit tests.
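To make the "stage chain" shape concrete, here is a hypothetical sketch in which each stage keeps its own private state dict (standing in for a per-stage BaseRealtimeState blob) and a session pushes one chunk at a time through the chain. `RealtimeStage`, `StageChain`, and the two toy stages are illustrative assumptions, not the SGLang classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RealtimeStage:
    """One link in a hypothetical per-session stage chain.

    `step(payload, state)` receives the upstream payload plus this stage's
    own private state dict and returns the payload for the next stage.
    """

    name: str
    step: Callable[[Any, dict], Any]
    state: dict = field(default_factory=dict)


class StageChain:
    def __init__(self, stages):
        self.stages = list(stages)

    def run_chunk(self, payload):
        # Push one chunk's worth of input through every stage in order.
        for stage in self.stages:
            payload = stage.step(payload, stage.state)
        return payload

    def reset(self):
        # A new session starts from empty per-stage state.
        for stage in self.stages:
            stage.state.clear()


def plan(actions, state):
    # Toy "latent-prep" stage: assigns monotonically increasing chunk indices.
    index = state.get("next_chunk", 0)
    state["next_chunk"] = index + 1
    return {"chunk": index, "actions": actions}


def denoise(payload, state):
    # Toy "denoise" stage: remembers which chunks it has produced (standing in
    # for a streaming cache) and reports how many came before this one.
    history = state.setdefault("history", [])
    history.append(payload["chunk"])
    return {**payload, "context": len(history) - 1}


chain = StageChain([RealtimeStage("plan", plan), RealtimeStage("denoise", denoise)])
first = chain.run_chunk("w")
second = chain.run_chunk("d")
print(second)  # {'chunk': 1, 'actions': 'd', 'context': 1}
```

Keeping state per stage rather than in one shared session object lets a stage be reset or swapped without touching the others; `reset()` models starting a new session.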

Realtime output is BITWISE-identical to the batch streaming pipeline at the
stage-1 latent level (and refiner-probe-identical through stage 2) — verified
by the env-gated parity harness shipped in PR 1/4.
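Bitwise parity is a stricter bar than numerical closeness. The minimal illustration below compares IEEE-754 bit patterns with `struct`, derives each chunk's noise from a hypothetical `(seed, chunk_index)` key so that drawing all chunks in one pass or one per request yields identical bits, and shows that merely reordering a floating-point sum breaks bitwise equality. The seeding scheme is an assumption for illustration, not the pipeline's actual noise plan.

```python
import random
import struct


def bitwise_equal(a, b) -> bool:
    """True only if two float sequences share identical IEEE-754 bit patterns."""
    if len(a) != len(b):
        return False
    return struct.pack(f"<{len(a)}d", *a) == struct.pack(f"<{len(b)}d", *b)


def chunk_noise(seed: int, chunk_index: int, size: int) -> list:
    # Hypothetical scheme: key each chunk's generator on (seed, chunk_index),
    # so its noise does not depend on how many chunks were drawn before it.
    rng = random.Random(f"{seed}:{chunk_index}")
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# "Batch" path: every chunk drawn in one pass.
batch = [x for i in range(4) for x in chunk_noise(7, i, 3)]

# "Realtime" path: chunks requested one at a time (last chunk first here),
# then assembled in order as a session would consume them.
per_request = {i: chunk_noise(7, i, 3) for i in reversed(range(4))}
realtime = [x for i in range(4) for x in per_request[i]]
assert bitwise_equal(batch, realtime)

# Reordering a reduction keeps values "close" but changes the low bits,
# which is why bitwise parity also requires identical operation order.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
assert abs(left - right) < 1e-12
assert not bitwise_equal([left], [right])
```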

Stacked PR 3/4 of the SANA-WM enablement (model -> batch pipeline -> realtime
-> CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: mickqian <30898949+mickqian@users.noreply.github.com>
…emote suite

SANA_WM_TI2V CI case (reduced-res 384x640), accuracy-harness component skips +
transformer hook compat for the custom GDN DiT, the offline batch client
(manifest build/run), and the remote verification suite script.

Stacked PR 4/4 of the SANA-WM enablement (model -> batch pipeline -> realtime
-> CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AgainstEntropy and others added 11 commits June 8, 2026 06:43
The SANA-WM GPU CI case pulls the full multi-GB checkpoint and runs inference —
too heavy for CI. Remove it, its companions, and the unreferenced standalone
verification scripts:

- gpu_cases.py: drop the sana_wm_ti2v DiffusionTestCase + its imports + the
  no-sequence-parallelism 2-GPU note.
- testcase_configs.py: drop SANA_WM_TI2V_CI_sampling_params.
- accuracy_config.py: drop the sana_wm_ti2v component-skip entry.
- accuracy_hooks.py: drop the SANA-WM transformer-hook-compat branch and revert
  the VAE wan-video-latent condition to `"wan" in model_path` (the added
  ltx/sana-wm tokens changed pre-existing LTX behavior — out of scope; restored
  to main's baseline).
- test/scripts/: remove run_sana_wm_remote_suite.sh + sana_wm_batch_client.py
  (standalone tooling, referenced by no CI/workflow/code).

Kept DEFAULT_SANA_WM_MODEL_NAME_FOR_TEST (still used by the SANA-WM unit test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A realtime pipeline establishes per-session state over the WebSocket, so the
synthetic server-warmup request (which has no session) fails in the realtime
stage ("requires a realtime session") and aborts startup. Detect a registered
realtime adapter in _run_server_warmup_after_http_ready and skip server warmup,
so `serve --pipeline-class-name SanaWMRealtimePipeline` starts out of the box.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
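A minimal sketch of the warmup guard described in the commit above, assuming a simple dictionary registry keyed by pipeline class name. `REALTIME_ADAPTERS`, `register_realtime_adapter`, and `maybe_run_warmup` are hypothetical stand-ins for the check inside `_run_server_warmup_after_http_ready`, not its actual code.

```python
from typing import Callable

# Hypothetical registry mapping pipeline class names to realtime adapters;
# the real server may discover adapters differently.
REALTIME_ADAPTERS: dict = {}


def register_realtime_adapter(pipeline_class_name: str, adapter: object) -> None:
    REALTIME_ADAPTERS[pipeline_class_name] = adapter


def maybe_run_warmup(pipeline_class_name: str, run_warmup: Callable[[], None]) -> bool:
    """Run the synthetic warmup request unless the pipeline is session-based.

    Returns True if warmup ran. A realtime pipeline needs per-session state
    created over the WebSocket, so a session-less warmup request would fail
    and abort startup; skipping it is the safe default.
    """
    if REALTIME_ADAPTERS.get(pipeline_class_name) is not None:
        return False
    run_warmup()
    return True
```

The check keys on whether an adapter is registered rather than on the pipeline name itself, so any future session-based pipeline that registers an adapter gets the same behavior.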
@mickqian (Collaborator) commented Jun 8, 2026

/tag-and-rerun-ci

@github-actions (bot) added the run-ci label Jun 8, 2026
@mickqian merged commit 32bedbf into sgl-project:main Jun 8, 2026
128 of 161 checks passed
@sjmshsh (Contributor) commented Jun 9, 2026

I noticed that you have already completed the integration and optimization of SANA-WM, but I found that TP (tensor parallelism) is not supported yet. Do you think SANA-WM needs to support a TP strategy? @AgainstEntropy

jeynmann pushed a commit to jeynmann/sglang that referenced this pull request Jun 9, 2026
…t#27531)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
@AgainstEntropy (Collaborator, Author) commented

Hey @sjmshsh, it was a bit urgent to add SANA-WM with streaming & realtime support, so this PR was built on some earlier changes in #26153 (co-authored with you). And yes, I think we need TP here. Also, could you please port some of your later optimizations? Thank you!

@sjmshsh (Contributor) commented Jun 10, 2026

OK

@sjmshsh (Contributor) commented Jun 27, 2026

#29513

@mickqian @AgainstEntropy
