[Diffusion] Add SANA-WM with streaming support #27531
NVIDIA SANA-WM (camera-controlled TI2V world model, NVlabs/Sana), runnable end-to-end in the BIDIRECTIONAL (dense) mode:
- Model: SanaWMTransformer3DModel (20 GDN/softmax hybrid blocks, UCPE camera conditioning, packed Plucker raymaps, chunk-causal forward_long with a per-block 10-slot streaming KV cache), the frame-wise GDN Triton kernels, the LTX-2 refiner DiT model class, and the model/refiner configs.
- Pipeline: SanaWMPipeline / SanaWMTwoStagePipeline dense path (one-shot bidirectional denoise -> dense whole-clip LTX-2 refiner -> causal-VAE decode), the WASD/IJKL camera-control action DSL + Plucker raymap conditioning, the pipeline/sampling configs, and the registry + model-overlay wiring.
- The env-gated parity-probe harness and CPU unit tests (forward_long scans and cache contract, cached-vs-dense reduction, dense pipeline/sampling config).

Runnable: dense bidirectional TI2V generation.
Stacked PR 1/4 of the SANA-WM enablement (bidirectional -> batch streaming -> realtime -> rest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
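For readers unfamiliar with Plucker raymaps, the following is a minimal pure-Python sketch of the general idea, not the code in this PR. It assumes a pinhole camera with identity rotation and the (direction, moment) channel order; the function names `plucker_ray`, `pixel_direction`, and `raymap` are hypothetical, and the PR's packing and conventions may differ.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length direction")
    return tuple(c / norm for c in v)

def plucker_ray(origin, direction):
    """Return the 6-D Plucker embedding (d, o x d) of a ray.

    Assumption: direction first, moment second; some code bases use the
    opposite order.
    """
    d = normalize(direction)
    m = cross(origin, d)
    return d + m

def pixel_direction(u, v, fx, fy, cx, cy):
    """Unnormalized ray direction through pixel (u, v) of a pinhole camera.

    Assumption: the camera looks down +z with identity rotation.
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def raymap(height, width, fx, fy, cx, cy, origin):
    """Per-pixel Plucker embeddings as a height x width grid of 6-tuples."""
    return [
        [
            plucker_ray(origin, pixel_direction(u + 0.5, v + 0.5, fx, fy, cx, cy))
            for u in range(width)
        ]
        for v in range(height)
    ]
```

A useful sanity check is that the moment is always orthogonal to the direction, and that it vanishes for rays passing through the world origin.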
Layers BATCH STREAMING onto the dense pipeline from PR 1/4:
- SanaWMStreamingDenoisingStage: autoregressive self-forcing streaming denoise (subclasses CausalDMDDenoisingStage; runs forward_long chunk-by-chunk over a shared per-chunk core -- the LingBot-family shape) plus the self-forcing sampler utilities (chunk grid / KV window / explicit distilled sigma list -- not a diffusers scheduler, so it lives next to its consumer).
- SanaWMStreamingRefinerStage: chunked sink/current LTX-2 refiner; and the streaming causal-VAE decode (decode_chunk + per-conv cache).
- Wires the streaming dispatch into SanaWMTwoStagePipeline: streaming=True selects the streaming denoise / refiner / decode stages.

Runnable: batch streaming -- low-latency causal generation over a full action sequence.
Stacked PR 2/4 of the SANA-WM enablement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
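The chunk grid and KV window mentioned above can be illustrated with a small standalone sketch. It is not the PR's sampler code: `chunk_grid`, `SlidingKVCache`, and `stream` are hypothetical names, the cached payload is a placeholder string, and first-in-first-out eviction is an assumption (the real cache may, for example, pin sink entries).

```python
from collections import deque

def chunk_grid(num_frames, chunk_size):
    """Split [0, num_frames) into consecutive (start, end) chunks.

    The last chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, num_frames))
        for start in range(0, num_frames, chunk_size)
    ]

class SlidingKVCache:
    """Fixed-slot cache that drops the oldest chunk once it is full.

    Assumption: plain FIFO eviction, chosen only for illustration.
    """

    def __init__(self, num_slots=10):
        self._slots = deque(maxlen=num_slots)

    def append(self, chunk_index, kv):
        self._slots.append((chunk_index, kv))

    def window(self):
        """Indices of the chunks currently held in the cache."""
        return [index for index, _ in self._slots]

def stream(num_frames, chunk_size, num_slots=10):
    """Record which past chunks each new chunk could attend to."""
    cache = SlidingKVCache(num_slots)
    visible = []
    for index, (start, end) in enumerate(chunk_grid(num_frames, chunk_size)):
        visible.append(cache.window())
        # Placeholder for the keys/values a real model would produce.
        cache.append(index, f"kv[{start}:{end}]")
    return visible
```

With three slots, the fifth chunk sees only chunks 1 to 3, which is the bounded-memory property that makes open-ended streaming possible.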
…mework

Live WASD/IJKL world-model serving over /v1/realtime_video as a LingBot-shaped stage CHAIN (each stage owning a small BaseRealtimeState blob):
cond-frame encode
-> latent-prep (chunk plan + seeded noise: front-loaded within a fixed horizon, uniform past it, multi-chunk plans only when the refiner grid lags)
-> camera conditioning
-> the denoise stage's session path (this PR adds the per-chunk session path + SanaWMStreamCacheState onto the PR 2/4 class)
-> chunked LTX-2 refiner (complete blocks only; sessions stream seamlessly past any horizon)
-> causal-VAE decode following the refined frontier.

Plus the SanaWMRealtimeAdapter (segment-aware camera action sampling, transport-field forwarding, open-ended sessions when num_frames is omitted), the session-path unit tests, and the chain unit tests.

Realtime output is BITWISE-identical to the batch streaming pipeline at the stage-1 latent level (and refiner-probe-identical through stage 2), as verified by the env-gated parity harness shipped in PR 1/4.

Stacked PR 3/4 of the SANA-WM enablement (model -> batch pipeline -> realtime -> CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: mickqian <30898949+mickqian@users.noreply.github.com>
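To make the "bitwise-identical" claim concrete, here is a minimal sketch of what a bitwise comparison means as opposed to numeric equality. It is not the parity harness from PR 1/4: the helper names and the `EXAMPLE_PARITY_PROBE` variable are hypothetical, and plain Python floats packed as float32 stand in for latent tensors.

```python
import os
import struct

def to_float32_bytes(values):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)

def bitwise_equal(a, b):
    """True only if both sequences have identical float32 bit patterns."""
    return len(a) == len(b) and to_float32_bytes(a) == to_float32_bytes(b)

def parity_enabled(env=None):
    """Gate the check behind an environment variable.

    Assumption: EXAMPLE_PARITY_PROBE is a made-up name for illustration.
    """
    env = os.environ if env is None else env
    return env.get("EXAMPLE_PARITY_PROBE", "0") == "1"
```

Bitwise equality is stricter than `==`: `0.0 == -0.0` is true numerically, yet the two values differ in their sign bit, so `bitwise_equal([0.0], [-0.0])` is false.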
…emote suite

SANA_WM_TI2V CI case (reduced-res 384x640), accuracy-harness component skips + transformer hook compat for the custom GDN DiT, the offline batch client (manifest build/run), and the remote verification suite script.

Stacked PR 4/4 of the SANA-WM enablement (model -> batch pipeline -> realtime -> CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The SANA-WM GPU CI case pulls the full multi-GB checkpoint and runs inference, which is too heavy for CI. Remove it, its companions, and the unreferenced standalone verification scripts:
- gpu_cases.py: drop the sana_wm_ti2v DiffusionTestCase + its imports + the no-sequence-parallelism 2-GPU note.
- testcase_configs.py: drop SANA_WM_TI2V_CI_sampling_params.
- accuracy_config.py: drop the sana_wm_ti2v component-skip entry.
- accuracy_hooks.py: drop the SANA-WM transformer-hook-compat branch and revert the VAE wan-video-latent condition to `"wan" in model_path` (the added ltx/sana-wm tokens changed pre-existing LTX behavior, which is out of scope; restored to main's baseline).
- test/scripts/: remove run_sana_wm_remote_suite.sh + sana_wm_batch_client.py (standalone tooling, referenced by no CI/workflow/code).

Kept DEFAULT_SANA_WM_MODEL_NAME_FOR_TEST (still used by the SANA-WM unit test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A realtime pipeline establishes per-session state over the WebSocket, so the
synthetic server-warmup request (which has no session) fails in the realtime
stage ("requires a realtime session") and aborts startup. Detect a registered
realtime adapter in _run_server_warmup_after_http_ready and skip server warmup,
so `serve --pipeline-class-name SanaWMRealtimePipeline` starts out of the box.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
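The warmup guard described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual body of _run_server_warmup_after_http_ready: the set-based adapter registry, the function names, and the string return values are all hypothetical.

```python
def should_run_server_warmup(pipeline_class_name, realtime_adapters):
    """Warm up only pipelines that have no registered realtime adapter.

    Assumption: realtime_adapters is a set of pipeline class names; the
    real registry lookup may look different.
    """
    return pipeline_class_name not in realtime_adapters

def run_server_warmup(pipeline_class_name, realtime_adapters, send_warmup_request):
    """Send the synthetic warmup request unless the pipeline is realtime.

    A realtime pipeline needs a WebSocket session, which the synthetic
    request does not have, so sending it would fail and abort startup.
    """
    if not should_run_server_warmup(pipeline_class_name, realtime_adapters):
        return "skipped"
    send_warmup_request()
    return "warmed"
```

The point of the guard is that startup no longer depends on a request shape that a session-based pipeline cannot serve.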
/tag-and-rerun-ci
I noticed that you have already completed the integration and optimization of SANA-WM, but I found that TP is not supported yet. Do you think SANA-WM needs to support a TP strategy? @AgainstEntropy
…t#27531)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: sjmshsh <88866917+sjmshsh@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Hey @sjmshsh, it was a bit urgent to add SANA-WM with streaming & realtime support, so this PR was built on some previous changes in #26153 (co-authored with you). And yes, I think we need TP here. Also, could you please port some of your later optimizations? Thank you!
OK |
Motivation
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
CI States
Latest PR Test (Base): ❌ Run #27150943637
Latest PR Test (Extra): ❌ Run #27150941127