Description
Hi, thank you for the great work on Wan2.1!
I’m currently running the Wan2.1 T2V-1.3B text-to-video model locally on my Windows machine, but inference is far slower than expected. I would appreciate help diagnosing the issue.
Environment
OS: Windows 11
GPU: NVIDIA GeForce RTX 4060 Laptop GPU (16GB)
CUDA: cu128
PyTorch: 2.7.0+cu128
FlashAttention: Installed from precompiled wheels:
https://github.com/PLISGOOD/flash-attention-windows-wheels
Verified working via:
python -c "import flash_attn; print('Flash Attention installed successfully!')"
Model: Wan2.1-T2V-1.3B
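The import check above only confirms that the package loads; it does not prove the CUDA kernel actually runs on this GPU. A minimal smoke test that exercises the kernel itself (the shapes below are arbitrary illustrative values, not Wan2.1's real ones) would be something like:

```python
# Minimal smoke test: run the actual FlashAttention CUDA kernel once.
# Shapes are arbitrary illustrative values, not taken from Wan2.1.
import torch
from flash_attn import flash_attn_func

q = torch.randn(1, 1024, 12, 128, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v)  # raises if the kernel cannot launch on this GPU
torch.cuda.synchronize()
print("flash_attn kernel ran, output shape:", tuple(out.shape))
```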
Command used (from README example):
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
Problem Description
During local generation:
Each inference step takes ~400 seconds, even though FlashAttention is installed.
A full 50-step video generation takes around 4–5 hours.
GPU memory usage stays at ~11 GB.
This is significantly slower than I would expect for a 1.3B-parameter model on an RTX 4060 Laptop GPU (a GPU-utilization check is sketched below).
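One thing that might narrow this down is whether the GPU is actually saturated during a step, or mostly idle while weights move between CPU and GPU (relevant because of --offload_model True and --t5_cpu). A minimal polling sketch, assuming the nvidia-ml-py package (import pynvml) is installed, run in a second terminal while generate.py is mid-step:

```python
# Poll GPU utilization and VRAM once per second for ~30 seconds.
# Assumes the nvidia-ml-py package is installed (pip install nvidia-ml-py);
# run this in a separate terminal while generate.py is stepping.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(30):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"gpu={util.gpu:3d}%  vram={mem.used / 2**30:.1f} GiB")
    time.sleep(1)

pynvml.nvmlShutdown()
```

Sustained near-100% utilization would suggest the ~400 s/step is genuinely compute-bound; long stretches near 0% would point toward offloading/host-to-device transfers or Windows spilling VRAM into shared system memory.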
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
[2025-12-04 20:57:33,337] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=81, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=True, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, src_video=None, src_mask=None, src_ref_images=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='zh', base_seed=1697777428839187584, image=None, first_frame=None, last_frame=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-12-04 20:57:33,337] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-12-04 20:57:33,337] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-12-04 20:57:33,337] INFO: Creating WanT2V pipeline.
[2025-12-04 21:01:51,412] INFO: loading ./Wan2.1-T2V-1.3B\models_t5_umt5-xxl-enc-bf16.pth
[2025-12-04 21:03:44,969] INFO: loading ./Wan2.1-T2V-1.3B\Wan2.1_VAE.pth
[2025-12-04 21:03:46,098] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-12-04 21:03:55,388] INFO: Generating video ...
22%|█████████████████████████████▉ | 11/50 [1:10:02<4:35:54, 424.46s/it]
Questions
Is this slow generation time expected on a 4060 Laptop GPU, or is something misconfigured?
Does Wan2.1 actually use FlashAttention on Windows builds? (A backend check is sketched after this list.)
Is the FlashAttention wheel I installed compatible with the attention kernels used in Wan2.1?
Could the Windows environment itself be causing the slowdown?
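For questions 2 and 3, one way to check would be to ask the repo's own attention module which backend it detected at import time. This is only a sketch based on my reading of wan/modules/attention.py; the flag names FLASH_ATTN_2_AVAILABLE / FLASH_ATTN_3_AVAILABLE are an assumption about the current code and may differ in other versions of the repo:

```python
# Check which attention backend Wan2.1 detected at import time.
# Run from the Wan2.1 repo root. The flag names below are assumed from
# wan/modules/attention.py and may not exist in every version of the repo.
import wan.modules.attention as attn

print("flash_attn v2 available:", getattr(attn, "FLASH_ATTN_2_AVAILABLE", "flag not found"))
print("flash_attn v3 available:", getattr(attn, "FLASH_ATTN_3_AVAILABLE", "flag not found"))
```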
Additional Notes
I’m not using WSL; everything runs directly on Windows.
FlashAttention loads successfully but I’m unsure whether Wan2.1 is actually utilizing it.
Any guidance would be greatly appreciated. Thank you!