Description
Hello there!
I used to generate Wan 2.1 (First Last Frame to Video 720p (FLF2V) 14B) videos with SageAttention 2 + Compile Transformer Model on an old laptop with an RTX 3060 (6 GB VRAM). It was slow, but there were no OOMs; compiling took ages, but it always completed.
Now, running the exact same process (I took the rendering settings from a video rendered on that laptop) on a brand-new desktop with an RTX 5060 Ti (16 GB VRAM) and 32 GB of RAM, I'm plagued with OOMs and errors. Inference is only possible with "Compile Transformer Model" turned off, and no more than 7 GB of VRAM is used at any given time.
I get these OOMs while the VRAM usage stays flat:
Lora 'loras\wan_i2v\Wan2.1_I2V_14B_FusionX_LoRA.safetensors' was loaded in model 'models.wan.modules.model'
Unable to pin data of 'loras\wan_i2v\Wan2.1_I2V_14B_FusionX_LoRA.safetensors' to reserved RAM as there is no reserved RAM left. Transfer speed from RAM to VRAM will may be slower.
Traceback (most recent call last):
File "C:\Users\user\Desktop\pourwangp\ComfyUI_windows_portable\Wan2GP-main\wgp.py", line 5624, in generate_video
samples = wan_model.generate(
input_prompt = prompt,
...<80 lines>...
temperature=temperature,
)
File "C:\Users\user\Desktop\pourwangp\ComfyUI_windows_portable\Wan2GP-main\models\wan\any2video.py", line 493, in generate
context = self.text_encoder([input_prompt], self.device)[0].to(self.dtype)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\Desktop\pourwangp\ComfyUI_windows_portable\Wan2GP-main\models\wan\modules\t5.py", line 674, in call
seq_lens = mask.gt(0).sum(dim=1).long()
~~~~~~~^^^
File "C:\Users\user\Desktop\comfy\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils_device.py", line 103, in torch_function
return func(*args, **kwargs)
torch.AcceleratorError: CUDA error: out of memory
Search for `cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
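If it helps with reproducing, this is roughly how I plan to re-run with the debug flag the error message suggests (just a minimal sketch on my side, not anything from Wan2GP; the allocator setting is only my guess that fragmentation could be involved):

```python
# Sketch only: set the env vars *before* torch is imported, then hand off to wgp.py.
# CUDA_LAUNCH_BLOCKING comes straight from the error message above;
# expandable_segments is my own assumption to reduce fragmentation-related OOMs.
import os, runpy

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

runpy.run_path("wgp.py", run_name="__main__")
```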
Of course, inference is still possible and quite fast, but I was wondering if there's a bottleneck somewhere.
Here is my configuration:
Total VRAM 16310 MB, total RAM 32691 MB
pytorch version: 2.9.1+cu130
Device: cuda:0 NVIDIA GeForce RTX 5060 Ti : cudaMallocAsync
Python version: 3.13.9 (tags/v3.13.9:8183fa5, Oct 14 2025, 14:09:13) [MSC v.1944 64 bit (AMD64)]
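For what it's worth, here is a quick way to double-check that PyTorch really sees the full 16 GB and to watch allocation during generation (generic torch snippet, not Wan2GP code, assuming a single device at cuda:0):

```python
# Minimal VRAM sanity check / monitor for cuda:0.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**2:.0f} MB total VRAM")
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.0f} MB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**2:.0f} MB")
```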