Whenever I use a weight type that isn't F32, the program falls back to the CPU backend for text encoding instead of staying on CUDA, which slows down the text encoding step a lot.
[DEBUG] stable-diffusion.cpp:165 - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:197 - loading model from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] model.cpp:908 - load C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors using safetensors format
[DEBUG] model.cpp:979 - init from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] stable-diffusion.cpp:244 - Version: SD3.x
[INFO ] stable-diffusion.cpp:277 - Weight type: f16
[INFO ] stable-diffusion.cpp:278 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:279 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:280 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:282 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:321 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:324 - CLIP: Using CPU backend
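From the log, this looks like a deliberate guard in stable-diffusion.cpp (the `set clip_on_cpu to true` / `CLIP: Using CPU backend` lines) rather than a CUDA failure: the conditioner appears to get pinned to CPU whenever a T5-based model (SD3.x / Flux) is loaded with a non-F32 conditioner weight type. Here is a minimal self-contained sketch of what that decision logic seems to be; the enum and function names are illustrative stand-ins, not the actual source:

```cpp
#include <cstdio>

// Illustrative stand-ins; the real project uses ggml_type and its own version enum.
enum WType { WTYPE_F32, WTYPE_F16 };
enum Version { VER_SD1, VER_SDXL, VER_SD3, VER_FLUX };

// Sketch of the guard that appears to produce the two log lines above:
// T5-based conditioners (SD3.x / Flux) get pinned to CPU unless the
// conditioner weights are F32.
bool should_run_clip_on_cpu(Version version, WType conditioner_wtype, bool user_forced_cpu) {
    bool uses_t5xxl = (version == VER_SD3 || version == VER_FLUX);
    if (user_forced_cpu) {
        return true;  // e.g. the documented --clip-on-cpu flag
    }
    if (uses_t5xxl && conditioner_wtype != WTYPE_F32) {
        std::printf("[INFO ] set clip_on_cpu to true\n");
        return true;
    }
    return false;
}

int main() {
    // Reproduces my case: SD3.x with f16 conditioner weights.
    if (should_run_clip_on_cpu(VER_SD3, WTYPE_F16, /*user_forced_cpu=*/false)) {
        std::printf("[INFO ] CLIP: Using CPU backend\n");
    } else {
        std::printf("[INFO ] CLIP: Using CUDA backend\n");
    }
    return 0;
}
```

With SD3.x and f16 weights this always takes the CPU branch, which matches the log above.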
My GPU doesn't have enough VRAM to force F32 with --type F32, so my only option is F16 whenever I use SD3.x.
How can I make it run the text encoding on the GPU, so that it's a lot faster?
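In case it helps triage: the naive change I'd expect (an untested sketch against the guard above, reusing the same illustrative names; presumably the guard exists because ggml's CUDA backend is missing kernels for some non-F32 T5 ops, so this may just fail differently) would be to keep the conditioner on the main backend instead of initializing a CPU one:

```cpp
// Untested sketch: reuse the main (CUDA) backend for the conditioner
// instead of pinning it to CPU. Names mirror the sketch above.
ggml_backend_t clip_backend = nullptr;
if (should_run_clip_on_cpu(version, conditioner_wtype, user_forced_cpu)) {
    clip_backend = ggml_backend_cpu_init();  // current behavior: CPU fallback
} else {
    clip_backend = backend;                  // keep text encoding on CUDA
}
```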