CLIP: Using CPU Backend despite configuring for CUDA #636

Open
@hamstared

Description

Whenever I use a weight type other than F32, the program falls back to the CPU backend for CLIP instead of staying on CUDA, which slows down text encoding considerably.

[DEBUG] stable-diffusion.cpp:165  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:197  - loading model from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] model.cpp:908  - load C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors using safetensors format
[DEBUG] model.cpp:979  - init from 'C:\...\stable-diffusion.cpp\models\sd3_medium_incl_clips_t5xxlfp16.safetensors'
[INFO ] stable-diffusion.cpp:244  - Version: SD3.x
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f16
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:321  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:324  - CLIP: Using CPU backend

My GPU doesn't have enough VRAM to force F32 with --type F32, so F16 is my only option whenever I use SD3.x.

How can I make text encoding run on the GPU so that it is much faster?
