
Conversation

@stduhpf (Contributor) commented Jan 9, 2026

The main goal of this PR is to improve the user experience in multi-GPU setups by allowing the user to choose which model component gets sent to which device.

CLI changes:

  • Add the --main-backend-device [device_name] argument to set the default backend device.
  • Remove the --clip-on-cpu, --vae-on-cpu and --control-net-cpu arguments.
  • Replace them respectively with the new --clip-backend-device [device_name], --vae-backend-device [device_name] and --control-net-backend-device [device_name] arguments.
  • Add the --diffusion-backend-device argument (controls the device used for the diffusion/flow models) and the --tae-backend-device argument.
  • Add the --list-devices argument to print the list of available ggml devices and exit.
  • Add the --rpc argument to connect to a compatible GGML RPC server.

C API changes (stable-diffusion.h):

  • Change the contents of the sd_ctx_params_t struct.
  • Add void list_backends_to_buffer(char* buffer, size_t buffer_size) to write the details of the available devices to a null-terminated char array. Devices are separated by newline characters (\n), and the name and description of each device are separated by a \t character.
  • Add size_t backend_list_size() to get the size of the buffer needed by list_backends_to_buffer (see the usage sketch after this list).
  • Add void add_rpc_device(const char* address) to connect to a ggml RPC backend (from llama.cpp).
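
For reference, here is a rough sketch of how a third-party caller might consume the device list and (optionally) register an RPC server. This is untested illustration code, not part of the PR; the address passed to add_rpc_device is just a placeholder:

```cpp
#include <cstdio>
#include <cstring>
#include <vector>

#include "stable-diffusion.h"

// Print the available ggml devices, one "name<TAB>description" pair per line.
static void print_available_devices() {
    size_t size = backend_list_size();  // size of the buffer we need to allocate
    if (size == 0) {
        return;
    }
    std::vector<char> list(size);
    list_backends_to_buffer(list.data(), list.size());

    // Devices are separated by '\n'; name and description by '\t'.
    for (char* line = std::strtok(list.data(), "\n"); line != nullptr; line = std::strtok(nullptr, "\n")) {
        char* desc = std::strchr(line, '\t');
        if (desc != nullptr) {
            *desc++ = '\0';
        }
        std::printf("%-16s %s\n", line, desc != nullptr ? desc : "");
    }
}

int main() {
    // Optionally register a remote ggml RPC server before creating the sd context.
    // The "host:port" format is an assumption based on llama.cpp's rpc-server.
    // add_rpc_device("192.168.1.42:50052");

    print_available_devices();
    return 0;
}
```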

The default device selection should now consistently prioritize discrete GPUs over iGPUs.

For example, if you want to run the text encoders on the CPU, you now need to use --clip-backend-device CPU instead of --clip-on-cpu.

TODOs:

  • Different devices for different text encoders? (for models like SDXL / SD3.x / Flux.1)
  • Device for upscaler, photomaker and Vision models

Important: to use RPC, you need to add -DGGML_RPC=ON to the build. Additionally, it requires either building sd.cpp with the -DSD_USE_SYSTEM_GGML flag, or building the RPC server with -DCMAKE_C_FLAGS="-DGGML_MAX_NAME=128" -DCMAKE_CXX_FLAGS="-DGGML_MAX_NAME=128" (the default is 64), so that sd.cpp and the RPC server agree on GGML_MAX_NAME.

Fixes #1116

@wbruna (Contributor) commented Jan 9, 2026

Maybe the backend #if tests on model.cpp, upscaler.cpp, etc. should be changed to runtime tests, too?

Also: how hard would it be to support more than one backend with the same sd.cpp binaries - Vulkan and CUDA, for instance?

@stduhpf (Contributor, Author) commented Jan 9, 2026

> Maybe the backend #if tests on model.cpp, upscaler.cpp, etc. should be changed to runtime tests, too?

Good point.

> Also: how hard would it be to support more than one backend with the same sd.cpp binaries - Vulkan and CUDA, for instance?

I think removing those #if tests and figuring out a way to build GGML with multiple backends should be enough?

Edit: Actually I'm not sure if the #if tests in model.cpp are necessary at all. I could still build with Vulkan enabled when removing those.

@wbruna (Contributor) commented Jan 9, 2026

> Edit: Actually I'm not sure if the #if tests in model.cpp are necessary at all. I could still build with Vulkan enabled when removing those.

I believe it's leftover code. The SD_USE_FLASH_ATTENTION one in ggml_extend.h seems obsolete, too.

common.hpp (the one at the top), qwen_image.hpp and z_image.hpp are trickier: they test for Vulkan because of precision issues. z_image.hpp also has a #if GGML_USE_HIP check for the same reason (this one is my fault 🙂).

@stduhpf (Contributor, Author) commented Jan 9, 2026

I'm pretty sure ggml has runtime checks for the backend type. It would probably be better to use that instead.
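
Something along these lines could probably replace the compile-time checks (untested sketch; the exact backend names depend on the ggml version, so treat "Vulkan" here as an example):

```cpp
#include <cstring>

#include "ggml-backend.h"

// Untested sketch: check the backend kind at runtime instead of with #ifdef.
// ggml_backend_name() returns names like "CPU", "CUDA0" or "Vulkan0", so a
// prefix comparison should be enough to gate the precision workarounds.
static bool backend_name_starts_with(ggml_backend_t backend, const char* prefix) {
    const char* name = ggml_backend_name(backend);
    return name != nullptr && std::strncmp(name, prefix, std::strlen(prefix)) == 0;
}

// e.g. instead of `#if defined(GGML_USE_VULKAN)`:
//     if (backend_name_starts_with(backend, "Vulkan")) { /* fp32 fallback path */ }
```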

@CarlGao4 (Contributor) commented

So sd.cpp actually supports multi-backend builds? Like SYCL+CUDA at the same time?

@stduhpf (Contributor, Author) commented Jan 10, 2026

@CarlGao4 I'm not sure. I never successfully managed to build sd.cpp with multiple backends, but ggml should be able to handle that. I got it to build with both Vulkan and RPC, but it failed to send data to the RPC server, so I don't know if it would work with other backends (I had to add a way to connect to the RPC server via the CLI).

Edit: actually, RPC works if GGML_MAX_NAME is set to the same value for both sd-cli and rpc-server.

@leejet (Owner) commented Jan 11, 2026

For third-party callers, support for multiple different backends can be achieved simply by switching to the DLL/SO built for the desired backend.

@wbruna (Contributor) commented Jan 11, 2026

--list-devices is working on Linux, both with Vulkan and ROCm. But I'm getting a bit of garbage at the end of the Vulkan output:

[screenshots: --list-devices output with Vulkan and ROCm]

@stduhpf (Contributor, Author) commented Jan 12, 2026

@wbruna I only managed to reproduce the garbage at the end once in my many tests. I'm not sure what's going on there.

@stduhpf (Contributor, Author) commented Jan 12, 2026

I've just realized I accidentally included my RPC-related changes in the last commit. Since they're somewhat related, should I leave them in, or should I keep them for a follow-up PR?
