Feat: Select backend devices via arg #1184
base: master
Conversation
Force-pushed 094ac2e to 350df04
Maybe the backend #if tests in model.cpp, upscaler.cpp, etc. should be changed to runtime tests, too? Also: how hard would it be to support more than one backend with the same sd.cpp binaries, e.g. Vulkan and CUDA?
Good point.
I think removing those #if tests and figuring out a way to build GGML with multiple backends should be enough? Edit: Actually, I'm not sure the #if tests in model.cpp are necessary at all; I could still build with Vulkan enabled after removing them.
I believe it's leftover code. The SD_USE_FLASH_ATTENTION checks in common.hpp (the top one), qwen_image.hpp and z_image.hpp are trickier: they test for Vulkan because of precision issues. z_image.hpp also has a
I'm pretty sure ggml has runtime checks for the backend type. It would probably be better to use that instead.
So sd.cpp actually supports multi-backend builds? Like SYCL+CUDA at the same time?
@CarlGao4 I'm not sure. I never successfully managed to build sd.cpp with multiple backends, but ggml should be able to handle that. I got it to build with both Vulkan and RPC, but it failed to send data to the RPC server, so I don't know if it would work with other backends (I had to add a way to connect to the RPC server via the CLI). Edit: actually RPC works if
Support for multiple different backends can be achieved for third-party callers simply by switching the DLL/SO that supports the desired backend.
@wbruna I only managed to reproduce the garbage at the end once in my many tests. I'm not sure what's going on there.
I've just realized I accidentally included my RPC-related changes in the last commit. Since they're somewhat related, should I leave them in, or keep them for a follow-up PR?


The main goal of this PR is to improve the user experience in multi-GPU setups by allowing users to choose which model part gets sent to which device.
CLI changes:
- Added a `--main-backend-device [device_name]` argument to set the default backend
- Removed the `--clip-on-cpu`, `--vae-on-cpu` and `--control-net-cpu` arguments
- Added `--clip_backend_device [device_name]`, `--vae-backend-device [device_name]` and `--control-net-backend-device [device_name]` arguments
- Added `--diffusion_backend_device` (controls the device used for the diffusion/flow models) and `--tae-backend-device`
- Added a `--list-devices` argument to print the list of available ggml devices and exit
- Added an `--rpc` argument to connect to a compatible GGML RPC server

C API changes (stable-diffusion.h):
- Updated `sd_ctx_params_t` struct.
- `void list_backends_to_buffer(char* buffer, size_t buffer_size)` to write the details of the available backend devices to a null-terminated char array. Devices are separated by newline characters (`\n`), and the name and description of each device are separated by a `\t` character.
- `size_t backend_list_size()` to get the size of the buffer needed for `list_backends_to_buffer`.
- `void add_rpc_device(const char* address);` to connect to a ggml RPC backend (from llama.cpp).

The default device selection should now consistently prioritize discrete GPUs over iGPUs.
For example, if you want to run the text encoders on CPU, you'd need to use `--clip_backend_device CPU` instead of `--clip-on-cpu`.

TODOs:
Important: to use RPC, you need to add `-DGGML_RPC=ON` to the build. Additionally, it requires either sd.cpp to be built with the `-DSD_USE_SYSTEM_GGML` flag, or the RPC server to be built with `-DCMAKE_C_FLAGS="-DGGML_MAX_NAME=128" -DCMAKE_CXX_FLAGS="-DGGML_MAX_128"` — sorry, `-DCMAKE_CXX_FLAGS="-DGGML_MAX_NAME=128"` (the default is 64).

Fixes #1116