Description
I have ROCm compiled with support for both the discrete GPU and the iGPU, but with `HIP_VISIBLE_DEVICES` set to `0` to ensure only the discrete GPU is considered (the iGPU is just for experimenting; it's far too slow to use meaningfully). But because `rocminfo` and `rocm-smi` list both GPUs, setting `gpulayers` to `-1` to trigger the automatic calculation uses the reported iGPU memory. This is wrong/suboptimal in two respects:
- The "reported memory" for the iGPU takes only the reserved GPU memory into account, but compute memory actually ends up as GTT memory, meaning all of system memory is available rather than the few piddly MB for video graphics (from Linux 6.10 onwards at least, but I've also observed this happening on 6.9). To get any kind of speed out of this you need to compile GGML with
GGML_HIP_UMA
, though, which tanks perf on discrete GPUs -- but I digress. - With
HIP_VISIBLE_DEVICES
set, the iGPU is actually not used for inference at all, yet its lower memory is used, causing the calculation to always offload 0 layers.
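For what it's worth, the mismatch in the first point is easy to see from the amdgpu driver's sysfs counters. A minimal sketch, assuming a Linux box with amdgpu loaded (the `card0` index is a placeholder for whichever DRM node is the iGPU):

```python
# Compare an iGPU's reserved VRAM carve-out with its GTT pool (Linux, amdgpu).
from pathlib import Path

def read_mib(counter: Path) -> float:
    # amdgpu reports these counters in bytes.
    return int(counter.read_text()) / (1024 * 1024)

dev = Path("/sys/class/drm/card0/device")  # placeholder: pick the iGPU's node
vram = read_mib(dev / "mem_info_vram_total")  # the "few piddly MB" carve-out
gtt = read_mib(dev / "mem_info_gtt_total")    # backed by ordinary system RAM

print(f"VRAM: {vram:.0f} MiB, GTT: {gtt:.0f} MiB")
# On an iGPU, GTT typically dwarfs the VRAM carve-out, which is why sizing
# offload by the reported GPU memory undershoots so badly.
```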
If I ugly-hack my local koboldcpp.py to simply ignore any devices beyond the first, the auto-layer calculation does its job correctly. I'm too lazy to write a real patch fixing the problem; I just wanted to mention it here. Taking `HIP_VISIBLE_DEVICES` (and/or `CUDA_VISIBLE_DEVICES`, which AMD supports for compatibility) into account should not be particularly hard -- see the sketch below. Taking the iGPU memory-reporting quirk into account is probably too complicated (AFAIK there isn't even a reliable way to detect that a device is an iGPU; things like the marketing name being "AMD Radeon Graphics" and the UUID being "GPU-XX" are certainly suggestive, but hardly convincing). If you're using an iGPU (or, God forbid, a combination) you probably don't want to rely on the auto-layer feature anyway.
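For reference, here's roughly what such a filter could look like. This is only a sketch: `filter_visible_devices` and the shape of the device list are hypothetical, not koboldcpp's actual internals; only the comma-separated-indices semantics of the environment variables come from HIP/CUDA.

```python
import os

def filter_visible_devices(devices: list) -> list:
    """Drop devices hidden by HIP_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES.

    `devices` is a hypothetical list of device records in the order
    rocminfo/rocm-smi enumerate them. The env var holds comma-separated
    device indices; AMD also honors the CUDA spelling for compatibility.
    """
    mask = os.environ.get("HIP_VISIBLE_DEVICES")
    if mask is None:
        mask = os.environ.get("CUDA_VISIBLE_DEVICES")
    if mask is None:
        return devices  # variable unset: every enumerated device is visible
    try:
        # The mask also defines device order, mirroring how HIP remaps IDs.
        indices = [int(tok) for tok in mask.split(",") if tok.strip()]
    except ValueError:
        # UUID-style entries and other exotica: punt rather than misfilter.
        return devices
    return [devices[i] for i in indices if 0 <= i < len(devices)]
```

With `HIP_VISIBLE_DEVICES=0` this collapses the list to just the discrete GPU, and the auto-layer calculation would then see the correct memory figure.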
I don't know if Nvidia would have similar issues with `CUDA_VISIBLE_DEVICES` active.