Name and Version
Windows test case:
b9439 (22cadc1) (official Github release binary)
Adrenalin 26.5.2 and ROCm 7.2.4 on Windows 11 25H2 26200.7457
Linux test case:
b9439 (22cadc1) (compiled myself)
ROCm 7.2.4 on Debian 13, with librocdxg 8dd7ed
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server, llama-cli
Command line
`llama-cli -hf (any model) -ngl all --fit on -lv 4`
Problem description & steps to reproduce
ROCm reports a different amount of free device memory for the same GPU depending on whether llama.cpp runs natively on Windows or inside ROCDXG'd Linux (WSL2).
To build llama.cpp on Linux against the new librocdxg:
- Install ROCm in your WSL2 Linux VM by following the ROCm install instructions. For Debian/Ubuntu, this means: add the GPG key and apt repo, then `apt install rocm`; you do not need `amdgpu-dkms`.
- Follow the ROCm post-install instructions. For Debian/Ubuntu, you only need step 1: populate `/etc/ld.so.conf.d/rocm.conf`.
- Install the Windows SDK to its default path.
- To build librocdxg, run:
    git clone https://github.com/ROCm/librocdxg.git
    cd librocdxg
    export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.28000.0/'
    mkdir -p build
    cd build
    cmake .. -DWIN_SDK="${win_sdk}/shared"
    make
    sudo make install
- Until ROCm 7.13 comes out, `HSA_ENABLE_DXG_DETECTION=1` must be in your environment, so `export` it now. Remember to `export` it in every new terminal, or add it to your `~/.bashrc`.
- `rocminfo | grep Mar` should list your GPU(s).
- Now build llama.cpp, setting the target arch appropriately:
    HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release \
    && cmake --build build --config Release -- -j 16
- Test llama.cpp with any model; it just needs to run:
    llama-cli -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q5_K_XL -ngl all --fit on -lv 4
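Before running the test command, it may help to confirm that the environment variable from the steps above is actually set in the current shell. The following is a minimal sketch; `check_dxg_env` is a hypothetical helper name, not part of ROCm or llama.cpp, and it only inspects the variable (it does not call `rocminfo`).

```shell
# Hypothetical helper: check that HSA_ENABLE_DXG_DETECTION=1 is present,
# as required by the setup steps until a ROCm release makes it unnecessary.
check_dxg_env() {
    if [ "${HSA_ENABLE_DXG_DETECTION:-}" != "1" ]; then
        echo "HSA_ENABLE_DXG_DETECTION is not set to 1"
        return 1
    fi
    echo "HSA_ENABLE_DXG_DETECTION=1"
    return 0
}

# Print a reminder instead of failing hard when the variable is missing.
check_dxg_env || echo "run: export HSA_ENABLE_DXG_DETECTION=1"
```

If the check fails in a fresh terminal, the variable was probably exported only in an earlier session and not added to `~/.bashrc`.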
On my 7900 XTX, native Windows reports:
ROCm0 (RX 7900 XTX) | 24560 = 24136
But ROCDXG'd Linux reports:
ROCm0 (RX 7900 XTX) | 24517 = 21191
That is ~3 GB less free memory on the same card.
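For reference, the exact gap can be worked out from the two breakdown rows. This is only arithmetic on the figures quoted above; the variable names are made up for illustration.

```shell
# Free and total memory (MiB) copied from the two breakdown rows above.
win_total=24560; win_free=24136
wsl_total=24517; wsl_free=21191

# Difference in reported free and total memory between the two environments.
free_gap=$((win_free - wsl_free))
total_gap=$((win_total - wsl_total))

# Integer percentage of the Windows free figure, rounded down.
free_gap_pct=$((free_gap * 100 / win_free))

echo "free gap:  ${free_gap} MiB (${free_gap_pct}% of the Windows figure)"
echo "total gap: ${total_gap} MiB"
```

The reported totals differ by only 43 MiB, while the free figures differ by 2945 MiB, so almost all of the discrepancy is in what counts as "free".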
First Bad Commit
N/A
Relevant log output
Native Windows:
0.01.999.610 I common_params_fit_impl: getting device memory data for initial parameters:
0.02.264.328 I common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
0.02.264.333 I common_memory_breakdown_print: | - ROCm0 (RX 7900 XTX) | 24560 = 24136 + (35602 = 18563 + 16533 + 505) + -35179 |
0.02.264.333 I common_memory_breakdown_print: | - Host | 1109 = 833 + 0 + 276 |
0.02.304.398 I common_params_fit_impl: projected to use 35602 MiB of device memory vs. 24136 MiB of free device memory
0.02.304.403 I common_params_fit_impl: cannot meet free memory target of 2160 MiB, need to reduce device memory by 13625 MiB
0.02.550.717 I common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
0.02.550.721 I common_memory_breakdown_print: | - ROCm0 (RX 7900 XTX) | 24560 = 24136 + (19478 = 18563 + 405 + 509) + -19055 |
0.02.550.721 I common_memory_breakdown_print: | - Host | 857 = 833 + 0 + 24 |
0.02.589.188 I common_params_fit_impl: context size reduced from 262144 to 44032 -> need 13628 MiB less memory in total
ROCDXG'd Linux:
0.04.851.240 I common_params_fit_impl: getting device memory data for initial parameters:
0.05.403.254 I common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
0.05.403.275 I common_memory_breakdown_print: | - ROCm0 (RX 7900 XTX) | 24517 = 21191 + (35602 = 18563 + 16533 + 505) + -32276 |
0.05.403.275 I common_memory_breakdown_print: | - Host | 1109 = 833 + 0 + 276 |
0.05.433.198 I common_params_fit_impl: projected to use 35602 MiB of device memory vs. 21191 MiB of free device memory
0.05.433.219 I common_params_fit_impl: cannot meet free memory target of 2160 MiB, need to reduce device memory by 16571 MiB
0.05.966.005 I common_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
0.05.966.025 I common_memory_breakdown_print: | - ROCm0 (RX 7900 XTX) | 24517 = 21191 + (19478 = 18563 + 405 + 509) + -16152 |
0.05.966.026 I common_memory_breakdown_print: | - Host | 857 = 833 + 0 + 24 |
0.05.996.940 I common_params_fit_impl: context size reduced from 262144 to 4096 -> need 16123 MiB less memory in total
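To compare the two logs without reading them by eye, the ROCm rows can be parsed and the required reduction recomputed. This is a rough sketch: `parse_breakdown` and `fit_shortfall` are hypothetical helper names, and the relation `shortfall = projected - (free - target)` is an assumption inferred from the logged numbers, not taken from the llama.cpp source.

```shell
# Hypothetical helper: for each ROCm row on stdin, print
# "device total free projected" (all in MiB).
parse_breakdown() {
    sed -n 's/.*| - \(ROCm[0-9]*\) .*| *\([0-9][0-9]*\) = \([0-9][0-9]*\) + (\([0-9][0-9]*\) = .*/\1 \2 \3 \4/p'
}

# Assumed relation (inferred from the logs): how much device memory must be
# cut so that "target" MiB stays free. Arguments: free projected target.
fit_shortfall() {
    echo $(( $2 - ($1 - $3) ))
}

# First breakdown row from each log above.
win='0.02.264.333 I common_memory_breakdown_print: | - ROCm0 (RX 7900 XTX) | 24560 = 24136 + (35602 = 18563 + 16533 + 505) + -35179 |'
wsl='0.05.403.275 I common_memory_breakdown_print: | - ROCm0 (RX 7900 XTX) | 24517 = 21191 + (35602 = 18563 + 16533 + 505) + -32276 |'

for line in "$win" "$wsl"; do
    # Split the parsed fields into $1..$4.
    set -- $(printf '%s\n' "$line" | parse_breakdown)
    echo "$1 total=$2 free=$3 projected=$4 shortfall=$(fit_shortfall "$3" "$4" 2160)"
done
```

With the 2160 MiB target from the logs, this gives 16571 MiB for the Linux row, matching the log, and 13626 MiB for the Windows row, 1 MiB off the logged 13625 (presumably rounding from bytes to MiB). Under that assumption, the smaller context size chosen on Linux follows directly from the lower reported free memory.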