Misc. bug: ROCm has new DXG connector for WSL2, gives weird free memory amounts (probably not llama.cpp bug, but worth documenting) #23999

@Diablo-D3

Description

Name and Version

Windows test case:
b9439 (22cadc1) (official Github release binary)
Adrenalin 26.5.2 and ROCm 7.2.4 on Windows 11 25H2 26200.7457

Linux test case:
b9439 (22cadc1) (compiled myself)
ROCm 7.2.4 on Debian 13, with librocdxg 8dd7ed

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server, llama-cli

Command line

`llama-cli -hf (any model) -ngl all --fit on -lv 4`

Problem description & steps to reproduce

ROCm reports different free memory amounts depending on whether llama.cpp runs natively on Windows or inside WSL2 Linux through ROCDXG.

To build llama.cpp on Linux against the new librocdxg:

  1. Install ROCm in your WSL2 Linux VM, following the ROCm install instructions. For Debian/Ubuntu, this means: add the GPG key and apt repo, then apt install rocm; you do not need amdgpu-dkms.

  2. Follow the ROCm post-install instructions. For Debian/Ubuntu, you only need step 1: populate /etc/ld.so.conf.d/rocm.conf.

  3. Install the Windows SDK to its default path.

  4. To build librocdxg, run:

git clone https://github.com/ROCm/librocdxg.git
cd librocdxg

export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.28000.0/'
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install
  5. Until ROCm 7.13 comes out, HSA_ENABLE_DXG_DETECTION=1 must be in your environment, so just export it now. Remember to export this in any new terminal, or add it to your ~/.bashrc.

  6. `rocminfo | grep Mar` should list your GPU(s).

  7. Actually build llama.cpp now; set target arch appropriately:

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release \
    && cmake --build build --config Release -- -j 16

  8. Test llama.cpp, use any model, it just needs to run:

llama-cli -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q5_K_XL -ngl all --fit on -lv 4
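
The HSA_ENABLE_DXG_DETECTION export from the steps above is easy to forget in a new terminal. A minimal sketch of adding it to a shell rc file without duplicating the line on repeated runs; `persist_dxg_env` is a hypothetical helper name used only for illustration, not something shipped by ROCm or llama.cpp:

```shell
# Append the export to an rc file only if that exact line is not already present.
# persist_dxg_env is a hypothetical helper, not part of ROCm or llama.cpp.
# usage: persist_dxg_env [RC_FILE]    (defaults to ~/.bashrc)
persist_dxg_env() {
    local rc_file="${1:-$HOME/.bashrc}"
    local line='export HSA_ENABLE_DXG_DETECTION=1'
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}
```

After calling `persist_dxg_env`, open a new terminal or run `source ~/.bashrc` so the variable is actually exported before running rocminfo or llama-cli.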

On my 7900 XTX, native Windows reports (total = free, in MiB):
ROCm0 (RX 7900 XTX) | 24560 = 24136
But ROCDXG'd Linux reports:
ROCm0 (RX 7900 XTX) | 24517 = 21191
That is about 3 GB (2945 MiB) less free memory.
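
To compare the two environments without reading the tables by eye, the free column can be pulled out of the `common_memory_breakdown_print` rows. This is a rough sketch that assumes the row layout seen in this report (`| - ROCm0 (...) | total = free + (...`); `device_free_mib` is a hypothetical helper name, and the two rows are copied from the logs in this report:

```shell
# device_free_mib is a hypothetical helper: it reads llama.cpp log lines on
# stdin and prints the "free" column (MiB) of the first ROCm device row.
# Assumes the layout "| - ROCm0 (...) | total = free + (...)" shown in this report.
device_free_mib() {
    # Field 2 is the device name, field 3 is "total = free + (...)".
    awk -F'|' '$2 ~ /ROCm/ { split($3, f, " "); print f[3]; exit }'
}

win_row='0.02.264.333 I common_memory_breakdown_print: |   - ROCm0 (RX 7900 XTX) | 24560 = 24136 + (35602 = 18563 +   16533 +     505) +      -35179 |'
wsl_row='0.05.403.275 I common_memory_breakdown_print: |   - ROCm0 (RX 7900 XTX) | 24517 = 21191 + (35602 = 18563 +   16533 +     505) +      -32276 |'

win_free=$(printf '%s\n' "$win_row" | device_free_mib)
wsl_free=$(printf '%s\n' "$wsl_row" | device_free_mib)

echo "free on Windows: ${win_free} MiB, free under ROCDXG: ${wsl_free} MiB"
echo "difference: $((win_free - wsl_free)) MiB"   # prints "difference: 2945 MiB"
```

In practice the same function can be fed a full log, for example `llama-cli ... -lv 4 2>&1 | device_free_mib`, assuming the breakdown rows are printed in the format above.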

First Bad Commit

N/A

Relevant log output

Logs Native Windows:
0.01.999.610 I common_params_fit_impl: getting device memory data for initial parameters:
0.02.264.328 I common_memory_breakdown_print: | memory breakdown [MiB]  | total    free     self   model   context   compute    unaccounted |
0.02.264.333 I common_memory_breakdown_print: |   - ROCm0 (RX 7900 XTX) | 24560 = 24136 + (35602 = 18563 +   16533 +     505) +      -35179 |
0.02.264.333 I common_memory_breakdown_print: |   - Host                |                   1109 =   833 +       0 +     276                |
0.02.304.398 I common_params_fit_impl: projected to use 35602 MiB of device memory vs. 24136 MiB of free device memory
0.02.304.403 I common_params_fit_impl: cannot meet free memory target of 2160 MiB, need to reduce device memory by 13625 MiB
0.02.550.717 I common_memory_breakdown_print: | memory breakdown [MiB]  | total    free     self   model   context   compute    unaccounted |
0.02.550.721 I common_memory_breakdown_print: |   - ROCm0 (RX 7900 XTX) | 24560 = 24136 + (19478 = 18563 +     405 +     509) +      -19055 |
0.02.550.721 I common_memory_breakdown_print: |   - Host                |                    857 =   833 +       0 +      24                |
0.02.589.188 I common_params_fit_impl: context size reduced from 262144 to 44032 -> need 13628 MiB less memory in total

ROCDXG'd Linux:

0.04.851.240 I common_params_fit_impl: getting device memory data for initial parameters:
0.05.403.254 I common_memory_breakdown_print: | memory breakdown [MiB]  | total    free     self   model   context   compute    unaccounted |
0.05.403.275 I common_memory_breakdown_print: |   - ROCm0 (RX 7900 XTX) | 24517 = 21191 + (35602 = 18563 +   16533 +     505) +      -32276 |
0.05.403.275 I common_memory_breakdown_print: |   - Host                |                   1109 =   833 +       0 +     276                |
0.05.433.198 I common_params_fit_impl: projected to use 35602 MiB of device memory vs. 21191 MiB of free device memory
0.05.433.219 I common_params_fit_impl: cannot meet free memory target of 2160 MiB, need to reduce device memory by 16571 MiB
0.05.966.005 I common_memory_breakdown_print: | memory breakdown [MiB]  | total    free     self   model   context   compute    unaccounted |
0.05.966.025 I common_memory_breakdown_print: |   - ROCm0 (RX 7900 XTX) | 24517 = 21191 + (19478 = 18563 +     405 +     509) +      -16152 |
0.05.966.026 I common_memory_breakdown_print: |   - Host                |                    857 =   833 +       0 +      24                |
0.05.996.940 I common_params_fit_impl: context size reduced from 262144 to 4096 -> need 16123 MiB less memory in total
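
The "need to reduce device memory by" figures in both logs appear to follow from projected use minus (free minus the 2160 MiB target). The sketch below reproduces that arithmetic from the logged values only; it is an assumption inferred from the numbers, not llama.cpp's actual implementation, and `required_reduction` is a hypothetical helper name:

```shell
# required_reduction is a hypothetical helper mirroring the arithmetic implied
# by the log lines above; it is not taken from llama.cpp's source.
# usage: required_reduction PROJECTED_MIB FREE_MIB TARGET_MIB
required_reduction() {
    local projected=$1 free=$2 target=$3
    echo $(( projected - (free - target) ))
}

required_reduction 35602 24136 2160   # native Windows  -> 13626
required_reduction 35602 21191 2160   # ROCDXG'd Linux  -> 16571
```

The Linux result matches the log exactly, while the Windows log reports 13625 rather than 13626; a 1 MiB gap like that would be consistent with the real values being tracked at finer granularity and rounded for printing, though that is a guess. Either way, the roughly 3 GB lower free figure under ROCDXG translates directly into a larger required reduction, which is why the context is cut to 4096 there instead of 44032.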
