Add gpu accel probe by aittalam · Pull Request #953 · mozilla-ai/llamafile

aittalam · 2026-05-01T10:52:01Z

Description

Added a probe to loop through the different libraries and check not just whether they load properly, but also whether they are able to see any GPU.

PR Type

🐛 Bug Fix

Relevant issues

Fixing an issue found while testing for release.

Problem: with the current code in main, if both ggml-rocm.so and ggml-cuda.so are available but ggml-rocm.so finds no GPUs, llamafile starts with ROCm support and 0 GPUs loaded, even if an NVIDIA GPU is available. The probe code instead loops through all supported libraries and selects only those that see at least one GPU, in this order: CUDA > ROCm > Vulkan.
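For illustration, the selection rule described above can be sketched roughly as follows. This is not the code in this PR: `struct backend`, `probe_fn`, `pick_backend`, and the stub probes are hypothetical names, and each probe stands in for "load the library and ask how many GPUs it sees".

```c
#include <stddef.h>

// Hypothetical probe callback: returns the number of GPUs the backend
// library can see, or a negative value if the library failed to load.
typedef int (*probe_fn)(void);

struct backend {
    const char *name;  // label only, e.g. "CUDA", "ROCm", "Vulkan"
    probe_fn probe;    // stand-in for loading the DSO and counting devices
};

// Walk the candidates in priority order and return the first one that
// loads AND reports at least one device. NULL means no GPU backend
// qualified, so the caller would fall back to CPU.
const struct backend *pick_backend(const struct backend *candidates, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (candidates[i].probe != NULL && candidates[i].probe() > 0)
            return &candidates[i];
    }
    return NULL;
}

// Stub probes used only to exercise the sketch.
int probe_load_failed(void) { return -1; }  // library missing or broken
int probe_no_gpu(void) { return 0; }        // library loads, sees 0 GPUs
int probe_one_gpu(void) { return 1; }       // library loads, sees 1 GPU
```

With candidates ordered CUDA, ROCm, Vulkan, a library that loads but reports zero devices is skipped rather than selected, which is the behaviour the problem statement asks for.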

Checklist

I understand the code I am submitting.
I have run this code locally and verified the change.
New and existing tests pass locally, or I have explained why tests were not run.
I have read and followed the contribution guidelines.
AI Usage:
- No AI was used.
- AI was used in an assistive capacity.
- This PR includes substantial AI-generated content.

AI Usage Information

AI Model used: Opus 4.7
AI Developer Tool used: Claude Code

aittalam · 2026-05-01T15:29:35Z

Code review

Found 1 issue:

TryGpuBackend silently accepts a DSO without verifying device count when the DSO does not export ggml_backend_cuda_get_device_count. That symbol is imported as optional in LinkCuda (no ok &= assertion, "Optional - don't fail if not found"), so its function-pointer union stays NULL when the symbol is absent. The device-count probe is then guarded by if (g_cuda.get_device_count.default_abi || g_cuda.get_device_count.windows_abi), which causes the probe to be skipped entirely — TryGpuBackend falls through to g_cuda.is_amd = is_amd; return true; and registers a backend that may have zero devices. This silently reproduces the original 0-device-backend bug for any DSO missing that symbol (e.g. user-built DSOs in ~/.llamafile/). Consider failing closed (return false) when the symbol is missing, or making the symbol mandatory in LinkCuda.

llamafile/llamafile/cuda.c

Lines 188 to 208 in bb81581

    
        // Verify the backend has at least one device before committing. The DSO
        // loads fine even when no compatible hardware is present, so we must
        // probe device count to avoid registering a 0-device backend (which
        // would then prevent fallback to other GPU backends in AUTO mode).
        if (g_cuda.get_device_count.default_abi || g_cuda.get_device_count.windows_abi) {
            int count;
            if (IsWindows())
                count = g_cuda.get_device_count.windows_abi();
            else
                count = g_cuda.get_device_count.default_abi();
            if (count <= 0) {
                llamafile_info("cuda", "%s library loaded but no devices detected; trying next backend",
                               is_amd ? "ROCm" : "CUDA");
                UnlinkCuda();
                return false;
            }
        }
        g_cuda.is_amd = is_amd;
        return true;
    }

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

aittalam · 2026-05-04T10:07:43Z

Fix looks good. Making ggml_backend_cuda_get_device_count mandatory in LinkCuda (with ok &= (sym != NULL)) and removing the symbol-presence guard around the probe in TryGpuBackend makes the check fail-closed: a DSO missing that symbol now fails to load instead of silently registering a 0-device backend.
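A minimal sketch of the fail-closed pattern described in this comment, assuming a simplified backend struct and a pluggable symbol resolver. Apart from the symbol name `ggml_backend_cuda_get_device_count` and the `ok &= (sym != NULL)` idiom, every name here (`gpu_backend`, `resolve_fn`, `link_backend`, `try_backend`, the stub resolvers) is hypothetical and does not mirror the actual code in cuda.c.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*generic_fn)(void);
typedef int (*get_device_count_fn)(void);

// Hypothetical resolver standing in for the real DSO symbol lookup;
// returns NULL when the symbol is not exported.
typedef generic_fn (*resolve_fn)(const char *symbol);

struct gpu_backend {
    get_device_count_fn get_device_count;
};

// Treat the device-count symbol as mandatory: linking fails if it is absent.
bool link_backend(struct gpu_backend *b, resolve_fn resolve) {
    bool ok = true;
    b->get_device_count =
        (get_device_count_fn)resolve("ggml_backend_cuda_get_device_count");
    ok &= (b->get_device_count != NULL);
    return ok;
}

// With the symbol guaranteed present after a successful link, the probe
// runs unconditionally and a 0-device backend is rejected.
bool try_backend(struct gpu_backend *b, resolve_fn resolve) {
    if (!link_backend(b, resolve))
        return false;                // missing symbol: fail closed
    if (b->get_device_count() <= 0) {
        b->get_device_count = NULL;  // stand-in for unlinking the DSO
        return false;
    }
    return true;
}

// Stub device-count functions and resolvers used only to exercise the sketch.
int count_zero(void) { return 0; }
int count_two(void) { return 2; }
generic_fn resolve_missing(const char *s) { (void)s; return NULL; }
generic_fn resolve_zero_devices(const char *s) { (void)s; return (generic_fn)count_zero; }
generic_fn resolve_two_devices(const char *s) { (void)s; return (generic_fn)count_two; }
```

In this sketch, a library that lacks the symbol and a library that reports zero devices are both rejected, so neither can be registered as a 0-device backend.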

🤖 Generated with Claude Code

aittalam added 2 commits April 30, 2026 11:26

New release version: update version.h and docs

c250d5c

Added probe for GPU libs

bb81581

github-actions Bot added documentation llamafile labels May 1, 2026

aittalam added 2 commits May 4, 2026 10:16

Merge branch 'main' into add-gpu-accel-probe

43f66e6

Addressed PR review

e398c02

aittalam removed the documentation label May 4, 2026

aittalam merged commit 0843312 into main May 4, 2026
2 checks passed

aittalam deleted the add-gpu-accel-probe branch May 4, 2026 13:34

aittalam merged 4 commits into main from add-gpu-accel-probe
