Add gpu accel probe #953

Merged
aittalam merged 4 commits into main from add-gpu-accel-probe on May 4, 2026


@aittalam aittalam commented May 1, 2026

Description

Added a probe to loop through the different libraries and check not just whether they load properly, but also whether they are able to see any GPU.

PR Type

  • 🐛 Bug Fix

Relevant issues

Fixing an issue found while testing for release.

Problem: with the current code in main, if both ggml-rocm.so and ggml-cuda.so are available but ggml-rocm.so finds no GPUs, llamafile starts with ROCm support and 0 GPUs loaded, even when an NVIDIA GPU is available. The probe instead loops through all supported libraries and selects only those that see at least one GPU, in this order: CUDA > ROCm > Vulkan.
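
As a rough illustration of the selection logic described above (not the code in this PR), the sketch below walks a priority-ordered list of candidates and keeps the first one that both loads and reports at least one device. The names `backend_candidate`, `pick_backend`, and the stub functions are hypothetical and exist only for this example; the stubs model a CUDA library that loads but sees no GPU while ROCm sees one.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical description of one GPU backend candidate. These names are
// illustrative and are not llamafile's real types.
struct backend_candidate {
    const char *name;           // e.g. "CUDA", "ROCm", "Vulkan"
    bool (*load)(void);         // true if the library loads and links
    int (*device_count)(void);  // number of GPUs the loaded library sees
    void (*unload)(void);       // releases the library again
};

// Walks the candidates in priority order and returns the first one that both
// loads and reports at least one device, or NULL if none qualifies.
const struct backend_candidate *
pick_backend(const struct backend_candidate *cands, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!cands[i].load())
            continue;                 // library missing or failed to link
        if (cands[i].device_count() > 0)
            return &cands[i];         // usable backend: stop probing
        cands[i].unload();            // loaded but sees no GPU: try the next
    }
    return NULL;                      // no GPU backend available
}

// Stub backends for illustration only: every library "loads", but the CUDA
// one sees no GPU.
static bool load_ok(void) { return true; }
static void unload_noop(void) {}
static int zero_devices(void) { return 0; }
static int one_device(void) { return 1; }

static const struct backend_candidate demo_candidates[] = {
    {"CUDA", load_ok, zero_devices, unload_noop},
    {"ROCm", load_ok, one_device, unload_noop},
    {"Vulkan", load_ok, one_device, unload_noop},
};
```

With these stubs, `pick_backend(demo_candidates, 3)` skips the CUDA entry, because it loads but reports zero devices, and returns the ROCm entry. The key design point is that a successful load alone never commits a backend.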

Checklist

  • I understand the code I am submitting.
  • I have run this code locally and verified the change.
  • New and existing tests pass locally, or I have explained why tests were not run.
  • I have read and followed the contribution guidelines.
  • AI Usage:
    • No AI was used.
    • AI was used in an assistive capacity.
    • This PR includes substantial AI-generated content.

AI Usage Information

  • AI Model used: Opus 4.7
  • AI Developer Tool used: Claude Code


aittalam commented May 1, 2026

Code review

Found 1 issue:

  1. TryGpuBackend silently accepts a DSO without verifying device count when the DSO does not export ggml_backend_cuda_get_device_count. That symbol is imported as optional in LinkCuda (no ok &= assertion, "Optional - don't fail if not found"), so its function-pointer union stays NULL when the symbol is absent. The device-count probe is then guarded by if (g_cuda.get_device_count.default_abi || g_cuda.get_device_count.windows_abi), which causes the probe to be skipped entirely — TryGpuBackend falls through to g_cuda.is_amd = is_amd; return true; and registers a backend that may have zero devices. This silently reproduces the original 0-device-backend bug for any DSO missing that symbol (e.g. user-built DSOs in ~/.llamafile/). Consider failing closed (return false) when the symbol is missing, or making the symbol mandatory in LinkCuda.

llamafile/llamafile/cuda.c

Lines 188 to 208 in bb81581

    // Verify the backend has at least one device before committing. The DSO
    // loads fine even when no compatible hardware is present, so we must
    // probe device count to avoid registering a 0-device backend (which
    // would then prevent fallback to other GPU backends in AUTO mode).
    if (g_cuda.get_device_count.default_abi || g_cuda.get_device_count.windows_abi) {
        int count;
        if (IsWindows())
            count = g_cuda.get_device_count.windows_abi();
        else
            count = g_cuda.get_device_count.default_abi();
        if (count <= 0) {
            llamafile_info("cuda", "%s library loaded but no devices detected; trying next backend",
                           is_amd ? "ROCm" : "CUDA");
            UnlinkCuda();
            return false;
        }
    }
    g_cuda.is_amd = is_amd;
    return true;
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.


aittalam commented May 4, 2026

Fix looks good. Making ggml_backend_cuda_get_device_count mandatory in LinkCuda (with ok &= (sym != NULL)) and removing the symbol-presence guard around the probe in TryGpuBackend makes the check fail-closed: a DSO missing that symbol now fails to load instead of silently registering a 0-device backend.
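
The fail-closed pattern described above can be sketched roughly as follows. This is a simplified illustration, not the merged code: `gpu_api`, `resolve_fn`, `link_backend`, `try_backend`, and the stub resolvers are hypothetical names, and the resolver stands in for whatever dlsym-style lookup the real loader uses.

```c
#include <stdbool.h>
#include <stddef.h>

// Generic function pointer type, used so the resolver can return any symbol.
typedef void (*generic_fn)(void);

// Hypothetical symbol resolver, standing in for a dlsym-style lookup.
typedef generic_fn (*resolve_fn)(const char *name);

// Hypothetical function table for a loaded backend library.
struct gpu_api {
    int (*get_device_count)(void);
};

// Links the mandatory symbols; returns false if any of them is missing.
bool link_backend(struct gpu_api *api, resolve_fn resolve) {
    bool ok = true;
    api->get_device_count =
        (int (*)(void))resolve("ggml_backend_cuda_get_device_count");
    ok &= (api->get_device_count != NULL);  // mandatory: missing symbol fails the link
    return ok;
}

// Accepts the backend only if it links AND reports at least one device.
// There is no symbol-presence guard around the probe, so a missing symbol
// can never fall through to "return true".
bool try_backend(struct gpu_api *api, resolve_fn resolve) {
    if (!link_backend(api, resolve))
        return false;                       // fail closed
    return api->get_device_count() > 0;
}

// Stub resolvers for illustration only.
static int two_devices(void) { return 2; }
static int no_devices(void) { return 0; }
static generic_fn resolve_missing(const char *name) { (void)name; return NULL; }
static generic_fn resolve_two(const char *name) { (void)name; return (generic_fn)two_devices; }
static generic_fn resolve_zero(const char *name) { (void)name; return (generic_fn)no_devices; }
```

Here `try_backend` rejects both the library that lacks the symbol (`resolve_missing`) and the one that reports zero devices (`resolve_zero`), and accepts only the one that sees a GPU (`resolve_two`).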

🤖 Generated with Claude Code

@aittalam aittalam merged commit 0843312 into main May 4, 2026
2 checks passed
@aittalam aittalam deleted the add-gpu-accel-probe branch May 4, 2026 13:34