Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988) #989
Merged
Factor CUDA/ROCm/Vulkan DSO probing into a shared core (gpu_backend.c): it gates registration on get_device_count() > 0 and runs the foreign probe call under a SIGSEGV/SIGABRT guard, so a 0-device or faulting GPU backend falls back to CPU instead of crashing (#988). Vulkan previously lacked the gate. Adds gpu_backend_test for the gate/fallback/fault paths and a short skill note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the per-consumer lists of cuda/gpu_backend/metal/vulkan objects in whisperfile, diffusionfile and the llama.cpp tools with a single o/$(MODE)/llamafile/gpu.a defined once in llamafile/BUILD.mk, so adding a GPU backend source no longer means editing every consumer. Tidiness only: all consumers reference llamafile_has_gpu(), so every member is pulled and the binaries are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code review: No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
On Windows with an NVIDIA GPU that CUDA can't use, llamafile 0.10.2 crashed
with "Terminating on uncaught SIGSEGV" instead of falling back to CPU. This is
a regression from 0.10.1 (where the same setup ran fine on CPU). Reported on a
GeForce MX130; reproduced conceptually for any machine where the bundled Vulkan
driver init fails.
Two distinct problems combined: (1) Vulkan never got the device-count gate
we implemented in cuda.c, and (2) the actual crash is a C++ exception that could not
cross the DSO boundary (i.e. it did not unwind across the
cosmo_dlopen/ms_abi boundary but was surfaced as an uncaught SIGSEGV).
The fix consists of two changes:
(1) A shared probe core, now stored in llamafile/gpu_backend.c. Much of the code is very similar across the different GPU libraries (CUDA, ROCm, Vulkan), so we factored it out to keep the backends from diverging. Metal is intentionally left separate (runtime-compiled, macOS-only, no ms_abi split, no device gate).
(2) A signal-based crash guard. The foreign probe call runs under a temporary SIGSEGV/SIGABRT/… handler that siglongjmps back on a fault, converting a crash into a clean "backend unavailable → unlink → try next → CPU".
A second commit groups the GPU loader objects into a gpu.a archive (tidiness only, binaries unchanged). No changes to the GPU dylibs or their build scripts: the symbol ABI the loader uses is unchanged, so existing ggml-cuda/ggml-vulkan libraries load as-is.
Testing
Unit tests (tests/gpu_backend_test.cpp, in make check): the
device-count gate (0/negative/missing-symbol → reject + unlink),
--verbose log suppression, register forwarding, and fault injection: a
get_device_count that raises SIGSEGV/SIGABRT is caught and turned into a
clean fallback, with normal probing still working afterward.
Real hardware / Linux (NVIDIA L40S, CUDA + Vulkan): --gpu auto selects
CUDA, --gpu vulkan selects Vulkan, --gpu disable uses CPU. All register
the right device count and generate correctly, with no crashes. This confirms the
refactor doesn't regress normal GPU operation and the crash guard doesn't
disturb a successful probe.
Real hardware / Windows (NVIDIA Quadro RTX 6000, CUDA + Vulkan): the same
happy paths verified, plus an end-to-end reproduction of #988 (Bug: uncaught SIGSEGV in llamafile-0.10.2). With the
NVIDIA Vulkan ICD deliberately broken so vkCreateInstance fails, stock
0.10.2 dies with "Terminating on uncaught SIGSEGV" at the Vulkan probe, while
this branch logs "Vulkan crashed during device probe; trying next backend"
and falls back to CPU (exit 0, correct output). Same machine, same command,
only the binary differs, confirming that the real 0xE06D7363 C++ exception →
cosmo SIGSEGV is caught by the guard, not just the synthetic fault in the
unit tests.
PR Type
Relevant issues
Closes #988
Checklist
llama.cpp/, whisper.cpp/, or stable-diffusion.cpp/, I also updated the matching *.patches/ files.