Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988) by aittalam · Pull Request #989 · mozilla-ai/llamafile

aittalam · 2026-06-02T11:50:20Z

Description

On Windows with an NVIDIA GPU that CUDA can't use, llamafile 0.10.2 crashed
with Terminating on uncaught SIGSEGV instead of falling back to CPU. This is
a regression from 0.10.1, where the same setup ran fine on CPU. It was reported
on a GeForce MX130, and by the same reasoning it affects any machine where the
bundled Vulkan driver init fails.

Two distinct problems combined: (1) Vulkan never got the device-count gate
we implemented in cuda.c, and (2) the actual crash is a C++ exception that could not
cross the DSO boundary: it did not unwind across the cosmo_dlopen/ms_abi boundary
and instead surfaced as an uncaught SIGSEGV.

The fix consists of two changes:

- A shared probe core, now stored in llamafile/gpu_backend.c. Much of the code is very similar across the different GPU libraries (CUDA, ROCm, Vulkan), so we factored it out to keep the copies from diverging. Metal is intentionally left separate (runtime-compiled, macOS-only, no ms_abi split, no device gate).
- A signal-based crash guard. The foreign probe call runs under a temporary SIGSEGV/SIGABRT/… handler that siglongjmps back on a fault, converting a crash into a clean "backend unavailable → unlink → try next → CPU".
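The crash guard described above can be sketched in plain POSIX C. This is a minimal illustration, not the code in gpu_backend.c: the names guarded_probe, on_fault, and the probe_* helpers are hypothetical, only SIGSEGV and SIGABRT are handled, and faults are simulated with raise() rather than a real driver failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf guard_env;             /* where a fault jumps back to */
static volatile sig_atomic_t guard_sig;  /* signal that fired, 0 if none */

static void on_fault(int sig) {
    guard_sig = sig;
    siglongjmp(guard_env, 1);            /* abandon the faulting call */
}

/* Hypothetical helper: calls probe() under temporary SIGSEGV/SIGABRT
   handlers. Returns 0 and stores the result in *count if probe()
   returned normally; returns the signal number (and leaves *count
   untouched) if probe() faulted. */
int guarded_probe(int (*probe)(void), int *count) {
    struct sigaction sa, old_segv, old_abrt;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    guard_sig = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGABRT, &sa, &old_abrt);

    /* savemask = 1: siglongjmp restores the signal mask, so the
       signal that fired is not left blocked after the jump. */
    if (sigsetjmp(guard_env, 1) == 0)
        *count = probe();

    /* Always put the previous handlers back. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGABRT, &old_abrt, NULL);
    return (int)guard_sig;
}

/* Stand-ins for a backend's device-count probe. */
int probe_ok(void)   { return 2; }
int probe_segv(void) { raise(SIGSEGV); return 0; }
int probe_abrt(void) { raise(SIGABRT); return 0; }
```

A caller would treat a nonzero return as "backend unavailable" and move on to the next backend. Jumping out of a handler leaves whatever state the foreign library had half-built, which is presumably why the guarded backend is unlinked rather than retried.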

A second commit groups the GPU loader objects into a gpu.a archive (tidiness only, binaries unchanged). No changes to the GPU dylibs or their build scripts — the symbol ABI the loader uses is unchanged, so existing ggml-cuda/ggml-vulkan libraries load as-is.

Testing

- Unit tests (tests/gpu_backend_test.cpp, in make check): the device-count gate (0/negative/missing-symbol → reject + unlink), --verbose log suppression, register forwarding, and fault injection, where a get_device_count that raises SIGSEGV/SIGABRT is caught and turned into a clean fallback, with normal probing still working afterward.
- Real hardware / Linux (NVIDIA L40S, CUDA + Vulkan): --gpu auto selects CUDA, --gpu vulkan selects Vulkan, --gpu disable uses CPU. All register the right device count and generate correctly, with no crashes. This confirms the refactor doesn't regress normal GPU operation and the crash guard doesn't disturb a successful probe.
- Real hardware / Windows (NVIDIA Quadro RTX 6000, CUDA + Vulkan): the same happy paths verified, plus an end-to-end reproduction of #988. With the NVIDIA Vulkan ICD deliberately broken so vkCreateInstance fails, stock 0.10.2 dies with Terminating on uncaught SIGSEGV at the Vulkan probe, while this branch logs "Vulkan crashed during device probe; trying next backend" and falls back to CPU (exit 0, correct output). Same machine, same command, only the binary differs, confirming that the real 0xE06D7363 C++ exception → cosmo SIGSEGV is caught by the guard, not just the synthetic fault in the unit tests.
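The gate cases covered by the unit tests (0, negative, missing symbol → reject + unlink) and the fall-through to CPU can be sketched as follows. This is an illustrative model only: struct backend, backend_passes_gate, select_backend, and the count_* helpers are hypothetical names, and the linked flag stands in for actually unloading a DSO.

```c
#include <stddef.h>

/* Hypothetical descriptor for one GPU backend. */
struct backend {
    const char *name;
    int (*get_device_count)(void);  /* NULL models a missing symbol */
    int linked;                     /* 1 while the DSO counts as loaded */
};

/* Returns 1 if the backend may be registered. Otherwise marks it
   unlinked and returns 0. */
int backend_passes_gate(struct backend *b) {
    if (b->get_device_count == NULL || b->get_device_count() <= 0) {
        b->linked = 0;              /* stand-in for unloading the DSO */
        return 0;
    }
    return 1;
}

/* Tries each backend in order and returns the first that passes the
   gate, or "cpu" when none does. */
const char *select_backend(struct backend *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (backend_passes_gate(&list[i]))
            return list[i].name;
    return "cpu";
}

/* Stand-ins for device-count probes. */
int count_zero(void)     { return 0; }
int count_negative(void) { return -1; }
int count_two(void)      { return 2; }
```

With a list such as { {"cuda", count_negative, 1}, {"vulkan", count_two, 1} }, select_backend skips the first entry, marks it unlinked, and returns "vulkan". If every entry fails the gate, it returns "cpu".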

PR Type

🐛 Bug Fix

Relevant issues

Closes #988

Checklist

Factor CUDA/ROCm/Vulkan DSO probing into a shared core (gpu_backend.c): it gates registration on get_device_count() > 0 and runs the foreign probe call under a SIGSEGV/SIGABRT guard, so a 0-device or faulting GPU backend falls back to CPU instead of crashing (#988). Vulkan previously lacked the gate. Adds gpu_backend_test for the gate/fallback/fault paths and a short skill note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the per-consumer lists of cuda/gpu_backend/metal/vulkan objects in whisperfile, diffusionfile and the llama.cpp tools with a single o/$(MODE)/llamafile/gpu.a defined once in llamafile/BUILD.mk, so adding a GPU backend source no longer means editing every consumer. Tidiness only: all consumers reference llamafile_has_gpu(), so every member is pulled and the binaries are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aittalam · 2026-06-02T12:11:12Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aittalam and others added 3 commits June 2, 2026 10:32

Bumped version to 0.10.3

0774df9

github-actions bot added the documentation, llamafile, and testing labels Jun 2, 2026

Merge branch 'main' into fix-issue-988

269e9d5

Fix misleading SA_NODEFER/savesigs comment in gpu_run_guarded

4de0da3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aittalam merged commit 6e28ad2 into main Jun 2, 2026
3 checks passed

aittalam deleted the fix-issue-988 branch June 2, 2026 12:16

aittalam mentioned this pull request Jun 4, 2026

Probe GPU device count out-of-process on Windows (#988 follow-up) #994

Merged

aittalam merged 5 commits into main from fix-issue-988
