Fix uncaught SIGSEGV when GPU init fails, restore CPU fallback (#988) #989

Merged: aittalam merged 5 commits into main from fix-issue-988 on Jun 2, 2026

@aittalam aittalam commented Jun 2, 2026

Description

On Windows with an NVIDIA GPU that CUDA can't use, llamafile 0.10.2 crashed
with "Terminating on uncaught SIGSEGV" instead of falling back to CPU. This is
a regression from 0.10.1, where the same setup ran fine on CPU. It was reported
on a GeForce MX130, but in principle it affects any machine where the bundled
Vulkan driver init fails.

Two distinct problems combined: (1) Vulkan never got the device-count gate
we implemented in cuda.c, and (2) the actual crash is a C++ exception that could not
cross the DSO boundary: instead of unwinding across the cosmo_dlopen/ms_abi boundary,
it surfaced as an uncaught SIGSEGV.
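
To make problem (1) concrete, here is a minimal sketch of what a device-count gate can look like. It is not the code in cuda.c or gpu_backend.c: `struct backend_syms`, `backend_passes_gate` and the `sample_count_*` helpers are hypothetical names, and the only behavior taken from this PR is that a backend is rejected when its device count is zero, negative, or the symbol is missing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the function pointers a loader resolves from a
 * GPU DSO. A NULL member models a symbol that could not be found. */
struct backend_syms {
    int (*get_device_count)(void);
};

/* Returns 1 if the backend may be registered, 0 if it should be rejected
 * (the caller would then unlink the DSO and move on). */
int backend_passes_gate(const struct backend_syms *syms) {
    assert(syms != NULL);                 /* caller bug, not a runtime state */
    if (syms->get_device_count == NULL)
        return 0;                         /* missing symbol */
    return syms->get_device_count() > 0;  /* 0 or negative: no usable device */
}

/* Sample device-count functions for exercising the gate. */
int sample_count_zero(void)     { return 0; }
int sample_count_negative(void) { return -1; }
int sample_count_two(void)      { return 2; }
```

With these helpers, only a table pointing at `sample_count_two` passes the gate; the zero, negative and NULL cases are all rejected.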

The fix consists of two changes:

  • a shared probe core, now stored in llamafile/gpu_backend.c. The probing code was nearly identical across the different GPU libraries (CUDA, ROCm, Vulkan), so we factored it into one place to keep the backends from diverging. Metal is intentionally left separate (runtime-compiled, macOS-only, no ms_abi split, no device gate).

  • a signal-based crash guard. The foreign probe call runs under a temporary SIGSEGV/SIGABRT/… handler that siglongjmps back on a fault, converting a crash into a clean "backend unavailable → unlink → try next → CPU".
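
The crash-guard idea can be sketched in plain POSIX C as follows. This is an illustration under assumptions, not the implementation in gpu_backend.c: `guarded_probe`, `on_probe_fault` and the `sample_probe_*` helpers are hypothetical, only SIGSEGV and SIGABRT are guarded here, and the single static jump buffer makes the sketch suitable for one thread only. The real loader also has to deal with the cosmo_dlopen/ms_abi boundary, which this sketch ignores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf g_probe_jmp;  /* one buffer: not safe for concurrent probes */

static void on_probe_fault(int sig) {
    /* Abandon the faulting call and resume inside guarded_probe(). */
    siglongjmp(g_probe_jmp, sig);
}

/* Runs probe() with temporary SIGSEGV/SIGABRT handlers installed.
 * Returns 0 and stores the probe's return value in *out on success,
 * or -1 (leaving *out untouched) if a guarded signal was raised. */
int guarded_probe(int (*probe)(void), int *out) {
    static const int sigs[] = { SIGSEGV, SIGABRT };
    enum { NSIGS = sizeof(sigs) / sizeof(sigs[0]) };
    struct sigaction sa, old[NSIGS];
    int rc;

    assert(probe != NULL && out != NULL);

    sa.sa_handler = on_probe_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    for (int i = 0; i < NSIGS; i++)
        sigaction(sigs[i], &sa, &old[i]);

    /* savemask = 1: siglongjmp also restores the signal mask, which would
     * otherwise still block the signal whose handler we jumped out of. */
    if (sigsetjmp(g_probe_jmp, 1) == 0) {
        *out = probe();
        rc = 0;
    } else {
        rc = -1;
    }

    /* Put the previous handlers back whether or not the probe faulted. */
    for (int i = 0; i < NSIGS; i++)
        sigaction(sigs[i], &old[i], NULL);
    return rc;
}

/* Sample probes: one healthy, two that fault. */
int sample_probe_ok(void)    { return 2; }
int sample_probe_segv(void)  { raise(SIGSEGV); return 0; }
int sample_probe_abort(void) { raise(SIGABRT); return 0; }
```

A caller would treat a -1 result as "backend unavailable", unlink the library and try the next backend. Because the previous handlers are restored on both paths, a later healthy probe behaves normally.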

A second commit groups the GPU loader objects into a gpu.a archive (tidiness only, binaries unchanged). No changes to the GPU dylibs or their build scripts — the symbol ABI the loader uses is unchanged, so existing ggml-cuda/ggml-vulkan libraries load as-is.

Testing

  • Unit tests (tests/gpu_backend_test.cpp, in make check): the
    device-count gate (0/negative/missing-symbol → reject + unlink), --verbose
    log suppression, register forwarding, and fault injection — a
    get_device_count that raises SIGSEGV/SIGABRT is caught and turned into a
    clean fallback, with normal probing still working afterward.

  • Real hardware / Linux (NVIDIA L40S, CUDA + Vulkan): --gpu auto selects
    CUDA, --gpu vulkan selects Vulkan, --gpu disable uses CPU — all register
    the right device count and generate correctly, no crashes. Confirms the
    refactor doesn't regress normal GPU operation and the crash guard doesn't
    disturb a successful probe.

  • Real hardware / Windows (NVIDIA Quadro RTX 6000, CUDA + Vulkan): the same
    happy paths verified, plus an end-to-end reproduction of #988. With the
    NVIDIA Vulkan ICD deliberately broken so vkCreateInstance fails, stock
    0.10.2 dies with Terminating on uncaught SIGSEGV at the Vulkan probe, while
    this branch logs "Vulkan crashed during device probe; trying next backend"
    and falls back to CPU (exit 0, correct output). Same machine, same command,
    only the binary differs — confirming the real 0xE06D7363 C++ exception →
    cosmo SIGSEGV is caught by the guard, not just the synthetic fault in the
    unit tests.
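
The fallback order exercised by these tests can be pictured with a small selection loop. The sketch below is an assumption-laden illustration, not llamafile's loader: `enum probe_result`, `struct backend`, `select_backend`, `BACKEND_CPU` and the `sample_probe_*` helpers are hypothetical, and the order in which backends are tried is simply whatever the caller passes in. Only the log wording mirrors the message quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical outcome of probing one backend. */
enum probe_result { PROBE_OK, PROBE_NO_DEVICE, PROBE_CRASHED };

/* Hypothetical backend descriptor: display name plus probe callback. */
struct backend {
    const char *name;
    enum probe_result (*probe)(void);
};

#define BACKEND_CPU (-1)  /* returned when no GPU backend is usable */

/* Tries the candidates in order and returns the index of the first backend
 * whose probe succeeds, or BACKEND_CPU when every candidate is rejected. */
int select_backend(const struct backend *list, size_t n) {
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (list[i].probe()) {
        case PROBE_OK:
            return (int)i;
        case PROBE_CRASHED:
            fprintf(stderr, "%s crashed during device probe; trying next backend\n",
                    list[i].name);
            break;
        case PROBE_NO_DEVICE:
            break;  /* silently skip backends with no usable device */
        }
    }
    return BACKEND_CPU;
}

/* Sample probes covering the three outcomes. */
enum probe_result sample_probe_crashed(void)   { return PROBE_CRASHED; }
enum probe_result sample_probe_no_device(void) { return PROBE_NO_DEVICE; }
enum probe_result sample_probe_works(void)     { return PROBE_OK; }
```

Given a list whose first entry crashes, whose second has no device and whose third works, the loop returns index 2. If only the first two entries are passed, it returns `BACKEND_CPU`, which corresponds to the CPU fallback seen in the Windows reproduction.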

PR Type

  • 🐛 Bug Fix

Relevant issues

Closes #988

Checklist

  • I understand the code I am submitting.
  • I have run this code locally and verified the change.
  • New and existing tests pass locally, or I have explained why tests were not run.
  • Documentation was updated where necessary.
  • If I changed code in llama.cpp/, whisper.cpp/, or stable-diffusion.cpp/, I also updated the matching *.patches/ files.
  • I have read and followed the contribution guidelines.
  • AI Usage:
    • No AI was used.
    • AI was used in an assistive capacity.
    • This PR includes substantial AI-generated content.

aittalam and others added 3 commits June 2, 2026 10:32
Factor CUDA/ROCm/Vulkan DSO probing into a shared core (gpu_backend.c):
it gates registration on get_device_count() > 0 and runs the foreign
probe call under a SIGSEGV/SIGABRT guard, so a 0-device or faulting GPU
backend falls back to CPU instead of crashing (#988). Vulkan previously
lacked the gate. Adds gpu_backend_test for the gate/fallback/fault paths
and a short skill note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the per-consumer lists of cuda/gpu_backend/metal/vulkan objects in
whisperfile, diffusionfile and the llama.cpp tools with a single
o/$(MODE)/llamafile/gpu.a defined once in llamafile/BUILD.mk, so adding a GPU
backend source no longer means editing every consumer. Tidiness only: all
consumers reference llamafile_has_gpu(), so every member is pulled and the
binaries are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aittalam commented Jun 2, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aittalam aittalam merged commit 6e28ad2 into main Jun 2, 2026
3 checks passed
@aittalam aittalam deleted the fix-issue-988 branch June 2, 2026 12:16