I wasn't able to run the main-cuda docker image (whisper-server) on my system.
I used Claude Code to help me debug (I am no expert in C++ or compilation), but it seems that the image is built with compilation options that end up requiring certain types of CPU at runtime.
(The main (CPU) Docker image works well on my system.)
Here is the (Claude-generated) report:
Environment
| Component | Value |
| --- | --- |
| Image | ghcr.io/ggml-org/whisper.cpp:main-cuda (built 2026-05-15) |
| Host OS | Ubuntu, Linux 6.8.0-117-generic x86_64 |
| CPU | AMD Ryzen 9 5950X (Zen 3); supports AVX, AVX2, FMA, BMI2; no AVX-512, no AMX |
| GPU | NVIDIA GeForce RTX 4090 (24 GB VRAM, compute capability 8.9) |
| NVIDIA driver | 580.159.04 |
| CUDA (host) | 13.0 |
| Container runtime | Docker with NVIDIA Container Toolkit |
| Model | ggml-large-v3-turbo.bin |
Bug 1 — LD_LIBRARY_PATH shadows the host-injected libcuda.so, causing CUDA_ERROR_SYSTEM_DRIVER_MISMATCH
The image sets LD_LIBRARY_PATH with /usr/local/cuda-13.0/compat as the first entry:
/usr/local/cuda-13.0/compat:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
The cuda-compat-13-0 package installed in the image is version 580.65.06-0ubuntu1, so /usr/local/cuda-13.0/compat/libcuda.so.580.65.06 is loaded before the libcuda.so.580.159.04 injected by the NVIDIA Container Toolkit.
Result on startup:
ggml_cuda_init: failed to initialize CUDA: system has unsupported display driver / cuda driver combination
Workaround: override LD_LIBRARY_PATH in the container environment so that the compat directory is no longer searched first (here it is dropped entirely):
environment:
- LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
With this fix, CUDA initialises correctly and the RTX 4090 is detected:
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24063 MiB):
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, VRAM: 24063 MiB
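The shadowing in Bug 1 comes from the dynamic loader searching the LD_LIBRARY_PATH directories from left to right and taking the first match. Below is a minimal Python sketch of that lookup, meant to be run inside the container. The helper name `first_match` is hypothetical, and it only approximates the loader: it ignores RPATH/RUNPATH and the ld.so cache, and it assumes the library is requested under the soname `libcuda.so.1`.

```python
import os


def first_match(search_path, soname="libcuda.so.1"):
    """Return the first directory in a colon-separated search path that
    contains `soname`, or None if no directory provides it.

    This only mimics left-to-right lookup; it is not the real loader.
    """
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries in this simplified model
        if os.path.exists(os.path.join(directory, soname)):
            return directory
    return None


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH:", path or "(unset)")
    print("first directory providing libcuda.so.1:", first_match(path))
```

If this reports the compat directory with the image's default environment and one of the /usr/local/nvidia directories after the override, that would be consistent with the explanation above.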
Bug 2 — SIGILL (exit code 132) on AMD Zen 3 after model load
After applying the LD_LIBRARY_PATH fix, the process still crashes with Illegal instruction (core dumped) immediately after model loading completes:
whisper_model_load: n_langs = 100
Illegal instruction (core dumped)
This also reproduces with CUDA_VISIBLE_DEVICES="" (pure CPU path), ruling out a GPU-side issue.
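For readers unfamiliar with the exit code: shells and Docker report a process killed by signal N as exit status 128 + N, so 132 corresponds to signal 4, which is SIGILL on Linux. A small Python sketch of that arithmetic (the function name `describe_exit_code` is made up for illustration):

```python
import signal


def describe_exit_code(code):
    """Translate a shell-style exit status into a signal name when it is
    above 128; otherwise report it as a plain exit status."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return "exit status {}".format(code)


print(describe_exit_code(132))  # prints "SIGILL" on Linux
```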
Findings:
- The whisper-server binary itself declares x86 ISA needed: x86-64-baseline (via readelf -n)
- libggml-cpu.so.0 exports ggml_cpu_has_avx512, ggml_cpu_has_avx512_vbmi, ggml_cpu_has_avx512_vnni, ggml_cpu_has_avx512_bf16, and ggml_cpu_has_amx_int8, with amx.cpp compiled in
- The Ryzen 9 5950X has no AVX-512 and no AMX
- The crash occurs at the point where ggml initialises compute buffers / backend after model loading — consistent with a call path that emits AVX-512 or AMX instructions without a proper CPU feature guard
The main (CPU-only) image built on the same date does not exhibit this crash: it loads the same model and comes up as a healthy HTTP server on the same machine.
This suggests libggml-cpu.so.0 in the main-cuda build was compiled with -march= flags that include AVX-512 or AMX, or that the AMX initialisation path (amx.cpp) is entered unconditionally regardless of the runtime CPU feature check.
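The CPU side of this hypothesis can be checked on the host by reading the feature flags from /proc/cpuinfo. The sketch below is a rough aid under stated assumptions: the helper names are hypothetical, and the flag spellings are the ones the Linux kernel uses in /proc/cpuinfo. It only shows what the CPU supports. It does not show which instructions libggml-cpu.so.0 actually contains, and exported ggml_cpu_has_* symbols may not prove that either, since such query functions can be present regardless of the build flags.

```python
import os


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Linux /proc/cpuinfo names for the features mentioned in the findings.
FEATURES_OF_INTEREST = [
    "avx2", "fma", "bmi2",
    "avx512f", "avx512vbmi", "avx512_vnni", "avx512_bf16", "amx_int8",
]


def report(flags):
    """Map each feature of interest to True/False for this CPU."""
    return {name: name in flags for name in FEATURES_OF_INTEREST}


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
        for name, present in report(flags).items():
            print("{:12s} {}".format(name, "yes" if present else "no"))
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```

On the Ryzen 9 5950X described above, the AVX-512 and AMX entries would be expected to read "no", matching the environment table.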
Expected behaviour
whisper-server should start and serve requests on any x86-64 CPU that supports the baseline ISA, falling back gracefully when AVX-512/AMX are unavailable.
Actual behaviour
main-cuda crashes at startup on the AMD Zen 3 machine described above, and presumably on any other CPU without AVX-512/AMX.
Workaround
Use the ghcr.io/ggml-org/whisper.cpp:main (CPU-only) image with the full binary path as entrypoint:
image: ghcr.io/ggml-org/whisper.cpp:main
entrypoint: ["/app/build/bin/whisper-server", "--model", "...", "--host", "0.0.0.0", "--port", "9000"]