NULL-deref in sycl::detail::getSortedImages — strcmp on entry name without null-check
Summary
sycl::detail::getSortedImages() in persistent_device_code_cache.cpp:99-107 crashes with
SIGSEGV (strcmp(NULL, ...)) when one of the RTDeviceBinaryImage entries it sorts has a
NULL EntriesBegin->GetName(). Triggered reliably when an application loads a SYCL kernel via
a dynamically-linked .so and requests JIT compilation with SYCL_CACHE_PERSISTENT=1 (the
default in many downstreams).
The same code, statically linked, works fine — confirming that the NULL Name is produced
when an RTDeviceBinaryImage is registered from a dlopen'd shared library. The cache
sort lambda assumes every image has a non-empty entry table with a named first entry, which
is not invariant.
Affected source
sycl/source/detail/persistent_device_code_cache.cpp (master branch, also at
ab3dc98de0fd1ada9df12b138de1e1f8b715cc27):
static std::vector<const RTDeviceBinaryImage *>
getSortedImages(const std::vector<const RTDeviceBinaryImage *> &Imgs) {
std::vector<const RTDeviceBinaryImage *> SortedImgs = Imgs;
std::sort(SortedImgs.begin(), SortedImgs.end(),
[](const RTDeviceBinaryImage *A, const RTDeviceBinaryImage *B) {
// All entry names are unique among these images, so comparing the
// first ones is enough.
return std::strcmp(A->getRawData().EntriesBegin->GetName(),
B->getRawData().EntriesBegin->GetName()) < 0;
});
return SortedImgs;
}
The comment "All entry names are unique among these images, so comparing the first ones is
enough" relies on an unverified invariant. Encountered in real workloads where
EntriesBegin->GetName() is NULL.
Backtrace (libunwind, from a real crash)
Captured via LD_PRELOAD SIGSEGV handler with libunwind on Intel Arc Pro B70 (BMG-G31,
PCI 8086:e223), GGML SYCL backend at first ggml_sycl_op_get_rows JIT:
=== SIGSEGV pid=63802 addr=(nil) ===
RIP=74aea4eb6bab RDI=8000000000000000 RSI=74ae4d29c490 ...
#0 _ZSt16__insertion_sort<...>+0xcc in libsycl.so.8+0x33816c
#1 sycl::_V1::detail::getSortedImages(...)+0x17c in libsycl.so.8+0x33909c
#2 sycl::_V1::detail::PersistentDeviceCodeCache::
getItemFromDisc(...)+0xb1 in libsycl.so.8+0x345751
#3 ProgramManager::getOrCreateURProgram(...)+0x6b in libsycl.so.8+0x3075eb
#4 ProgramManager::getBuiltURProgram::lambda+0x25b in libsycl.so.8+0x30945b
#5 ProgramManager::getBuiltURProgram(...)+0xc2c in libsycl.so.8+0x30b56c
#6 ProgramManager::getBuiltURProgram(...)+0x3af in libsycl.so.8+0x313dbf
#7 ProgramManager::getOrCreateKernel(...)+0x1dc in libsycl.so.8+0x31459c
#8-12 sycl::_V1::detail::queue_impl::submit_kernel_* in libsycl.so.8
#13 ggml_sycl_op_get_rows+0x2714 in libggml-sycl.so
#14 ggml_sycl_get_rows+0x7f in libggml-sycl.so
#15 ggml_backend_sycl_graph_compute+0x767 in libggml-sycl.so
...
%rdi and %rdx (strcmp args) are visible in the register dump as 0x0 and a heap pointer —
i.e., strcmp(NULL, valid_string).
The crash address vmovdqu (%rdi),%ymm0 inside __strcmp_avx2 confirms the first-arg
NULL deref.
Reproduction
Environment:
- libsycl rev
ab3dc98de0fd1ada9df12b138de1e1f8b715cc27 (intel-llvm, 2025-11-14)
- Intel Arc Pro B70 with
intel-compute-runtime 26.09, OpenCL backend
(ONEAPI_DEVICE_SELECTOR=opencl:gpu)
- GGML SYCL backend from llama.cpp@073bb2c20 (
ml/backend/ggml/ggml/src/ggml-sycl/)
Reproducer:
- Build llama.cpp's SYCL backend with
BUILD_SHARED_LIBS=ON so that libggml-sycl.so is a
dynamic library (this is what ollama does).
- dlopen
libggml-sycl.so from a host program (or link a Go runner that
LD_LIBRARY_PATH-loads it).
- Run any inference that triggers
GGML_OP_GET_ROWS via SYCL on first decode (e.g. ollama
running qwen2.5:3b or any model on Battlemage).
- Crash on first kernel JIT, before any token output.
Static-link of the same libggml-sycl (e.g., pkg/llama-cpp-sycl/llama-cli with
BUILD_SHARED_LIBS=false) does not crash — same source, same env, same model, same SYCL
runtime. The dynamic-link path is what produces an RTDeviceBinaryImage with
EntriesBegin->GetName() == nullptr.
Workaround
Set SYCL_CACHE_PERSISTENT=0. This bypasses the getItemFromDisc codepath entirely, so
getSortedImages is never called. Verified working: the same workload that crashed at
first decode now generates correct text.
Suggested fix
Make the sort comparator NULL-safe (one approach):
static std::vector<const RTDeviceBinaryImage *>
getSortedImages(const std::vector<const RTDeviceBinaryImage *> &Imgs) {
std::vector<const RTDeviceBinaryImage *> SortedImgs = Imgs;
std::sort(SortedImgs.begin(), SortedImgs.end(),
[](const RTDeviceBinaryImage *A, const RTDeviceBinaryImage *B) {
const auto *NameA = A->getRawData().EntriesBegin
? A->getRawData().EntriesBegin->GetName()
: nullptr;
const auto *NameB = B->getRawData().EntriesBegin
? B->getRawData().EntriesBegin->GetName()
: nullptr;
if (NameA == NameB) return false; // both NULL or same ptr
if (NameA == nullptr) return true; // NULLs sort first
if (NameB == nullptr) return false;
return std::strcmp(NameA, NameB) < 0;
});
return SortedImgs;
}
Better still: assert at registration time that every RTDeviceBinaryImage registered via
__sycl_register_lib has a non-empty entry table with a named first entry, OR explicitly
filter out images with EntriesBegin == EntriesEnd before passing to
getSortedImages so the cache key is well-defined.
Ideally, also identify why __sycl_register_lib accepts an image with a NULL entry name in
the first place — that's likely the root invariant violation produced by the
-fsycl driver pipeline when generating SPIR-V images for a shared object. The downstream
crash signature is the symptom; the upstream codegen path (which produces the empty/null
entry image inside a .so) is the root cause.
Reproducibility notes
- Crash is 100% reliable with
SYCL_CACHE_PERSISTENT=1 (default).
- Crash does NOT happen with static link, even with the same kernel sources, same flags, and
same SYCL runtime.
- Wiping the on-disk cache directory (
~/.cache/libsycl_cache) does not help — the bug is in
the in-memory vector<RTDeviceBinaryImage*> sort, not in cached files.
- The SYCL runtime first calls
getItemFromDisc (cache lookup) on the image vector before
doing fresh JIT; the lookup needs the sorted vector to construct the cache key. So the
crash hits BEFORE any cache I/O, regardless of cache contents.
NULL-deref in
sycl::detail::getSortedImages—strcmpon entry name without null-checkSummary
sycl::detail::getSortedImages()inpersistent_device_code_cache.cpp:99-107crashes withSIGSEGV(strcmp(NULL, ...)) when one of theRTDeviceBinaryImageentries it sorts has aNULL
EntriesBegin->GetName(). Triggered reliably when an application loads a SYCL kernel viaa dynamically-linked
.soand requests JIT compilation withSYCL_CACHE_PERSISTENT=1(thedefault in many downstreams).
The same code, statically linked, works fine — confirming that the NULL
Nameis producedwhen an
RTDeviceBinaryImageis registered from adlopen'd shared library. The cachesort lambda assumes every image has a non-empty entry table with a named first entry, which
is not invariant.
Affected source
sycl/source/detail/persistent_device_code_cache.cpp(master branch, also atab3dc98de0fd1ada9df12b138de1e1f8b715cc27):The comment "All entry names are unique among these images, so comparing the first ones is
enough" relies on an unverified invariant. Encountered in real workloads where
EntriesBegin->GetName()is NULL.Backtrace (libunwind, from a real crash)
Captured via
LD_PRELOADSIGSEGV handler with libunwind on Intel Arc Pro B70 (BMG-G31,PCI 8086:e223), GGML SYCL backend at first
ggml_sycl_op_get_rowsJIT:%rdiand%rdx(strcmp args) are visible in the register dump as 0x0 and a heap pointer —i.e.,
strcmp(NULL, valid_string).The crash address
vmovdqu (%rdi),%ymm0inside__strcmp_avx2confirms the first-argNULL deref.
Reproduction
Environment:
ab3dc98de0fd1ada9df12b138de1e1f8b715cc27(intel-llvm, 2025-11-14)intel-compute-runtime 26.09, OpenCL backend(
ONEAPI_DEVICE_SELECTOR=opencl:gpu)ml/backend/ggml/ggml/src/ggml-sycl/)Reproducer:
BUILD_SHARED_LIBS=ONso thatlibggml-sycl.sois adynamic library (this is what
ollamadoes).libggml-sycl.sofrom a host program (or link a Go runner thatLD_LIBRARY_PATH-loads it).GGML_OP_GET_ROWSvia SYCL on first decode (e.g. ollamarunning
qwen2.5:3bor any model on Battlemage).Static-link of the same
libggml-sycl(e.g.,pkg/llama-cpp-sycl/llama-cliwithBUILD_SHARED_LIBS=false) does not crash — same source, same env, same model, same SYCLruntime. The dynamic-link path is what produces an
RTDeviceBinaryImagewithEntriesBegin->GetName() == nullptr.Workaround
Set
SYCL_CACHE_PERSISTENT=0. This bypasses thegetItemFromDisccodepath entirely, sogetSortedImagesis never called. Verified working: the same workload that crashed atfirst decode now generates correct text.
Suggested fix
Make the sort comparator NULL-safe (one approach):
Better still: assert at registration time that every
RTDeviceBinaryImageregistered via__sycl_register_libhas a non-empty entry table with a named first entry, OR explicitlyfilter out images with
EntriesBegin == EntriesEndbefore passing togetSortedImagesso the cache key is well-defined.Ideally, also identify why
__sycl_register_libaccepts an image with a NULL entry name inthe first place — that's likely the root invariant violation produced by the
-fsycldriver pipeline when generating SPIR-V images for a shared object. The downstreamcrash signature is the symptom; the upstream codegen path (which produces the empty/null
entry image inside a
.so) is the root cause.Reproducibility notes
SYCL_CACHE_PERSISTENT=1(default).same SYCL runtime.
~/.cache/libsycl_cache) does not help — the bug is inthe in-memory
vector<RTDeviceBinaryImage*>sort, not in cached files.getItemFromDisc(cache lookup) on the image vector beforedoing fresh JIT; the lookup needs the sorted vector to construct the cache key. So the
crash hits BEFORE any cache I/O, regardless of cache contents.