Symptoms
After upgrading to the GCC 14 (2025.02/10 glibc) toolchain, the haveged package (and potentially other programs that
call into the vDSO at high frequency) crashes with random segmentation faults during system startup, both in multi-core QEMU and on hardware.
- Environment: RV64GC, Linux Kernel 6.1.y, GCC 14.
- Occurrence: Intermittent, mostly on multi-core systems; RV32 environments and GCC 13 environments remain stable.
Debugging & Trace
Core dump analysis points to the crash occurring within the __vdso_clock_gettime call chain. Further disassembly
revealed the specific failure point at the inlined cpu_relax() function within the vDSO context.
Investigation of the memory state at the time of the crash showed that the riscv_isa_ext_keys array (used by static
branches) was zeroed out. This suggests that the user-space execution of vDSO is attempting to access kernel-space jump
label structures that are either uninitialized, inaccessible, or inconsistent in the user-context mapping.
Root Cause Analysis
The root cause is the use of static_branch_likely() inside cpu_relax() in arch/riscv/include/asm/vdso/processor.h.
RISC-V's cpu_relax() implementation traditionally uses a static branch to dynamically choose between the pause
instruction (Zihintpause extension) and a div instruction stall. However, because the vDSO is mapped into user space,
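using kernel-patchable jump labels within vDSO-inlined functions is unsafe: the static keys and the code-patching
machinery live in the kernel, and vDSO code executing in user mode cannot rely on either.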
Proposed Solution
The fix involves removing the static_branch dependency from the vDSO version of processor.h and replacing it with
compile-time preprocessor checks. This ensures that the vDSO remains robust and independent of runtime kernel patching
mechanisms.
Proposed changes: see commit 5edff5c in the Linux SDK and the bugfix in Nuclei-Software/linux@d4357ee.
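For illustration, here is a minimal user-space sketch of what a cpu_relax() without static branches could look like. It is not taken from the commits referenced above. The names cpu_relax_sketch() and spin_until_set() are hypothetical, and the choice to always emit the raw pause encoding (a hint that executes as a no-op on cores without Zihintpause) is an assumption about one reasonable compile-time approach. On non-RISC-V hosts the sketch reduces to a compiler barrier so that it still builds.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the vDSO cpu_relax(): every decision is made
 * by the preprocessor, so no kernel static key is read at run time.
 */
static inline void cpu_relax_sketch(void)
{
#if defined(__riscv)
#if defined(__riscv_muldiv) || defined(__riscv_m)
	long dummy;
	/* No halt instruction is available, so induce a long-latency stall. */
	__asm__ __volatile__ ("div %0, %0, zero" : "=r" (dummy));
#endif
	/* Raw encoding of "pause"; assumed safe because it is only a hint. */
	__asm__ __volatile__ (".4byte 0x100000F");
#endif
	/* Compiler barrier: forces memory to be re-read after the relax. */
	__asm__ __volatile__ ("" ::: "memory");
}

/*
 * Hypothetical helper: spin until *flag becomes non-zero or max_spins
 * iterations have elapsed. Returns the number of iterations spent waiting.
 */
static int spin_until_set(const volatile int *flag, int max_spins)
{
	int spins = 0;

	assert(flag != NULL);
	while (!*flag && spins < max_spins) {
		cpu_relax_sketch();
		spins++;
	}
	return spins;
}
```

Because nothing in the function depends on riscv_isa_ext_keys, the generated code behaves the same whether it runs in kernel context or from the user-space vDSO mapping.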
Impact & Verification
Eliminates random crashes in haveged and other userspace tools calling vDSO functions.
Verified on Nuclei Linux SDK dev_nuclei_6.1_v3 branch.