Description
Summary
The KVM platform on ARM64 crashes with SIGSEGV when the sentry's Go runtime calls VDSO functions (specifically __kernel_getrandom) inside the KVM guest. I suspect the crash is caused by a mismatch in ARM Pointer Authentication (PAC) state between Guest EL1 (where paciasp signs the return address) and Host EL0 (where autiasp verifies it after sigreturn). See the evidence below.
The crash is 100% reproducible with workloads that perform ≥100 sequential fork/exec operations, and does not occur on x86 or with the systrap or ptrace platforms.
I suspect this is a bug in how gVisor handles PAC between guest EL1 and host EL0. Patching the VDSO's PAC instructions to NOPs makes the crash go away, but I don't think that is the right fix, so I'm opening this issue instead. The detailed workaround steps are below.
Crash chain
- The sentry's Go runtime at Guest EL1 calls runtime.vgetrandom1, which invokes the VDSO's __kernel_getrandom function.
- The VDSO's prologue executes paciasp, which is meant to sign the return address (R30/LR) using the current PAC key with SP as the modifier. However, gVisor does not enable PAC for the KVM guest: neither KVM_ARM_VCPU_PTRAUTH_ADDRESS nor KVM_ARM_VCPU_PTRAUTH_GENERIC is set in vcpuInit.features. The behavior of paciasp at Guest EL1 without PAC enabled depends on HCR_EL2.API:
  - If HCR_EL2.API=0, PAC instructions either trap to EL2 or are treated as HINT (NOP), depending on the CPU.
  - In either case, R30 is not signed with the host's PAC key. Evidence 3 shows the saved R30 carries no PAC bits at all, so in this run paciasp effectively executed as a NOP.
- The VDSO saves the (incorrectly signed or unsigned) R30 to its stack frame and continues executing. Eventually it executes an SVC instruction for the actual getrandom syscall.
- The SVC at Guest EL1 triggers El1_sync → KERNEL_ENTRY_FROM_EL1, which saves the current registers (including the VDSO's PC and SP) into c.CPU.Registers(). Then KernelSyscall → Halt → KVM exit.
- bluepillArchExit copies c.CPU.Registers() (containing the VDSO's PC, SP, and other sentry registers) into the host's UContext64 signal context, and sigreturn restores these registers.
- The host thread resumes execution at the VDSO's SVC instruction at Host EL0. The SVC completes as a real host getrandom syscall, and execution continues through the VDSO epilogue.
- The VDSO epilogue loads R30 from the stack (ldp x29, x30, [sp], #128) and executes autiasp, which verifies R30's PAC tag using the host's PAC key. The tag was computed with the guest's PAC state (or not computed at all), so verification fails. The exception is raised at autiasp itself (as on CPUs implementing FEAT_FPAC), and the host kernel delivers it as SIGILL.
- gVisor's sighandler catches all SIGILLs, since it is installed for the bluepill mechanism. It calls bluepillArchEnter, which reads R8 from the signal context as a *vCPU pointer. But R8 holds whatever the VDSO had in R8 at the time of the fault, not a valid vCPU pointer, so dereferencing it causes SIGSEGV.
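For context on the paciasp step: paciasp and autiasp are encoded in the A64 HINT space (HINT #25 and HINT #29), which is why an execution context where PAC is not enabled can run them as NOPs instead of faulting. A small Go sketch (illustrative only, not gVisor code) that derives the encodings from the HINT immediates; it should reproduce the words objdump prints in evidence 1 (d503233f, d50323bf):

```go
package main

import "fmt"

// hintBase is the A64 encoding of HINT #0, i.e. NOP.
const hintBase uint32 = 0xd503201f

// hint returns the A64 encoding of "HINT #imm". The 7-bit immediate
// (CRm:op2) occupies bits [11:5] of the instruction word.
func hint(imm uint32) uint32 {
	return hintBase | (imm&0x7f)<<5
}

const (
	nop     = 0  // HINT #0
	paciasp = 25 // HINT #25: sign LR with key A, SP as modifier
	autiasp = 29 // HINT #29: authenticate LR with key A, SP as modifier
)

func main() {
	for _, insn := range []struct {
		name string
		imm  uint32
	}{{"nop", nop}, {"paciasp", paciasp}, {"autiasp", autiasp}} {
		fmt.Printf("%-8s HINT #%-2d = %08x\n", insn.name, insn.imm, hint(insn.imm))
	}
}
```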
Why x86 Is Not Affected
x86 has no pointer authentication, so its VDSO contains no paciasp/autiasp equivalent that could fail after the guest-to-host register handoff.
Why It Correlates With Fork/Exec Count
The Go runtime calls runtime.vgetrandom1 for random number generation (goroutine IDs, map hash seeds, stack randomization). More fork/exec → more goroutines → more vgetrandom1 calls → higher probability that a VDSO getrandom SVC is the specific syscall captured in CPU_REGISTERS at VM exit time.
Evidence
1. Crash PC is always VDSO autiasp
Four independent gdb catches (attaching to the sandbox process with a
SIGSEGV catchpoint) all show the SIGILL's faulting PC at the same VDSO
offset:
ctx.PC = 0xe2833773bc3c (VDSO base + 0xc3c)
ctx.PC = 0xe93b93ccdc3c (VDSO base + 0xc3c)
ctx.PC = 0xf33f80d43c3c (VDSO base + 0xc3c)
ctx.PC = 0xfd33596b5c3c (VDSO base + 0xc3c)
VDSO objdump confirms autiasp at offset 0xc3c, inside
__kernel_getrandom:
$ objdump -d /tmp/vdso.so | grep -E "paciasp|autiasp"
a68: d503233f paciasp ← __kernel_getrandom entry
c3c: d50323bf autiasp ← crash here
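A quick consistency check on the four PCs: with 4K pages and a page-aligned VDSO, a fault at VDSO base + 0xc3c has page offset 0xc3c no matter where ASLR placed the VDSO. A minimal Go sketch, assuming autiasp lies in the VDSO's first page (0xc3c < 0x1000); pageOffset is a hypothetical helper:

```go
package main

import "fmt"

// pageSize is 4 KiB, matching the "4K pages" kernel in the environment section.
const pageSize uint64 = 0x1000

// autiaspOffset is autiasp's offset inside the VDSO according to objdump.
const autiaspOffset uint64 = 0xc3c

// pageOffset returns the low bits of pc within its page. If the VDSO is
// mapped page-aligned and autiasp lies in its first page, a PC at
// VDSO base + 0xc3c always has page offset 0xc3c.
func pageOffset(pc uint64) uint64 { return pc & (pageSize - 1) }

func main() {
	// Faulting PCs from the four independent gdb catches.
	for _, pc := range []uint64{0xe2833773bc3c, 0xe93b93ccdc3c, 0xf33f80d43c3c, 0xfd33596b5c3c} {
		off := pageOffset(pc)
		fmt.Printf("pc=%#x implied base=%#x offset=%#x autiasp=%v\n",
			pc, pc-off, off, off == autiaspOffset)
	}
}
```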
2. Crash PC matches the sandbox's own VDSO address
Same-run comparison (gdb attached to sandbox process, reading ctx.PC
from UContext64, and /proc/PID/maps from the same process):
VDSO base: 0xefc8da221000 (from /proc/$SANDBOX/maps)
VDSO + 0xc3c: 0xefc8da221c3c
Crash ctx.PC: 0x0000efc8da221c3c ← identical
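The same-run comparison can also be scripted by reading the [vdso] line from /proc/PID/maps. A minimal Go sketch; vdsoBase is a hypothetical helper, and the demo reads /proc/self/maps, so for the sandbox you would point it at /proc/<sandbox pid>/maps instead:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// vdsoBase returns the start address of the [vdso] mapping from the
// contents of a /proc/PID/maps file.
func vdsoBase(maps string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(maps))
	for sc.Scan() {
		// Fields: address perms offset dev inode pathname
		fields := strings.Fields(sc.Text())
		if len(fields) < 6 || fields[5] != "[vdso]" {
			continue
		}
		start, _, ok := strings.Cut(fields[0], "-")
		if !ok {
			return 0, fmt.Errorf("malformed address range %q", fields[0])
		}
		return strconv.ParseUint(start, 16, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no [vdso] mapping found")
}

func main() {
	data, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		fmt.Println("read maps:", err)
		return
	}
	base, err := vdsoBase(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("VDSO base: %#x, VDSO + 0xc3c: %#x\n", base, base+0xc3c)
}
```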
3. R30 on the VDSO stack has no valid PAC tag
gdb stack dump at crash:
[SP-128+8] = 0x0000000000096c0c ← R30, NO PAC bits
pauth_cmask = 0x007f000000000000 ← PAC bits should be in [54:48]
R30 & mask = 0x0000000000000000 ← all zero = unsigned
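The mask above is GENMASK(54, 48), the PAC field for 48-bit user addresses with top-byte-ignore (bits 63:56 hold the tag and bit 55 selects the address range). A minimal Go sketch of the check, assuming this host uses 48-bit VAs; genmask is a hypothetical helper that mimics the kernel's GENMASK macro:

```go
package main

import "fmt"

// genmask returns a mask with bits hi..lo (inclusive) set, like the
// kernel's GENMASK(hi, lo).
func genmask(hi, lo uint) uint64 {
	return (^uint64(0) >> (63 - hi)) &^ (uint64(1)<<lo - 1)
}

// pacBits returns the PAC field of a pointer under the given mask.
func pacBits(ptr, mask uint64) uint64 { return ptr & mask }

func main() {
	// Assumption: 48-bit VAs with top-byte-ignore, so the PAC field of a
	// user pointer is bits [54:48]; this reproduces pauth_cmask from gdb.
	mask := genmask(54, 48)
	r30 := uint64(0x96c0c) // saved LR read from the VDSO stack frame
	// All-zero PAC bits are consistent with an unsigned pointer; a real
	// signature could be zero only by chance.
	fmt.Printf("mask=%#016x r30&mask=%#x hasPAC=%v\n",
		mask, pacBits(r30, mask), pacBits(r30, mask) != 0)
}
```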
4. c.CPU.Registers().Pc contains VDSO SVC address at exit
gdb breakpoint at the instruction in bluepillArchExit that reads
c.CPU.Registers().Pc:
Pc=0x96a8c R8=0x62 ← runtime.futex SVC (normal, safe)
Pc=0x15b7c R8=0xe9 ← madvise SVC (normal, safe)
Pc=0xf7fe34e7ddc4 R8=0x116 ← VDSO getrandom SVC (CRASH follows)
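On arm64 the syscall number is passed in x8, so the R8 values captured at exit decode directly through the generic Linux syscall table that arm64 uses. A minimal Go sketch with a hand-copied subset of that table (a real tool would use a generated list). Note that 0xe9 (233) decodes to madvise; mmap is 222:

```go
package main

import "fmt"

// arm64Syscalls maps a few syscall numbers from the generic Linux table
// used by arm64 (include/uapi/asm-generic/unistd.h) to their names.
var arm64Syscalls = map[uint64]string{
	98:  "futex",
	222: "mmap",
	226: "mprotect",
	233: "madvise",
	278: "getrandom",
}

// syscallName decodes the value found in R8 at an SVC.
func syscallName(r8 uint64) string {
	if name, ok := arm64Syscalls[r8]; ok {
		return name
	}
	return fmt.Sprintf("unknown(%d)", r8)
}

func main() {
	// R8 values from the bluepillArchExit breakpoint.
	for _, r8 := range []uint64{0x62, 0xe9, 0x116} {
		fmt.Printf("R8=%#x -> %s\n", r8, syscallName(r8))
	}
}
```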
5. runsc has zero PAC instructions; only VDSO uses PAC
$ objdump -d /usr/local/bin/runsc | grep -c paciasp
0
$ objdump -d /tmp/vdso.so | grep -c paciasp
3
6. gVisor does not enable PAC for the KVM guest
// machine_arm64_unsafe.go:80
vcpuInit.features[0] |= (1 << _KVM_ARM_VCPU_PSCI_0_2)
// No KVM_ARM_VCPU_PTRAUTH_ADDRESS (5) or KVM_ARM_VCPU_PTRAUTH_GENERIC (6)
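For reference, requesting PAC for the guest would mean setting the two ptrauth feature bits next to PSCI_0_2 before KVM_ARM_VCPU_INIT. The sketch below uses a hypothetical vcpuInit type that mirrors struct kvm_vcpu_init from the KVM UAPI, not gVisor's actual type. It is not a fix on its own: with these bits set the guest presumably gets its own PAC keys, which is what the "synchronize PAC keys" idea below is about. KVM's API documentation asks for both bits to be requested together when the host supports both.

```go
package main

import (
	"errors"
	"fmt"
)

// vCPU feature bit numbers from the Linux arm64 KVM UAPI
// (arch/arm64/include/uapi/asm/kvm.h).
const (
	kvmArmVcpuPSCI02         = 2
	kvmArmVcpuPtrauthAddress = 5
	kvmArmVcpuPtrauthGeneric = 6
)

// vcpuInit is a hypothetical mirror of struct kvm_vcpu_init:
// a target plus 7 feature words.
type vcpuInit struct {
	target   uint32
	features [7]uint32
}

// enablePtrauth requests both address and generic pointer authentication.
func enablePtrauth(v *vcpuInit) {
	v.features[0] |= 1<<kvmArmVcpuPtrauthAddress | 1<<kvmArmVcpuPtrauthGeneric
}

// checkPtrauth reports an error if only one of the two bits is set.
func checkPtrauth(v *vcpuInit) error {
	addr := v.features[0]&(1<<kvmArmVcpuPtrauthAddress) != 0
	gen := v.features[0]&(1<<kvmArmVcpuPtrauthGeneric) != 0
	if addr != gen {
		return errors.New("ptrauth address and generic must be requested together")
	}
	return nil
}

func main() {
	var v vcpuInit
	v.features[0] |= 1 << kvmArmVcpuPSCI02 // what the current code sets
	fmt.Printf("current:      %#x\n", v.features[0])
	enablePtrauth(&v)
	fmt.Printf("with ptrauth: %#x (err=%v)\n", v.features[0], checkPtrauth(&v))
}
```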
How I worked around it
At KVM platform init, I mprotect the VDSO writable, replace every paciasp/autiasp with a NOP, and restore it to read+exec. This disables PAC for VDSO functions only, and only within the sandbox process.
Some potential fixes (take with a grain of salt)
- Use the same approach as x86 (synthetic signal frame)
- Properly synchronize PAC keys between host and guest
Steps to reproduce
100% crash rate on ARM64 with a PAC-capable CPU:
runsc --platform=kvm do /bin/bash -c \
"for i in \$(seq 1 500); do /bin/echo x > /dev/null; done; echo DONE"
Exit code: 139 (SIGSEGV)
0% crash rate on the same hardware with systrap:
runsc --platform=systrap do /bin/bash -c \
"for i in \$(seq 1 500); do /bin/echo x > /dev/null; done; echo DONE"
Exit code: 0
runsc version
Environment: NVIDIA Grace (ARMv9)
kernel 6.17.0-14-generic (4K pages)
runsc release-20260223.0.
CPU features include `paca pacg` (PAC).
docker version (if using docker)
uname
No response
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)