ui-smoke: only use a readable core from this run for the native backtrace

grandixximo · grandixximo · commit ed7f125dab23 · 2026-06-17T07:54:57.000+08:00
Testing the crash path (a segfault injected into a GUI) surfaced two
problems in the crash dump helper. First, it globbed /tmp/core* and
picked up a stale, root-owned core from an unrelated run, so gdb printed
only 'Permission denied'. Second, the non-root case (CI, and local
runtests -u) never produces a core at all: we will not sudo to point
kernel.core_pattern at a writable dir, so nothing lands.

Restrict the core search to a core the kernel wrote into our own fresh
CORE_DIR, or a relative 'core' in the cwd that postdates arming, and
require it to be readable. When there is no such core, say so and point
at the Python faulthandler traceback in linuxcnc.err, which names the
crash site and is the reliable signal in every environment. The native
backtrace stays a best-effort extra for the root case.
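
The selection rule described above can be sketched as a standalone
function. This is an illustration only: pick_core and its two
parameters are hypothetical names, not the code in crashdump.sh, and
the sketch assumes the arming directory's mtime still marks the moment
of arming whenever no core was written into it.

```shell
#!/bin/bash
# Sketch only: pick_core is a hypothetical name, not the real helper.
# Prints the first readable core that belongs to this run, or nothing.
#   $1  core directory created at arming time
#   $2  directory the GUI ran in (defaults to the cwd)
pick_core() {
    local core_dir="$1" run_dir="${2:-.}" c
    # First choice: a core the kernel wrote into our own fresh directory.
    for c in "$core_dir"/core*; do
        [ -e "$c" ] && [ -r "$c" ] && { printf '%s\n' "$c"; return 0; }
    done
    # Fallback: a core in the run directory, accepted only if it is newer
    # than the arming directory, which rules out stale leftovers.
    for c in "$run_dir"/core*; do
        [ -e "$c" ] && [ -r "$c" ] && [ "$c" -nt "$core_dir" ] \
            && { printf '%s\n' "$c"; return 0; }
    done
    return 1
}
```

Calling pick_core "$CORE_DIR" . prints a path only when one of the two
trusted locations holds a readable core; an unmatched glob stays
literal and fails the -e test, so an empty directory yields nothing.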

Verified: an injected GUI segfault now fails the test in ~20s (no hang),
logs the Python traceback, and prints a clear 'no readable core dump'
note instead of a misleading permission error.
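
For reference, the kind of traceback faulthandler emits can be seen
outside the suite with a small reproduction. This is a sketch, not the
injection used for the verification above (that mechanism is not part
of this commit); repro_segv is a hypothetical name, and the sketch
assumes python3 with ctypes is on PATH.

```shell
#!/bin/bash
# Sketch: provoke a native segfault under faulthandler and capture what
# it writes. repro_segv is a hypothetical name, not part of the suite.
repro_segv() {
    (
        ulimit -c 0   # keep this demo from leaving a core file behind
        # ctypes.string_at(0) dereferences address 0 in C, so the crash
        # happens in native code with a Python frame above it.
        PYTHONFAULTHANDLER=1 python3 -c 'import ctypes; ctypes.string_at(0)'
    ) 2>&1
}
```

The captured output begins with faulthandler's "Fatal Python error"
banner followed by the Python stack, the same kind of text the suite
collects in linuxcnc.err.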
diff --git a/tests/ui-smoke/_lib/crashdump.sh b/tests/ui-smoke/_lib/crashdump.sh
@@ -1,11 +1,15 @@
 #!/bin/bash
-# Native crash capture for the UI smoke launchers. A GUI segfault is the
-# failure these tests most need to explain, and it lands in C/C++ (Qt,
-# dbus, GL) where PYTHONFAULTHANDLER stops at the event-loop frame. Arm a
-# core dump before launch; after the run, if the GUI left a core, print a
-# native backtrace into the log so CI shows the faulting frame directly.
-# Source with LIB_DIR set; runs only on the failure path, so green runs
-# pay nothing.
+# Native crash capture for the UI smoke launchers. A GUI segfault lands in
+# C/C++ (Qt, dbus, GL); PYTHONFAULTHANDLER (set in launch-env.sh) prints a
+# Python traceback to linuxcnc.err naming the frame that called in, which
+# is the reliable, environment-independent crash signal and is surfaced in
+# every failure log. This helper adds a best-effort native backtrace on
+# top: arm a core dump before launch, and after the run, if a readable
+# core from this run is present, gdb-print its backtrace. The core only
+# materialises when we can point kernel.core_pattern at a writable dir,
+# which needs root; non-root runs (CI, local -u) keep the Python traceback
+# and skip the native one. Source with LIB_DIR set; runs only on the
+# failure path, so green runs pay nothing.
 
 crashdump_arm() {
     CORE_DIR="$(mktemp -d -t ui-smoke-cores.XXXXXX)"
@@ -22,24 +26,34 @@ crashdump_arm() {
 
 crashdump_report() {
     [ -n "${CORE_DIR:-}" ] || return 0
-    local core
-    # shellcheck disable=SC2012  # mktemp dir, no odd filenames
-    core=$(ls -t "$CORE_DIR"/core* ./core* /tmp/core* 2>/dev/null | head -1)
-    if [ -n "$core" ]; then
+    local c core=""
+    # Only trust a core we know is from this run and can actually read:
+    # one the kernel wrote into our fresh CORE_DIR (root path, where we set
+    # core_pattern), or a relative "core" in the cwd that postdates arming.
+    # A broad /tmp glob would pick up a stale or foreign core (often root-
+    # owned), and gdb would just print "Permission denied".
+    for c in "$CORE_DIR"/core*; do
+        [ -e "$c" ] && [ -r "$c" ] && { core="$c"; break; }
+    done
+    if [ -z "$core" ]; then
+        for c in ./core*; do
+            [ -e "$c" ] && [ -r "$c" ] && [ "$c" -nt "$CORE_DIR" ] && { core="$c"; break; }
+        done
+    fi
+    if [ -n "$core" ] && command -v gdb >/dev/null 2>&1; then
         echo "=== crash: native backtrace ($core) ==="
-        # gdb is expected to be installed by .github/scripts/install-deps.sh
-        # on CI and by the developer locally; the suite does not apt-get.
-        if command -v gdb >/dev/null 2>&1; then
-            # "bt" first: gdb auto-selects the faulting thread on a SIGSEGV
-            # core. "thread apply all bt" after gives the rest.
-            gdb -batch -nx \
-                -ex "bt" \
-                -ex "echo \n=== all threads ===\n" \
-                -ex "thread apply all bt" \
-                "$(command -v python3)" "$core" 2>&1 | head -400
-        else
-            echo "(gdb unavailable; core left at $core)"
-        fi
+        # "bt" first: gdb auto-selects the faulting thread on a SIGSEGV
+        # core. "thread apply all bt" after gives the rest.
+        gdb -batch -nx \
+            -ex "bt" \
+            -ex "echo \n=== all threads ===\n" \
+            -ex "thread apply all bt" \
+            "$(command -v python3)" "$core" 2>&1 | head -400
+    else
+        # No readable core (the common non-root case) or no gdb. The Python
+        # faulthandler traceback in linuxcnc.err already names the crash
+        # site; the native backtrace is only a best-effort extra.
+        echo "=== crash: no readable core dump; see the Python traceback in linuxcnc.err above ==="
     fi
     rm -rf "$CORE_DIR"
 }