add: expose public Unwinder::capture API#396
Conversation
Adds `dial9_perf_self_profile::unwinder::Unwinder`, a zero-sized handle that proves the SIGSEGV fault handler is installed and exposes a non-allocating `capture()` for collecting the calling thread's stack. This is step 1 of the memory-profiling rollout (see dial9-tokio-telemetry/design/memory-profiling.md §2 and §14): the allocator hook and any future self-profiling caller needs a stable public entry point into the frame-pointer unwinder, which was previously reachable only through `pub(crate)` plumbing inside the CPU profiler. Implementation: - `Unwinder::install()` wires to the existing `fp_profiler::install_handler()` (idempotent, thread-safe). Returns an `io::Error` on non-Linux or non-x86_64/aarch64 targets. - `Unwinder::capture(&self, out)` reads the caller's return address and frame pointer via inline asm, then delegates to the existing frame- pointer walker. Frame 0 is the PC `capture` returns to — i.e. inside the caller — per the design's frame-0 contract. Does not modify `MAX_FRAME_SIZE` or any existing unwinder internals. Raising the cap to 1 MiB (mentioned in the design) is deferred to a dedicated commit with a false-frame-rate measurement. Tests cover idempotent install (single thread and 8-way concurrent), frame-0 contract (frame 0 is not inside `capture` itself), and output- buffer limits.
Tightens the safety story around `Unwinder::capture`, which was recently exposed in d0b0380. Four related changes: 1. `capture` is now `unsafe fn`. Its safety invariants (SIGSEGV handler must be installed and still active, not called from inside a SIGSEGV handler, frame pointers enabled, not in a prologue/epilogue window) involve process-global state and compile flags that the type system cannot enforce, so an `unsafe fn` with a documented `# Safety` section is the honest signature. 2. Added `Unwinder::verify_handler()` for runtime verification that the SIGSEGV handler we registered in `install()` is still the active handler for the process. A third-party library can replace it after our install without chaining; in that case a stack-walk fault would crash the process instead of being caught. `verify_handler` costs one `sigaction` syscall so it is not suitable for the hot path; `capture` now also contains a `debug_assert!(self.verify_handler())` so dev/test builds fail loudly if the invariant is broken, at zero cost in release. 3. `capture` now returns `CaptureResult { frames_written, truncated }` (`#[non_exhaustive]`) instead of a bare `usize`. Previously, a caller had no way to distinguish "stack happened to fit exactly in the buffer" from "buffer filled and outer frames were dropped" — both return `out.len()`. `unwind` now validates the next frame before the capacity check and reports `truncated = true` only when another valid frame would have fit. `unwind_from_ucontext` is updated similarly; its only caller (`ctimer_sampler`) ignores the new bool. 4. Collapsed the double `#[inline(never)]` layer (`Unwinder::capture` -> `platform::capture`). The prior design was not only redundant but broke the frame-0 contract: `read_registers` was `inline(always)` and got inlined into `platform::capture`, so `rbp` observed `platform::capture`'s frame, making `*(fp+8)` point into `Unwinder::capture`'s body rather than the user's caller. The existing `assert_ne!(out[0], capture_addr)` test only checked against the function's entry point and silently passed for any address further into the body. Fix: single `#[inline(never)]` on `Unwinder::capture`; `platform::capture` and `read_caller_regs` become `#[inline(always)]` so they collapse into it. Now `rbp` sees `Unwinder::capture`'s own frame, `*(fp+8)` is capture's return address into the user's caller, and that address is what goes into `out[0]`. `read_caller_regs` is also now `unsafe fn` with a detailed `# Safety` section explaining the inline-always requirement, the frame-pointer ABI invariant the raw reads depend on, and why those reads are NOT covered by the SIGSEGV `safe_load` shim. TDD notes: - New `frame_zero_points_into_caller_of_capture` tightens the frame-0 assertion to bound `out[0]` inside `helper`'s body. Verified to FAIL against the old double-`inline(never)` design and PASS with the fix. - New `capture_debug_asserts_when_handler_replaced` replaces the SIGSEGV handler with `SIG_IGN`, calls `capture`, and asserts a panic via `catch_unwind`. Verified to FAIL without the `debug_assert!` and PASS with it. Nextest runs each test in its own process so the handler manipulation is contained. - New tests for truncation semantics, empty-buffer truncation, and `verify_handler_true_after_install`. Verification: - `cargo fmt --check` clean - `cargo clippy --all-targets --all-features --workspace` clean - `cargo nextest run -p dial9-perf-self-profile --stress-duration 20s`: 36 iterations x 81 tests, 0 failures. No production callers of `Unwinder::capture` exist yet, so the signature change (usize -> CaptureResult, safe -> unsafe) is a free API refinement on top of the initial exposure in d0b0380.
- unwinder.rs: Move per-test #[cfg(all(target_os, target_arch))] to a nested `mod linux` inside the test module - Add tests/symbolize_stack.rs: integration test that captures a stack trace via Unwinder and symbolizes it, verifying expected function names
- Move install_is_idempotent tests into cfg-gated linux module (fixes macOS) - Add install_or_skip() helper that skips capture tests when the handler is replaced by sanitizers or seccomp (fixes ASAN + ECS simulation) - Guard symbolize_stack integration test with verify_handler check
The test's RestoreGuard previously reset SIGSEGV to SIG_DFL on drop. Under `cargo test`'s threaded harness (no process isolation), this could leave concurrent tests without the dial9 fault handler. Now we save the old sigaction when replacing and restore it on drop, making the test safe regardless of test runner.
In unoptimized debug builds (as used by the ECS simulation CI target), nested #[inline(never)] functions can be laid out 100+ KiB from their symbol address. The previous 64 KiB window was too small.
| /// # Safety | ||
| /// `ucontext` must be the pointer the kernel passed to a SA_SIGINFO handler. | ||
| pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> usize { | ||
| pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> (usize, bool) { |
There was a problem hiding this comment.
complete NIT, but a bit hard to remember what a (usize, bool) means without reading docs, might be worth creating a type?
There was a problem hiding this comment.
related nit, can we call this pub(crate) if it is not re-exported? It's a bit hard to tell currently which is exported, I see that capture() is currently.
| //! Because of this, the unwinder must be "[`install`ed](Unwinder::install)" before | ||
| //! you can use it to [`capture`](Unwinder::capture) a stack. | ||
| //! | ||
| //! The unwinded stacks are only addresses. You must use a symbolizer separately to |
| /// # Safety | ||
| /// Same obligations as [`Unwinder::capture`]: handler installed, | ||
| /// not inside a SIGSEGV handler, frame pointers enabled. | ||
| #[inline(always)] |
There was a problem hiding this comment.
This (and read_caller_regs()) are a bit fragile since it is only a (strong) hint to inline it...
We could perhaps do something more resilient with a capture!() macro used by the caller that does an asm!() read? Though ergonomics would be pretty gross, likely a bad idea.
I see we do have a runtime test with frame_zero_points_into_caller_of_capture which helps for our own usage, though it doesn't extend to callers that might have totally different optimization settings, LTO, smaller codegen units, etc. I wonder if there is any sort of smoke test that we can offer them too. I guess the problem then is, whatever test is run, is still being run in debug mode with different compilation settings?
This is probably pedantics since these functions are so small that rustc is unlikely to skip inlining, probably ignore me.
There was a problem hiding this comment.
I think this is probably not worth it. The cost of it being not inlined is just an extra stack frame so I'm not worried
| /// - Should generally be called from a signal handler where the target thread | ||
| /// is stopped; walking a running thread's stack races with mutations. | ||
| pub unsafe fn unwind(pc: usize, mut fp: usize, sp: usize, out: &mut [u64]) -> usize { | ||
| pub unsafe fn unwind(pc: usize, mut fp: usize, sp: usize, out: &mut [u64]) -> (usize, bool) { |
There was a problem hiding this comment.
I'd suggest a named struct return here as well
| /// # Safety | ||
| /// `ucontext` must be the pointer the kernel passed to a SA_SIGINFO handler. | ||
| pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> usize { | ||
| pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> (usize, bool) { |
There was a problem hiding this comment.
related nit, can we call this pub(crate) if it is not re-exported? It's a bit hard to tell currently which is exported, I see that capture() is currently.
| /// | ||
| /// Performs one `sigaction` syscall. Not suitable for per-sample hot | ||
| /// paths. | ||
| pub fn verify_handler(&self) -> bool { |
There was a problem hiding this comment.
might be worth a comment that this requires &self to ensure that Unwinder::install() has been called, even though it doesn't require any &self access internally
This series of changes exposes a public framepointer unwinding API which will be used from #362