Skip to content

add: expose public Unwinder::capture API#396

Merged
rcoh merged 11 commits into
mainfrom
feat/mem-profiling-1
May 14, 2026
Merged

add: expose public Unwinder::capture API#396
rcoh merged 11 commits into
mainfrom
feat/mem-profiling-1

Conversation

@rcoh
Copy link
Copy Markdown
Contributor

@rcoh rcoh commented May 12, 2026

This series of changes exposes a public framepointer unwinding API which will be used from #362

rcoh added 9 commits May 12, 2026 15:23
Adds `dial9_perf_self_profile::unwinder::Unwinder`, a zero-sized handle
that proves the SIGSEGV fault handler is installed and exposes a
non-allocating `capture()` for collecting the calling thread's stack.

This is step 1 of the memory-profiling rollout (see
dial9-tokio-telemetry/design/memory-profiling.md §2 and §14): the
allocator hook and any future self-profiling caller needs a stable
public entry point into the frame-pointer unwinder, which was previously
reachable only through `pub(crate)` plumbing inside the CPU profiler.

Implementation:
- `Unwinder::install()` wires to the existing `fp_profiler::install_handler()`
  (idempotent, thread-safe). Returns an `io::Error` on non-Linux or
  non-x86_64/aarch64 targets.
- `Unwinder::capture(&self, out)` reads the caller's return address and
  frame pointer via inline asm, then delegates to the existing frame-
  pointer walker. Frame 0 is the PC `capture` returns to — i.e. inside
  the caller — per the design's frame-0 contract.

Does not modify `MAX_FRAME_SIZE` or any existing unwinder internals.
Raising the cap to 1 MiB (mentioned in the design) is deferred to a
dedicated commit with a false-frame-rate measurement.

Tests cover idempotent install (single thread and 8-way concurrent),
frame-0 contract (frame 0 is not inside `capture` itself), and output-
buffer limits.
Tightens the safety story around `Unwinder::capture`, which was recently
exposed in d0b0380. Four related changes:

1. `capture` is now `unsafe fn`. Its safety invariants (SIGSEGV handler
   must be installed and still active, not called from inside a SIGSEGV
   handler, frame pointers enabled, not in a prologue/epilogue window)
   involve process-global state and compile flags that the type system
   cannot enforce, so an `unsafe fn` with a documented `# Safety`
   section is the honest signature.

2. Added `Unwinder::verify_handler()` for runtime verification that the
   SIGSEGV handler we registered in `install()` is still the active
   handler for the process. A third-party library can replace it after
   our install without chaining; in that case a stack-walk fault would
   crash the process instead of being caught. `verify_handler` costs
   one `sigaction` syscall so it is not suitable for the hot path;
   `capture` now also contains a `debug_assert!(self.verify_handler())`
   so dev/test builds fail loudly if the invariant is broken, at zero
   cost in release.

3. `capture` now returns `CaptureResult { frames_written, truncated }`
   (`#[non_exhaustive]`) instead of a bare `usize`. Previously, a caller
   had no way to distinguish "stack happened to fit exactly in the
   buffer" from "buffer filled and outer frames were dropped" — both
   return `out.len()`. `unwind` now validates the next frame before the
   capacity check and reports `truncated = true` only when another
   valid frame would have fit. `unwind_from_ucontext` is updated
   similarly; its only caller (`ctimer_sampler`) ignores the new bool.

4. Collapsed the double `#[inline(never)]` layer
   (`Unwinder::capture` -> `platform::capture`). The prior design was
   not only redundant but broke the frame-0 contract: `read_registers`
   was `inline(always)` and got inlined into `platform::capture`, so
   `rbp` observed `platform::capture`'s frame, making `*(fp+8)` point
   into `Unwinder::capture`'s body rather than the user's caller. The
   existing `assert_ne!(out[0], capture_addr)` test only checked
   against the function's entry point and silently passed for any
   address further into the body.

   Fix: single `#[inline(never)]` on `Unwinder::capture`;
   `platform::capture` and `read_caller_regs` become `#[inline(always)]`
   so they collapse into it. Now `rbp` sees `Unwinder::capture`'s own
   frame, `*(fp+8)` is capture's return address into the user's caller,
   and that address is what goes into `out[0]`.

   `read_caller_regs` is also now `unsafe fn` with a detailed
   `# Safety` section explaining the inline-always requirement, the
   frame-pointer ABI invariant the raw reads depend on, and why those
   reads are NOT covered by the SIGSEGV `safe_load` shim.

TDD notes:
- New `frame_zero_points_into_caller_of_capture` tightens the frame-0
  assertion to bound `out[0]` inside `helper`'s body. Verified to FAIL
  against the old double-`inline(never)` design and PASS with the fix.
- New `capture_debug_asserts_when_handler_replaced` replaces the
  SIGSEGV handler with `SIG_IGN`, calls `capture`, and asserts a
  panic via `catch_unwind`. Verified to FAIL without the
  `debug_assert!` and PASS with it. Nextest runs each test in its own
  process so the handler manipulation is contained.
- New tests for truncation semantics, empty-buffer truncation,
  and `verify_handler_true_after_install`.

Verification:
- `cargo fmt --check` clean
- `cargo clippy --all-targets --all-features --workspace` clean
- `cargo nextest run -p dial9-perf-self-profile --stress-duration 20s`:
  36 iterations x 81 tests, 0 failures.

No production callers of `Unwinder::capture` exist yet, so the signature
change (usize -> CaptureResult, safe -> unsafe) is a free API
refinement on top of the initial exposure in d0b0380.
- unwinder.rs: Move per-test #[cfg(all(target_os, target_arch))] to a
  nested `mod linux` inside the test module
- Add tests/symbolize_stack.rs: integration test that captures a stack
  trace via Unwinder and symbolizes it, verifying expected function names
- Move install_is_idempotent tests into cfg-gated linux module (fixes macOS)
- Add install_or_skip() helper that skips capture tests when the handler
  is replaced by sanitizers or seccomp (fixes ASAN + ECS simulation)
- Guard symbolize_stack integration test with verify_handler check
The test's RestoreGuard previously reset SIGSEGV to SIG_DFL on drop.
Under `cargo test`'s threaded harness (no process isolation), this
could leave concurrent tests without the dial9 fault handler.

Now we save the old sigaction when replacing and restore it on drop,
making the test safe regardless of test runner.
In unoptimized debug builds (as used by the ECS simulation CI target),
nested #[inline(never)] functions can be laid out 100+ KiB from their
symbol address. The previous 64 KiB window was too small.
@rcoh rcoh marked this pull request as ready for review May 13, 2026 01:02
@rcoh rcoh requested a review from jlizen May 13, 2026 01:02
@rcoh rcoh changed the title perf-self-profile: harden public Unwinder::capture API add: expose public Unwinder::capture API May 13, 2026
@rcoh rcoh requested a review from yulnr May 13, 2026 15:01
/// # Safety
/// `ucontext` must be the pointer the kernel passed to a SA_SIGINFO handler.
pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> usize {
pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> (usize, bool) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complete NIT, but a bit hard to remember what a (usize, bool) means without reading docs, might be worth creating a type?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related nit, can we call this pub(crate) if it is not re-exported? It's a bit hard to tell currently which is exported, I see that capture() is currently.

Comment thread perf-self-profile/src/unwinder.rs Outdated
//! Because of this, the unwinder must be "[`install`ed](Unwinder::install)" before
//! you can use it to [`capture`](Unwinder::capture) a stack.
//!
//! The unwinded stacks are only addresses. You must use a symbolizer separately to
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unwound

/// # Safety
/// Same obligations as [`Unwinder::capture`]: handler installed,
/// not inside a SIGSEGV handler, frame pointers enabled.
#[inline(always)]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and read_caller_regs()) are a bit fragile since it is only a (strong) hint to inline it...
We could perhaps do something more resilient with a capture!() macro used by the caller that does an asm!() read? Though ergonomics would be pretty gross, likely a bad idea.

I see we do have a runtime test with frame_zero_points_into_caller_of_capture which helps for our own usage, though it doesn't extend to callers that might have totally different optimization settings, LTO, smaller codegen units, etc. I wonder if there is any sort of smoke test that we can offer them too. I guess the problem then is, whatever test is run, is still being run in debug mode with different compilation settings?

This is probably pedantics since these functions are so small that rustc is unlikely to skip inlining, probably ignore me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably not worth it. The cost of it being not inlined is just an extra stack frame so I'm not worried

/// - Should generally be called from a signal handler where the target thread
/// is stopped; walking a running thread's stack races with mutations.
pub unsafe fn unwind(pc: usize, mut fp: usize, sp: usize, out: &mut [u64]) -> usize {
pub unsafe fn unwind(pc: usize, mut fp: usize, sp: usize, out: &mut [u64]) -> (usize, bool) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest a named struct return here as well

/// # Safety
/// `ucontext` must be the pointer the kernel passed to a SA_SIGINFO handler.
pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> usize {
pub unsafe fn unwind_from_ucontext(ucontext: *mut libc::c_void, out: &mut [u64]) -> (usize, bool) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related nit, can we call this pub(crate) if it is not re-exported? It's a bit hard to tell currently which is exported, I see that capture() is currently.

///
/// Performs one `sigaction` syscall. Not suitable for per-sample hot
/// paths.
pub fn verify_handler(&self) -> bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth a comment that this requires &self to ensure that Unwinder::install() has been called, even though it doesn't require any &self access internally

@rcoh rcoh enabled auto-merge May 14, 2026 18:48
@rcoh rcoh added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit 23bc4d2 May 14, 2026
20 checks passed
@github-actions github-actions Bot mentioned this pull request May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants