Skip to content

Add task dump capture behind taskdump feature#354

Merged
rcoh merged 3 commits into
mainfrom
taskdump-capture
May 6, 2026
Merged

Add task dump capture behind taskdump feature#354
rcoh merged 3 commits into
mainfrom
taskdump-capture

Conversation

@rcoh
Copy link
Copy Markdown
Contributor

@rcoh rcoh commented May 5, 2026

Summary

Adds a TaskDumped<F> wrapper gated by the new taskdump cargo feature that
captures async backtraces at yield points for tasks idle longer than a
configurable threshold. Emits TaskDumpEvent into the trace stream.

Stacking is Traced<TaskDumped<F>>TaskDumped is a separate wrapper so the
capture code #[cfg]s out cleanly when the feature is off, and so it can be
removed from the poll hot path once by a single atomic check.

Configured via TracedRuntimeBuilder::with_task_dumps(TaskDumpConfig) (or
TelemetryCoreBuilder::task_dump_config). Capture short-circuits when
telemetry is disabled on the guard, so trace_with doesn't run on a paused
session. Bumps tokio to 1.52 for the taskdump feature.

Part 1 of splitting #213. Not included yet: viewer JS changes, overhead
bench mode, metrics-service demo wiring, custom libunwind FFI.

This change does not include README or docs updates. The PR that adds UI support will include these updates.

Test plan

  • cargo test --features taskdump,analysis — 3 new integration tests in tests/task_dump.rs, plus round-trip test in buffer.rs
  • cargo clippy --all-features --tests
  • cargo fmt --all
  • Verified capture produces no extra PollStart/PollEnd/WakeEvent vs baseline
  • CI green

@rcoh rcoh force-pushed the taskdump-capture branch 2 times, most recently from adb6293 to cd5580b Compare May 5, 2026 01:27
Wraps spawned futures in `TaskDumped<F>` when the `taskdump` feature is
enabled. On each poll, if the previous idle gap exceeded the configured
threshold, the frames captured at the last yield point are emitted as
`TaskDumpEvent`. Capture itself runs inside `tokio::runtime::dump::trace_with`
with a noop waker on a diagnostic re-poll, so it doesn't produce duplicate
wake or poll events.

Configured via `TracedRuntimeBuilder::with_task_dumps(TaskDumpConfig)` or
`TelemetryCoreBuilder::task_dump_config`. Capture short-circuits when the
guard is disabled, so a paused guard skips `trace_with` entirely.

Bumps tokio to 1.52 for the `taskdump` feature.
@rcoh rcoh force-pushed the taskdump-capture branch from cd5580b to 11022f8 Compare May 5, 2026 17:27
Copy link
Copy Markdown
Member

@jlizen jlizen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some small cleanups, couple questions

Comment thread .github/actions/rust-build/action.yml Outdated
cargo test --all-targets --all-features
else
cargo test --all-targets \
--features analysis,cpu-profiling,tracing-layer,worker-s3
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we stay future proof with:
cargo hack test --each-feature --exclude-features taskdump

/// longer than the configured threshold. Requires the `taskdump` crate
/// feature to actually record events; the builder accepts the config
/// unconditionally so the API surface is stable.
pub fn with_task_dumps(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the builder accepts the config
/// unconditionally so the API surface is stable

Seems like an impl detail

trace_path: Option<PathBuf>,
/// Capture async backtraces at yield points. Requires the `taskdump`
/// crate feature to actually record events.
task_dump_config: Option<crate::telemetry::task_dump_config::TaskDumpConfig>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason not to just conditional config the field out for non-taskdump-enabled?

Copy link
Copy Markdown
Contributor Author

@rcoh rcoh May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only the virality of conditionally enabled fields

F: std::future::Future + Send + 'static,
F::Output: Send + 'static,
{
TelemetryHandle::current().spawn(future)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we intentionally waiting to integrate here?

Copy link
Copy Markdown
Contributor Author

@rcoh rcoh May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is integrated because the Traced future wraps the task dump future?

mut self,
config: crate::telemetry::task_dump_config::TaskDumpConfig,
) -> Self {
self.task_dump_config = Some(config);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we debug_assert if this is used without the taskdump feature? (Or, conditionally compile this whole setter out?)

let ips = &mut self.ips;
let offsets = &mut self.offsets;

// `trace_with`'s outer closure is `FnOnce`; `Option::take` moves the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs cleanup

fn capture_frames(
ips: &mut Vec<u64>,
root_addr: Option<*const core::ffi::c_void>,
leaf_addr: *const core::ffi::c_void,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth mentioning this is stable because it is inline(never) in tokio

do they consider that a load bearing thing currently?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is functionally required for the whole thing to work

// If we have captured frames from a previous idle, decide whether
// that idle was long enough to emit.
let should_emit = if this.frames.has_data() {
let now = crate::telemetry::events::clock_monotonic_ns();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to reuse prevoius clock_monotonic_ns()` call?

pub(crate) enabled: AtomicBool,
/// Set when `TaskDumpConfig` is provided at build time. When `true`,
/// wrapping futures capture async backtraces at yield points.
pub(crate) task_dumps_enabled: AtomicBool,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these atomics are future proofing for it being runtime configurable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

@jlizen
Copy link
Copy Markdown
Member

jlizen commented May 5, 2026

@rcoh looks good, some questions

@rcoh rcoh enabled auto-merge May 6, 2026 15:38
@rcoh rcoh added this pull request to the merge queue May 6, 2026
Merged via the queue into main with commit 0e1e8fd May 6, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants