Add task dump capture behind taskdump feature #354
Wraps spawned futures in `TaskDumped<F>` when the `taskdump` feature is enabled. On each poll, if the previous idle gap exceeded the configured threshold, the frames captured at the last yield point are emitted as `TaskDumpEvent`. Capture itself runs inside `tokio::runtime::dump::trace_with` with a noop waker on a diagnostic re-poll, so it doesn't produce duplicate wake or poll events. Configured via `TracedRuntimeBuilder::with_task_dumps(TaskDumpConfig)` or `TelemetryCoreBuilder::task_dump_config`. Capture short-circuits when the guard is disabled, so a paused guard skips `trace_with` entirely. Bumps tokio to 1.52 for the `taskdump` feature.
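The idle-gap check described above can be sketched with the standard library alone. This is a minimal sketch, not the PR's implementation: `IdleGapProbe`, `YieldOnce`, `noop_waker`, and `demo_long_gaps` are hypothetical names, the inner future is boxed to avoid pin projection, and the backtrace capture via `trace_with` is left out entirely. Only the "measure the gap since the last yield, compare it with the threshold on the next poll" step is modelled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the wrapper: remembers when the inner future
/// last returned `Pending` and records gaps longer than `threshold`.
pub struct IdleGapProbe<F> {
    inner: Pin<Box<F>>,
    threshold: Duration,
    last_yield: Option<Instant>,
    /// Gaps that exceeded the threshold (a real wrapper would emit an event).
    pub long_gaps: Vec<Duration>,
}

impl<F: Future> IdleGapProbe<F> {
    pub fn new(inner: F, threshold: Duration) -> Self {
        Self { inner: Box::pin(inner), threshold, last_yield: None, long_gaps: Vec::new() }
    }
}

impl<F: Future> Future for IdleGapProbe<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Every field is `Unpin` because the inner future is boxed.
        let this = self.get_mut();
        // Decide on this poll whether the previous idle period was long enough.
        if let Some(yielded_at) = this.last_yield.take() {
            let gap = yielded_at.elapsed();
            if gap > this.threshold {
                this.long_gaps.push(gap);
            }
        }
        let result = this.inner.as_mut().poll(cx);
        if result.is_pending() {
            // Yield point: the real wrapper would capture frames here.
            this.last_yield = Some(Instant::now());
        }
        result
    }
}

/// Test future that returns `Pending` once, then `Ready`.
pub struct YieldOnce(pub bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 { Poll::Ready(()) } else { self.0 = true; Poll::Pending }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A waker that does nothing, so manual polls cause no wake-ups.
pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}

/// Polls once, sleeps for `idle`, polls again; returns how many long gaps were seen.
pub fn demo_long_gaps(threshold: Duration, idle: Duration) -> usize {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut probe = IdleGapProbe::new(YieldOnce(false), threshold);
    assert!(Pin::new(&mut probe).poll(&mut cx).is_pending());
    std::thread::sleep(idle);
    assert!(Pin::new(&mut probe).poll(&mut cx).is_ready());
    probe.long_gaps.len()
}
```

The decision is deferred to the next poll because the idle duration is only known once the task wakes again. The noop waker is used here only to drive the future by hand.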
jlizen
left a comment
Looks good, some small cleanups, couple questions
cargo test --all-targets --all-features
else
cargo test --all-targets \
--features analysis,cpu-profiling,tracing-layer,worker-s3
can we stay future proof with:
cargo hack test --each-feature --exclude-features taskdump
/// longer than the configured threshold. Requires the `taskdump` crate
/// feature to actually record events; the builder accepts the config
/// unconditionally so the API surface is stable.
pub fn with_task_dumps(
> the builder accepts the config unconditionally so the API surface is stable
Seems like an impl detail
trace_path: Option<PathBuf>,
/// Capture async backtraces at yield points. Requires the `taskdump`
/// crate feature to actually record events.
task_dump_config: Option<crate::telemetry::task_dump_config::TaskDumpConfig>,
is there a reason not to just conditional config the field out for non-taskdump-enabled?
No, only the virality of conditionally enabled fields
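The virality mentioned here can be illustrated with a small sketch. The `Builder` type, the `idle_threshold_ms` field, and the `task_dumps_configured` helper are hypothetical; only `TaskDumpConfig`, `trace_path`, `task_dump_config`, and `with_task_dumps` come from the diff. Once the field is gated, every item that touches it needs its own `#[cfg]`, usually with a fallback for the feature-off build.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq)]
pub struct TaskDumpConfig {
    /// Hypothetical field; the real config's contents are not shown here.
    pub idle_threshold_ms: u64,
}

#[derive(Debug, Default)]
pub struct Builder {
    pub trace_path: Option<PathBuf>,
    // Gating the field...
    #[cfg(feature = "taskdump")]
    pub task_dump_config: Option<TaskDumpConfig>,
}

impl Builder {
    // ...forces the setter to be gated as well, since it names the field.
    #[cfg(feature = "taskdump")]
    pub fn with_task_dumps(mut self, config: TaskDumpConfig) -> Self {
        self.task_dump_config = Some(config);
        self
    }

    // Every reader needs a feature-on and a feature-off variant.
    #[cfg(feature = "taskdump")]
    pub fn task_dumps_configured(&self) -> bool {
        self.task_dump_config.is_some()
    }

    #[cfg(not(feature = "taskdump"))]
    pub fn task_dumps_configured(&self) -> bool {
        false
    }
}
```

Callers that build the config would need the same gating, which is the cost being weighed against an always-present `Option` field.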
F: std::future::Future + Send + 'static,
F::Output: Send + 'static,
{
TelemetryHandle::current().spawn(future)
Are we intentionally waiting to integrate here?
I think this is integrated because the Traced future wraps the task dump future?
mut self,
config: crate::telemetry::task_dump_config::TaskDumpConfig,
) -> Self {
self.task_dump_config = Some(config);
should we debug_assert if this is used without the taskdump feature? (Or, conditionally compile this whole setter out?)
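One way the `debug_assert` option could look, as a minimal sketch: the `Builder` type and the `idle_threshold_ms` field are hypothetical, and the message text is illustrative. The setter stays available in every build, but debug builds without the feature fail loudly instead of silently recording nothing.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct TaskDumpConfig {
    /// Hypothetical field; the real config's contents are not shown here.
    pub idle_threshold_ms: u64,
}

#[derive(Debug, Default)]
pub struct Builder {
    pub task_dump_config: Option<TaskDumpConfig>,
}

impl Builder {
    pub fn with_task_dumps(mut self, config: TaskDumpConfig) -> Self {
        // `cfg!` evaluates to a compile-time bool, so this check costs nothing
        // in release builds, where `debug_assert!` is compiled out.
        debug_assert!(
            cfg!(feature = "taskdump"),
            "with_task_dumps has no effect unless the `taskdump` feature is enabled"
        );
        self.task_dump_config = Some(config);
        self
    }
}
```

Compiling the setter out entirely would turn the same mistake into a build error rather than a debug-only panic, at the cost of a feature-dependent API surface.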
let ips = &mut self.ips;
let offsets = &mut self.offsets;

// `trace_with`'s outer closure is `FnOnce`; `Option::take` moves the
fn capture_frames(
ips: &mut Vec<u64>,
root_addr: Option<*const core::ffi::c_void>,
leaf_addr: *const core::ffi::c_void,
might be worth mentioning this is stable because it is inline(never) in tokio
do they consider that a load bearing thing currently?
Yes it is functionally required for the whole thing to work
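Why the marker functions must not be inlined can be shown on a simplified model. This sketch assumes frames are delimited by comparing each frame's symbol address with a leaf marker and an optional root marker. The `Frame` type and `frames_between` function are hypothetical, and real unwinding is replaced by a plain slice ordered innermost frame first.

```rust
/// One unwound stack frame in this simplified model (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Frame {
    /// Instruction pointer inside the frame.
    pub ip: u64,
    /// Start address of the function the frame belongs to.
    pub symbol_addr: usize,
}

/// Keeps the instruction pointers strictly between the leaf marker and the
/// root marker. `stack` is ordered from the innermost frame outwards.
pub fn frames_between(stack: &[Frame], leaf_addr: usize, root_addr: Option<usize>) -> Vec<u64> {
    let mut ips = Vec::new();
    let mut above_leaf = false;
    for frame in stack {
        // Stop at the root marker: everything outside it is runtime plumbing.
        if Some(frame.symbol_addr) == root_addr {
            break;
        }
        if above_leaf {
            ips.push(frame.ip);
        }
        // Frames up to and including the leaf marker belong to the unwinder.
        if frame.symbol_addr == leaf_addr {
            above_leaf = true;
        }
    }
    ips
}
```

If the leaf marker were inlined into its caller, no frame would carry its symbol address, `above_leaf` would never flip, and the capture would come back empty. That is the sense in which the non-inlined marker is load bearing.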
// If we have captured frames from a previous idle, decide whether
// that idle was long enough to emit.
let should_emit = if this.frames.has_data() {
let now = crate::telemetry::events::clock_monotonic_ns();
is there a way to reuse the previous `clock_monotonic_ns()` call?
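One shape the reuse could take is to read the clock once per poll and pass the value down. This is a sketch under assumptions: `PollStart` and `begin_poll` are hypothetical names, and whether the outer wrapper's timestamp is actually reachable from this code is not shown in the diff.

```rust
/// Hypothetical result of starting a poll: one timestamp, shared by the
/// poll-start event and the task-dump decision.
#[derive(Debug, PartialEq)]
pub struct PollStart {
    pub timestamp_ns: u64,
    pub emit_task_dump: bool,
}

/// `idle_started_ns` is `Some` when frames were captured at the last yield.
/// `now_ns` is read once by the caller and reused here.
pub fn begin_poll(idle_started_ns: Option<u64>, now_ns: u64, threshold_ns: u64) -> PollStart {
    let emit_task_dump = match idle_started_ns {
        // saturating_sub guards against a start value later than `now_ns`.
        Some(start) => now_ns.saturating_sub(start) > threshold_ns,
        None => false,
    };
    PollStart { timestamp_ns: now_ns, emit_task_dump }
}
```

Taking `now_ns` as a parameter also makes the threshold logic testable without touching a real clock.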
pub(crate) enabled: AtomicBool,
/// Set when `TaskDumpConfig` is provided at build time. When `true`,
/// wrapping futures capture async backtraces at yield points.
pub(crate) task_dumps_enabled: AtomicBool,
these atomics are future proofing for it being runtime configurable?
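For context, a minimal sketch of how two such flags could gate capture while staying switchable at runtime. `SharedFlags`, `should_capture`, `pause`, and `resume` are hypothetical names, and `Ordering::Relaxed` is an assumption; the ordering the PR actually uses is not visible in this excerpt.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical shared state mirroring the two fields in the diff.
pub struct SharedFlags {
    pub enabled: AtomicBool,
    pub task_dumps_enabled: AtomicBool,
}

impl SharedFlags {
    pub fn new(task_dumps_configured: bool) -> Self {
        Self {
            enabled: AtomicBool::new(true),
            task_dumps_enabled: AtomicBool::new(task_dumps_configured),
        }
    }

    /// Cheap hot-path check: capture only when telemetry is on and task
    /// dumps were configured.
    pub fn should_capture(&self) -> bool {
        self.enabled.load(Ordering::Relaxed) && self.task_dumps_enabled.load(Ordering::Relaxed)
    }

    /// Pausing flips one flag; already-wrapped futures see it on their next poll.
    pub fn pause(&self) {
        self.enabled.store(false, Ordering::Relaxed);
    }

    pub fn resume(&self) {
        self.enabled.store(true, Ordering::Relaxed);
    }
}
```

A plain `bool` would also work if the value never changed after build time; the atomic only pays off when the flag can be flipped while tasks are running.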
@rcoh looks good, some questions
Summary
Adds a `TaskDumped<F>` wrapper, gated by the new `taskdump` cargo feature, that captures async backtraces at yield points for tasks idle longer than a configurable threshold. Emits `TaskDumpEvent` into the trace stream.

Stacking is `Traced<TaskDumped<F>>`. `TaskDumped` is a separate wrapper so the capture code `#[cfg]`s out cleanly when the feature is off, and so it can be removed from the poll hot path by a single atomic check.

Configured via `TracedRuntimeBuilder::with_task_dumps(TaskDumpConfig)` (or `TelemetryCoreBuilder::task_dump_config`). Capture short-circuits when telemetry is disabled on the guard, so `trace_with` doesn't run on a paused session. Bumps tokio to 1.52 for the `taskdump` feature.

Part 1 of splitting #213. Not included yet: viewer JS changes, overhead bench mode, metrics-service demo wiring, custom libunwind FFI.
This change does not include README or docs updates. The PR that adds UI support will include these updates.
Test plan
- `cargo test --features taskdump,analysis`: 3 new integration tests in `tests/task_dump.rs`, plus round-trip test in `buffer.rs`
- `cargo clippy --all-features --tests`
- `cargo fmt --all`
- PollStart/PollEnd/WakeEvent vs baseline