The `telemetry` module records lightweight runtime telemetry — poll start/end, worker park/unpark, and queue depth samples — into a compact binary trace format. Traces can be analyzed offline to find idle workers, long polls, and scheduling imbalances.
4
-
5
-
### Quick Start
6
-
7
-
The easiest way to get started is with `TracedRuntime`, which wires up all the hooks and background threads for you. Use a `RotatingWriter` to bound disk usage in production (see [`examples/telemetry_rotating.rs`](examples/telemetry_rotating.rs) for the full example):
3
+
**Low-overhead runtime telemetry for Tokio.** Records poll timing, worker park/unpark, wake events, queue depths, and (on Linux) CPU profile samples into a compact binary trace format. Traces can be analyzed offline to find long polls, scheduling delays, idle workers, and CPU hotspots.
// use build() to start disabled, then call guard.enable() later
24
18
let (runtime, _guard) = TracedRuntime::build_and_start(builder, Box::new(writer))?;
25
19
26
20
runtime.block_on(async {
27
-
//... your async code here ...
21
+
// your async code here
28
22
});
29
23
30
-
// Dropping `runtime` then `_guard` performs a final flush.
31
24
Ok(())
32
25
}
33
26
```
34
27
35
-
`TracedRuntime::build` returns a `TelemetryGuard` whose `handle()` method gives you a cheap, cloneable `TelemetryHandle` you can use to enable/disable recording at runtime.
28
+
Events are 6–16 bytes on the wire, and a typical request generates ~20–35 bytes of trace data (a few poll events plus park/unpark). At 10k requests/sec that's well under 1 MB/s — `RotatingWriter` caps total disk usage so you can leave it running indefinitely. Typical CPU overhead is under 5%.
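As a rough check on the figures above, the arithmetic can be sketched as follows. The function names and the 1 GiB disk budget are illustrative assumptions, not part of the crate.

```rust
/// Trace bytes written per second for a given request rate and
/// per-request trace size.
fn trace_bytes_per_sec(requests_per_sec: u64, bytes_per_request: u64) -> u64 {
    requests_per_sec * bytes_per_request
}

/// Whole seconds of history that a disk budget can hold at a steady
/// write rate (rounded down).
fn retention_secs(budget_bytes: u64, bytes_per_sec: u64) -> u64 {
    budget_bytes / bytes_per_sec
}
```

At 10k requests/sec and 20 to 35 bytes per request this gives 200,000 to 350,000 bytes/s, which is consistent with "well under 1 MB/s". With a hypothetical 1 GiB budget, the high end of that range would retain roughly 51 minutes of trace data.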
36
29
37
-
### Writers
30
+
> **Note:** dial9-tokio-telemetry is designed for always-on production use, but it's still early software. Measure overhead and validate behavior in your environment before deploying to production.
38
31
39
-
| Writer | Use case |
40
-
|--------|----------|
41
-
|`RotatingWriter`| Production — automatically rotates and evicts old files to stay within a total size budget |
42
-
|`SimpleBinaryWriter`| Quick experiments — writes a single trace file with no size management |
43
-
|`NullWriter`| Benchmarking — measures hook overhead without any I/O |
32
+
## Is there a demo?
33
+
Yes, check out this [quick walkthrough (YouTube)](https://www.youtube.com/watch?v=zJOzU_6Mf7Q)!
44
34
45
-
**Future**: S3 writer for direct cloud storage, or use existing log shipping (CWAgent, Firelens, etc.) to push trace files.
35
+
## Why dial9-tokio-telemetry?
46
36
47
-
### Analyzing Traces
37
+
Understanding how Tokio is actually running your application — which tasks are slow, why workers are idle, where scheduling delays come from — is hard to do from the outside. This crate records a continuous, low-overhead trace of runtime behavior.
48
38
49
-
Use the included examples to inspect trace files:
39
+
Compared to [tokio-console](https://github.com/tokio-rs/console), which is designed for live debugging, dial9-tokio-telemetry is designed for post-hoc analysis. Because traces are written to files with bounded disk usage, you can leave it running in production and come back later to deeply analyze what went wrong or why a specific request was slow. On Linux, traces include CPU profile samples and scheduler events, so you can see not just *that* a task was delayed but *what code* was running on the worker instead.
50
40
51
-
```bash
52
-
# Print a summary with per-worker stats and idle-worker detection
53
-
cargo run --example analyze_trace -- /tmp/my_traces/trace.0.bin
41
+
## What gets recorded automatically
54
42
55
-
# Convert a binary trace to JSONL for ad-hoc analysis
56
-
cargo run --example trace_to_jsonl -- /tmp/my_traces/trace.0.bin output.jsonl
43
+
`TracedRuntime` installs hooks on the Tokio runtime builder. These fire for every task on the runtime with no code changes required:
|`WorkerPark` / `WorkerUnpark`| timestamp, worker, local queue depth, thread CPU time, schedstat wait |
49
+
|`QueueSample`| timestamp, global queue depth (sampled every 10 ms) |
50
+
|`TaskSpawn` / `SpawnLocationDef`| task→spawn-location mapping (when `task_tracking` is enabled) |
51
+
52
+
## Wake event tracking
53
+
54
+
Wake events — which task woke which other task — are *not* captured automatically. Tokio's runtime hooks don't expose waker identity, so capturing this requires wrapping the future in `Traced<F>`, which installs a custom waker that records a `WakeEvent` before forwarding to the real waker.
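To illustrate the waker-wrapping idea described above, here is a minimal std-only sketch. It is not the crate's implementation: `TracedSketch`, `RecordingWaker`, `YieldOnce`, and `count_wakes_demo` are hypothetical names, and a simple counter stands in for writing a `WakeEvent`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker: records the wake, then forwards to the real waker.
struct RecordingWaker {
    inner: Waker,
    wakes: Arc<AtomicUsize>,
}

impl Wake for RecordingWaker {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }
    fn wake_by_ref(self: &Arc<Self>) {
        // A real tracer would emit an event here; this sketch only counts.
        self.wakes.fetch_add(1, Ordering::Relaxed);
        self.inner.wake_by_ref();
    }
}

/// Hypothetical wrapper future that polls the inner future with a
/// recording waker instead of the one supplied by the executor.
struct TracedSketch<F> {
    fut: Pin<Box<F>>,
    wakes: Arc<AtomicUsize>,
}

impl<F: Future> Future for TracedSketch<F> {
    type Output = F::Output;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let waker = Waker::from(Arc::new(RecordingWaker {
            inner: cx.waker().clone(),
            wakes: self.wakes.clone(),
        }));
        let mut traced_cx = Context::from_waker(&waker);
        self.fut.as_mut().poll(&mut traced_cx)
    }
}

/// Test future: wakes itself once, returns Pending, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Outer waker that does nothing; stands in for the executor's waker.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

/// Drives the wrapped future by hand and returns the number of recorded wakes.
fn count_wakes_demo() -> usize {
    let wakes = Arc::new(AtomicUsize::new(0));
    let mut traced = TracedSketch { fut: Box::pin(YieldOnce(false)), wakes: wakes.clone() };
    let outer = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&outer);
    let mut pinned = Pin::new(&mut traced);
    while pinned.as_mut().poll(&mut cx).is_pending() {}
    wakes.load(Ordering::Relaxed)
}
```

The sketch allocates a new waker on every poll for simplicity. A production wrapper would more likely cache it, and would also need to record which task issued the wake.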
55
+
56
+
Use `handle.spawn()` instead of `tokio::spawn()`:
57
+
58
+
```rust,no_run
59
+
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
60
+
# fn main() -> std::io::Result<()> {
61
+
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
62
+
# let builder = tokio::runtime::Builder::new_multi_thread();
63
+
let (runtime, guard) = TracedRuntime::build_and_start(builder, Box::new(writer))?;
64
+
let handle = guard.handle();
57
65
58
-
# Open the interactive HTML viewer
59
-
open trace_viewer.html
60
-
# Then drag-and-drop a .bin file to visualize the timeline
66
+
runtime.block_on(async {
67
+
// wake events captured — uses Traced<F> wrapper
68
+
handle.spawn(async { /* ... */ });
69
+
70
+
// wake events NOT captured — still gets poll/park/queue telemetry
71
+
tokio::spawn(async { /* ... */ });
72
+
});
73
+
# Ok(())
74
+
# }
61
75
```
62
76
63
-
**Future**: S3 writer for direct cloud storage, or use existing log shipping (CWAgent, Firelens, etc.) to push trace files.
77
+
For frameworks like Axum where you don't control the spawn call, you need to wrap the accept loop. See [`examples/metrics-service/src/axum_traced.rs`](../examples/metrics-service/src/axum_traced.rs) for a working example that wraps both the accept loop and per-connection futures.
64
78
65
-
## Examples
79
+
## Platform support
80
+
81
+
Core telemetry (poll timing, park/unpark, queue depth, wake events) works on all platforms.
82
+
83
+
On Linux, you get additional data for free:
84
+
- **Thread CPU time** in park/unpark events via `CLOCK_THREAD_CPUTIME_ID` (vDSO, ~20–40 ns)
85
+
- **Scheduler wait time** via `/proc/self/task/<tid>/schedstat`, which shows how long the OS kept your thread off-CPU
86
+
87
+
On non-Linux platforms these fields are zero.
88
+
89
+
### CPU profiling (Linux only)
90
+
91
+
With the `cpu-profiling` feature, you can enable `perf_event_open`-based CPU sampling and scheduler event capture. This records stack traces attributed to specific worker threads, so you can see *what code* was running during a scheduling delay.
92
+
93
+
```rust,no_run
94
+
# #[cfg(feature = "cpu-profiling")]
95
+
# fn main() -> std::io::Result<()> {
96
+
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
97
+
use dial9_tokio_telemetry::telemetry::{CpuProfilingConfig, SchedEventConfig};
98
+
99
+
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
100
+
# let builder = tokio::runtime::Builder::new_multi_thread();
This pulls in [`dial9-perf-self-profile`](perf-self-profile) for `perf_event_open` access. It records `CpuSample` events with full callchains and `CallframeDef` / `ThreadNameDef` metadata for offline symbolization.
114
+
115
+
## Getting started
116
+
117
+
`TracedRuntime::build` returns a `(Runtime, TelemetryGuard)`. The guard owns the flush thread and provides a `TelemetryHandle` for enabling/disabling recording at runtime:
118
+
119
+
```rust,no_run
120
+
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
121
+
# fn main() -> std::io::Result<()> {
122
+
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
123
+
# let builder = tokio::runtime::Builder::new_multi_thread();
124
+
let (runtime, guard) = TracedRuntime::builder()
125
+
.with_task_tracking(true)
126
+
.build(builder, Box::new(writer))?;
127
+
128
+
// start disabled, enable later
129
+
guard.enable();
130
+
131
+
// TelemetryHandle is Clone + Send — pass it around
132
+
let handle = guard.handle();
133
+
handle.disable();
134
+
# Ok(())
135
+
# }
136
+
```
137
+
138
+
### Writers
139
+
140
+
`RotatingWriter` is what you want for production — it rotates files and evicts old ones to stay within a total size budget. `SimpleBinaryWriter` writes a single file with no size management, useful for quick experiments. `NullWriter` measures hook overhead without doing any I/O.
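The rotate-and-evict behavior can be pictured with a small bookkeeping sketch. This is an illustration only: `SegmentBudget` and its fields are hypothetical, it tracks sizes in memory rather than touching files, and it assumes one per-file limit plus one total limit. `RotatingWriter`'s actual policy may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical size accounting for a bounded set of trace segments.
struct SegmentBudget {
    max_file_bytes: u64,
    max_total_bytes: u64,
    /// Segment sizes, oldest first; the last entry is the active segment.
    segments: VecDeque<u64>,
}

impl SegmentBudget {
    fn new(max_file_bytes: u64, max_total_bytes: u64) -> Self {
        let mut segments = VecDeque::new();
        segments.push_back(0);
        Self { max_file_bytes, max_total_bytes, segments }
    }

    /// Total bytes currently held across all segments.
    fn total(&self) -> u64 {
        self.segments.iter().sum()
    }

    /// Accounts for a write of `len` bytes and returns how many old
    /// segments were evicted to stay within the total budget.
    fn record_write(&mut self, len: u64) -> usize {
        // Rotate to a fresh segment if the active one would overflow.
        let active = *self.segments.back().unwrap();
        if active > 0 && active + len > self.max_file_bytes {
            self.segments.push_back(0);
        }
        *self.segments.back_mut().unwrap() += len;

        // Evict oldest segments first, but never the active one.
        let mut evicted = 0;
        while self.total() > self.max_total_bytes && self.segments.len() > 1 {
            self.segments.pop_front();
            evicted += 1;
        }
        evicted
    }
}
```

With limits of 1024 and 4096 bytes, ten 512-byte writes fill four segments and evict the oldest one exactly once, so the total never exceeds the budget.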
cargo run --example analyze_trace -- /tmp/my_traces/trace.0.bin
147
+
148
+
# convert to JSONL for ad-hoc scripting
149
+
cargo run --example trace_to_jsonl -- /tmp/my_traces/trace.0.bin output.jsonl
77
150
```
78
151
79
-
## Benchmarks
152
+
There's also an interactive HTML trace viewer — open `trace_viewer/index.html` and drag in a `.bin` file. [Here's a demo.](https://www.youtube.com/watch?v=zJOzU_6Mf7Q)
153
+
154
+
See [TRACE_ANALYSIS_GUIDE.md](TRACE_ANALYSIS_GUIDE.md) for a walkthrough of diagnosing scheduling delays and CPU hotspots from trace data.
155
+
156
+
## Features
157
+
158
+
- **`cpu-profiling`**: Linux only. Enables `perf_event_open`-based CPU sampling and scheduler event capture via `dial9-perf-self-profile`.
159
+
- **`task-dump`**: Enables Tokio's `taskdump` feature for async stack traces. Required for the `long_sleep`, `completing_task`, `cancelled_task`, and `debug_timing` examples.
160
+
161
+
## Examples
80
162
81
-
Run benchmarks with:
82
163
```bash
83
-
cargo bench
164
+
cargo run --example telemetry_rotating # rotating writer demo
165
+
cargo run --example simple_workload # basic instrumented workload
166
+
cargo run --example realistic_workload # mixed CPU/IO workload
167
+
cargo run --example long_workload # longer run for trace analysis
84
168
```
85
169
86
-
### Overhead Comparison
170
+
The [`examples/metrics-service`](../examples/metrics-service) directory has a full Axum service with DynamoDB persistence, a load-generating client, and telemetry wired up end-to-end.
171
+
172
+
## Overhead
87
173
88
-
Compare baseline vs telemetry overhead:
89
174
```bash
90
175
./scripts/compare_overhead.sh [duration_secs]
91
176
```
92
177
93
-
This runs the `overhead_bench` in both modes and validates:
94
-
- Telemetry overhead is acceptable (< 10%)
95
-
- Trace bytes per request (20-35 bytes) - tracks total trace data generated per client request
96
-
- Bytes per trace event (6-12 bytes) - validates binary format efficiency
178
+
This runs the `overhead_bench` binary with and without telemetry and reports the difference. Typical output:
97
179
98
-
Example output:
99
-
```
100
-
=== Comparison ===
180
+
```text
101
181
Baseline: 286794 req/s, p50=174.1µs, p99=280.6µs
102
182
Telemetry: 277626 req/s, p50=180.2µs, p99=289.3µs
103
183
Overhead: 3.2%
104
-
105
-
=== Trace Efficiency ===
106
-
Trace bytes/request: 25.56
107
-
Bytes/trace event: 6.39
108
-
Client requests/sec: 277682
109
184
```
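The overhead figure in this sample is consistent with the relative drop in throughput between the two runs. A minimal sketch, assuming that is how the script defines overhead (the function name is hypothetical):

```rust
/// Relative throughput drop of the telemetry run versus the baseline,
/// expressed as a percentage.
fn overhead_percent(baseline_rps: f64, telemetry_rps: f64) -> f64 {
    (baseline_rps - telemetry_rps) / baseline_rps * 100.0
}
```

For the numbers shown, `overhead_percent(286794.0, 277626.0)` is about 3.2.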
110
185
111
-
## Configuration
186
+
## Workspace
187
+
188
+
This repo is a Cargo workspace with three members:
112
189
113
-
The system uses `.cargo/config.toml` to enable the `tokio_unstable` flag required for task dumps.
190
+
- [`dial9-tokio-telemetry`](dial9-tokio-telemetry): the main crate
191
+
- [`dial9-perf-self-profile`](perf-self-profile): minimal Linux `perf_event_open` wrapper for CPU profiling and scheduler events
192
+
- [`examples/metrics-service`](examples/metrics-service): end-to-end example service
114
193
115
-
## Dependencies
194
+
## Future work
116
195
117
-
- `tokio` (with `taskdump` feature)
118
-
- `arc-swap` (for lock-free sentinel updates)
119
-
- `pin-project-lite` (for proper pinning in the future wrapper)
120
-
- `smallvec` (for efficient small vector storage)
196
+
- **S3 writer**: upload traces directly to S3 instead of relying on log shipping
197
+
- **Parquet output**: write traces as Parquet for efficient querying with Athena, DuckDB, etc.
198
+
- **Tokio task dumps**: capture async stack traces of all in-flight tasks
199
+
- **Retroactive sampling**: trace data lives in a ring buffer; when your application detects anomalous behavior, it triggers persistence of the last N seconds of data rather than recording everything continuously
200
+
- **Out-of-process symbolication**: resolve CPU profile stack traces in a background process to avoid adding latency or memory overhead to the application