CHANGELOG.md: 17 additions & 12 deletions
@@ -9,21 +9,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
9
9
10
10
### Added
11
11
12
-
- `Dial9Config::builder()`: a `bon`-generated fluent entry point with named setters for the required writer fields (`base_path`, `max_file_size`, `max_total_size`), stackable `with_tokio` / `with_runtime` closures, and an `.enabled(bool)` toggle that selects the no-telemetry path on the same builder. Re-exported at the crate root as `dial9_tokio_telemetry::Dial9Config` / `Dial9ConfigBuilder`. The original positional-argument API (`Dial9ConfigBuilder::new(..)` / `::disabled()` under `dial9_tokio_telemetry::config`) is unchanged and remains fully supported.
13
-
- `Dial9ConfigBuilder::build()` returns `Result<Dial9Config, Dial9ConfigBuilderError>`. Required-field validation **and** the `RotatingWriter` transport-I/O probe both happen at config-build time, so by the time you have a `Dial9Config` the trace file has already been opened.
14
-
- `Dial9ConfigBuilder::build_or_disabled()`: lenient counterpart to `build()`. Returns the same `Dial9Config` type; on validation or writer-I/O failure it logs at `tracing::error!(target = "dial9_telemetry")` and downgrades to a disabled config that still carries the user's `with_tokio` configurators.
15
-
- `Dial9ConfigBuilderError` enum with `Validation(ValidationError)` and `Io(std::io::Error)` variants, both implementing `std::error::Error`. `ValidationError::fields()` returns the names of the unset required setters.
16
-
- `TracedRuntime::new(config)` / `TracedRuntime::try_new(config)`: high-level constructors used by the `#[dial9_tokio_telemetry::main]` macro. Both accept either the new fluent `Dial9Config` or the deprecated positional `dial9_tokio_telemetry::config::Dial9Config` via `TryInto<TracedRuntime>`. `try_new` returns the conversion error (`TelemetryRuntimeError` for the fluent path, `std::io::Error` for the legacy bridge); `new` panics with that error formatted via `Display`.
17
-
- `TelemetryRuntimeError` enum with `TokioRuntimeBuilder` and `TelemetryCore` variants, the only failure modes left after writer-I/O has been moved into the config builder.
18
-
- `TelemetryHandle::disabled()`: explicit constructor for an inert handle whose `spawn` falls through to `tokio::spawn` and whose control methods are no-ops. `TelemetryHandle::is_enabled()` distinguishes the live and inert modes.
19
-
- `TelemetryGuard::is_enabled()` and a no-op `Disabled` mode: a `TelemetryGuard` is now always present on a `TracedRuntime`, regardless of whether telemetry was installed. `TelemetryGuard::handle()` returns an inert `TelemetryHandle` on a disabled guard, and `graceful_shutdown()` is a successful no-op there.
12
+
- **Inline config in the macro**: `#[dial9_tokio_telemetry::main]` now accepts a closure, so simple setups no longer need a separate config function.
13
+
- **Fluent config builder**: `Dial9Config::builder()` with named setters, `with_tokio`/`with_runtime` closures, and `.enabled(bool)`. The original positional API under `dial9_tokio_telemetry::config` is unchanged.
14
+
- **`build_or_disabled()`**: on config validation or writer I/O failure, logs an error and starts a plain tokio runtime instead of crashing. Use `build()` to handle failures explicitly.
15
+
16
+
All three in action:
17
+
18
+
```rust
19
+
#[dial9_tokio_telemetry::main(config = || {
20
+
    Dial9Config::builder()
21
+
        .base_path("/tmp/trace.bin")
22
+
        .max_file_size(64 * 1024 * 1024)
23
+
        .max_total_size(256 * 1024 * 1024)
24
+
        .build_or_disabled()
25
+
})]
26
+
async fn main() { /* ... */ }
27
+
```
20
28
21
29
### Changed
22
30
23
-
**Breaking:** `TelemetryHandle::current()` no longer panics off-runtime. It now returns an inert handle when called from a thread that is not owned by a dial9 runtime, mirroring the always-present-guard model. Use `TelemetryHandle::is_enabled()` to branch on whether telemetry is live on the current thread.
24
-
**Breaking:** `TelemetryHandle::try_current()` is deprecated in favor of `current()`; call sites that relied on the `Option` ceremony can now drop it.
25
-
**Breaking:** `TracedRuntime::guard()` returns `&TelemetryGuard` (always-present) instead of `Option<&TelemetryGuard>`. Replace `rt.guard().is_some()` with `rt.guard().is_enabled()`.
26
-
-**Breaking:** the high-level `TracedRuntime` type at the crate root supersedes the previous `TelemetryRuntime` (which was renamed). The `#[dial9_tokio_telemetry::main]` macro now expands to `TracedRuntime::new(...)`.
31
+
- `TelemetryHandle::current()` no longer panics off-runtime. It returns an inert handle whose `spawn` falls through to `tokio::spawn`. Use `TelemetryHandle::is_enabled()` to check whether telemetry is live.
@@ -24,7 +24,7 @@ Without this flag, compilation will fail with errors about missing methods on `t
24
24
25
25
## Quick start
26
26
27
-
There are two ways to set up dial9: the `#[main]` macro (recommended for most apps) or manual `TracedRuntime` setup. The macro handles the boilerplate of building the runtime and spawning your code as an instrumented task. Inside your `main` body, call `TelemetryHandle::current()` to get the handle for wake-event tracking. Use manual setup when you have multiple Tokio runtimes, don't own `main` (e.g., library code or embedded services), or need to integrate with existing runtime-building code.
27
+
There are two ways to set up dial9: the `#[main]` macro (recommended for most apps) or `TracedRuntime::new()`/`try_new()` for manual control. Both use `Dial9Config::builder()` to configure telemetry. Inside your `main` body, call `TelemetryHandle::current()` to get the handle for wake-event tracking.
#[dial9_tokio_telemetry::main(config = my_config)] // inline config function is also supported
49
48
async fn main() {
50
49
// your async code here
51
50
// `TelemetryHandle::current()` returns the per-thread handle for
@@ -60,45 +59,11 @@ async fn main() {
60
59
61
60
The macro automatically spawns your function body as a task, so top-level code is visible in traces (unlike plain `#[tokio::main]` where `block_on` work is invisible — see [below](#the-root-future-is-not-instrumented)). dial9 installs a `TelemetryHandle` on every runtime-owned thread via `on_thread_start`. Call `TelemetryHandle::current()` to get it for spawning wake-tracked sub-tasks.
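The per-thread handle lookup described above can be pictured with a small standalone sketch. It assumes a plain `thread_local!` slot and uses hypothetical names (`Handle`, `install`, `current`); it is not the dial9 implementation, only an illustration of how a lookup can hand back an inert handle on threads the runtime never initialized instead of panicking.

```rust
use std::cell::RefCell;

// Hypothetical handle type for illustration; not the dial9 `TelemetryHandle`.
#[derive(Clone, Debug, PartialEq)]
pub enum Handle {
    /// Telemetry is live on this thread.
    Live { session: u32 },
    /// No telemetry: control methods would be no-ops.
    Inert,
}

impl Handle {
    pub fn is_enabled(&self) -> bool {
        matches!(self, Handle::Live { .. })
    }
}

thread_local! {
    // Each thread gets its own slot; it starts empty.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

/// Meant to be called once from a thread-start hook on runtime-owned threads.
pub fn install(handle: Handle) {
    CURRENT.with(|slot| *slot.borrow_mut() = Some(handle));
}

/// Never panics: threads without an installed handle get `Handle::Inert`.
pub fn current() -> Handle {
    CURRENT.with(|slot| slot.borrow().clone().unwrap_or(Handle::Inert))
}
```

A thread that called `install` sees a live handle from `current()`, while any other thread sees `Handle::Inert`, so callers can branch on `is_enabled()` without an `Option`.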
62
61
63
-
### Optional telemetry on I/O failure
64
-
65
-
`build()` is strict: missing required writer fields, or an unwritable `base_path`, surface as a `Dial9ConfigBuilderError` (panicking through the macro). When telemetry is best-effort — e.g. you'd rather start the service than fail on a misconfigured trace path — finish with `build_or_disabled()` instead. It returns the same `Dial9Config` type, but:
66
-
67
-
- emits a `tracing::error!` log on the `dial9_telemetry` target when validation or writer-I/O probing fails, and
68
-
- downgrades to a disabled config that still carries your `with_tokio` configurators (worker count, thread names, etc. are preserved).
// Telemetry may or may not be active; the returned handle is inert
85
-
// when the lenient downgrade fired.
86
-
use dial9_tokio_telemetry::telemetry::TelemetryHandle;
87
-
let handle = TelemetryHandle::current();
88
-
// `handle.spawn` records wake events when telemetry is live and
89
-
// falls through to plain `tokio::spawn` when it is not.
90
-
handle.spawn(async { /* ... */ });
91
-
if handle.is_enabled() {
92
-
// any code paths specific to "telemetry on"
93
-
}
94
-
}
95
-
```
96
-
97
-
See [`examples/optional_telemetry.rs`](/dial9-tokio-telemetry/examples/optional_telemetry.rs) for an end-to-end run including a `DIAL9_TRACE_PATH=/unwritable/...` mode that exercises the downgrade.
62
+
`build_or_disabled()` returns a pass-through config on I/O or validation failure, so the service starts on a plain tokio runtime instead of crashing. `TelemetryHandle::current()` returns an inert handle in that case, and `handle.spawn` falls through to `tokio::spawn`.
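As a rough illustration of the strict versus lenient split, here is a self-contained sketch. Every type in it (`Builder`, `Config`, `MissingFields`) is a hypothetical stand-in rather than the dial9 API, and it models only required-field validation; the writer I/O probe and the `tracing` log are replaced by a simple `eprintln!`.

```rust
// Hypothetical stand-ins for illustration; these are not the dial9 types.

/// Names of required setters that were never called.
#[derive(Debug, PartialEq)]
pub struct MissingFields(pub Vec<&'static str>);

#[derive(Debug, PartialEq)]
pub enum Config {
    Enabled { base_path: String, max_file_size: u64 },
    Disabled,
}

#[derive(Default)]
pub struct Builder {
    base_path: Option<String>,
    max_file_size: Option<u64>,
}

impl Builder {
    pub fn base_path(mut self, path: &str) -> Self {
        self.base_path = Some(path.to_string());
        self
    }

    pub fn max_file_size(mut self, bytes: u64) -> Self {
        self.max_file_size = Some(bytes);
        self
    }

    /// Strict: fails and reports every unset required field.
    pub fn build(self) -> Result<Config, MissingFields> {
        match (self.base_path, self.max_file_size) {
            (Some(base_path), Some(max_file_size)) => {
                Ok(Config::Enabled { base_path, max_file_size })
            }
            (base_path, max_file_size) => {
                let mut missing = Vec::new();
                if base_path.is_none() {
                    missing.push("base_path");
                }
                if max_file_size.is_none() {
                    missing.push("max_file_size");
                }
                Err(MissingFields(missing))
            }
        }
    }

    /// Lenient: logs the failure and downgrades to a disabled config.
    pub fn build_or_disabled(self) -> Config {
        self.build().unwrap_or_else(|err| {
            eprintln!("telemetry disabled: {err:?}");
            Config::Disabled
        })
    }
}
```

Both paths return the same `Config` type in this sketch, so the code that starts the runtime does not need a separate branch for the failure case.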
98
63
99
64
### Without the macro
100
65
101
-
The macro expands to `TracedRuntime::new(...).block_on(...)`. If you'd rather drive that yourself, for tests, libraries that build their own runtime, or any code that doesn't own `main`, `TracedRuntime` is a public type that accepts a `Dial9Config`:
66
+
The macro expands to `TracedRuntime::new(...).block_on(...)`. If you'd rather drive that yourself (graceful shutdown, multiple runtimes, tests, or any code that doesn't own `main`), `TracedRuntime` is a public type that accepts a `Dial9Config`:
102
67
103
68
```rust,no_run
104
69
use dial9_tokio_telemetry::{Dial9Config, TracedRuntime};
@@ -107,47 +72,15 @@ let cfg = Dial9Config::builder()
107
72
.base_path("/tmp/my_traces/trace.bin")
108
73
.max_file_size(1024 * 1024)
109
74
.max_total_size(5 * 1024 * 1024)
110
-
.build()
111
-
.expect("config build failed");
75
+
.build_or_disabled();
112
76
113
-
let rt = TracedRuntime::try_new(cfg).expect("runtime build failed");
77
+
let rt = TracedRuntime::try_new(cfg).expect("tokio runtime failed to start");
114
78
rt.block_on(async {
115
79
// body runs as a spawned, instrumented task — same as under #[main]
let (runtime, guard) = TracedRuntime::build_and_start(builder, writer)?;
136
-
let handle = guard.handle();
137
-
138
-
runtime.block_on(async {
139
-
handle.spawn(async {
140
-
// your async code here will be instrumented
141
-
}).await.unwrap();
142
-
});
143
-
144
-
Ok(())
145
-
}
146
-
```
147
-
148
-
Events are 6–16 bytes on the wire, and a typical request generates ~20–35 bytes of trace data (a few poll events plus park/unpark). At 10k requests/sec that's well under 1 MB/s — `RotatingWriter` caps total disk usage so you can leave it running indefinitely. Typical CPU overhead is under 5%.
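The throughput estimate can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch using the figures quoted above (10k requests/sec, 20 to 35 bytes per request); the function names are hypothetical and not part of dial9.

```rust
/// Trace bytes written per second for a given request rate and
/// per-request trace size.
pub fn trace_bytes_per_sec(requests_per_sec: u64, bytes_per_request: u64) -> u64 {
    requests_per_sec * bytes_per_request
}

/// Whole seconds of trace history that fit in a disk budget at a
/// steady write rate (integer division, so the result rounds down).
pub fn retention_secs(max_total_size: u64, bytes_per_sec: u64) -> u64 {
    max_total_size / bytes_per_sec
}
```

At 10,000 requests/sec and 35 bytes per request this gives 350,000 bytes/sec, about a third of 1 MB/s, and a 5 MiB budget would hold roughly 14 seconds of trace data at that rate.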
149
-
150
-
Segments rotate on size _or_ time, whichever comes first. Time boundaries are wall-clock-aligned (e.g. a 60 s period rotates at the top of each minute), which produces clean S3 key paths when using the `worker-s3` feature.
83
+
For lower-level control (custom `TraceWriter`, multiple runtimes sharing one telemetry session, or direct access to the `TelemetryGuard`), see `TracedRuntime::builder()` and `TelemetryCore::builder()` in the API docs.
151
84
152
85
## Can I use this in prod?
153
86
@@ -223,22 +156,16 @@ To understand when Tokio itself is delaying your code (scheduler delay), you nee
223
156
Use `handle.spawn()` instead of `tokio::spawn()`:
224
157
225
158
```rust,no_run
226
-
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
227
-
# fn main() -> std::io::Result<()> {
228
-
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
229
-
# let builder = tokio::runtime::Builder::new_multi_thread();
230
-
let (runtime, guard) = TracedRuntime::build_and_start(builder, writer)?;
231
-
let handle = guard.handle();
159
+
use dial9_tokio_telemetry::telemetry::TelemetryHandle;
232
160
233
-
runtime.block_on(async {
234
-
// wake events / scheduling delay captured
235
-
handle.spawn(async { /* ... */ });
161
+
// Inside a dial9 runtime (macro or TracedRuntime):
162
+
let handle = TelemetryHandle::current();
236
163
237
-
// this task is still tracked, but won't have wake events
238
-
tokio::spawn(async { /* ... */ });
239
-
});
240
-
# Ok(())
241
-
# }
164
+
// wake events / scheduling delay captured
165
+
handle.spawn(async { /* ... */ });
166
+
167
+
// this task is still tracked, but won't have wake events
168
+
tokio::spawn(async { /* ... */ });
242
169
```
243
170
244
171
For frameworks like Axum where you don't control the spawn call, you need to wrap the accept loop. See [`examples/metrics-service/src/axum_traced.rs`](/examples/metrics-service/src/axum_traced.rs) for a working example that wraps both the accept loop and per-connection futures.
@@ -301,21 +228,26 @@ Both of these events are tied to the precise instant and thread that they happen
301
228
302
229
```rust,no_run
303
230
# #[cfg(feature = "cpu-profiling")]
304
-
# fn main() -> std::io::Result<()> {
305
-
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
231
+
# mod inner {
232
+
use dial9_tokio_telemetry::Dial9Config;
306
233
use dial9_tokio_telemetry::telemetry::cpu_profile::{CpuProfilingConfig, SchedEventConfig};
307
234
308
-
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
309
-
# let builder = tokio::runtime::Builder::new_multi_thread();
Because CPU samples are tagged with the worker thread they were collected on, and the trace records which task is being polled on each worker at each instant, the viewer can correlate samples with individual polls. When a poll takes an unusually long time (a "long poll"), the CPU samples collected during that poll show you exactly what code was running — expensive serialization, accidental blocking I/O, lock contention, etc. In the trace viewer, click on a long poll to see its flamegraph, or shift+drag to aggregate CPU samples across a time range.
357
289
358
-
## Getting started
359
-
360
-
`TracedRuntime::build` returns a `(Runtime, TelemetryGuard)`. The guard owns the flush thread and provides a `TelemetryHandle` for enabling/disabling recording at runtime:
361
-
362
-
```rust,no_run
363
-
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
364
-
# fn main() -> std::io::Result<()> {
365
-
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
366
-
# let builder = tokio::runtime::Builder::new_multi_thread();
367
-
let (runtime, guard) = TracedRuntime::builder()
368
-
.with_task_tracking(true)
369
-
.build(builder, writer)?;
370
-
371
-
// start disabled, enable later
372
-
guard.enable();
373
-
374
-
// TelemetryHandle is Clone + Send — pass it around
375
-
let handle = guard.handle();
376
-
handle.disable();
377
-
# Ok(())
378
-
# }
379
-
```
380
-
381
-
### Multiple runtimes
290
+
## Multiple runtimes
382
291
383
292
For applications with multiple Tokio runtimes (e.g. thread-per-core, or separate request/IO runtimes), use `TelemetryCore` to create the telemetry session first, then attach each runtime:
384
293
385
294
```rust,no_run
386
295
# use dial9_tokio_telemetry::telemetry::{RotatingWriter, TelemetryCore};
387
296
# fn main() -> std::io::Result<()> {
388
-
# let writer = RotatingWriter::new("/tmp/t.bin", 1024, 4096)?;
@@ -411,11 +320,11 @@ See [`examples/thread_per_core.rs`](/dial9-tokio-telemetry/examples/thread_per_c
411
320
412
321
**Shutdown**: Drop all runtimes before the `TelemetryGuard` so worker threads exit and flush their thread-local buffers. For a clean shutdown that waits for the background worker (e.g. S3 uploads) to drain, call `guard.graceful_shutdown(timeout)` instead of dropping the guard.
413
322
414
-
### Writers
323
+
## Writers
415
324
416
325
`RotatingWriter` rotates files based on size and time, and evicts old ones to stay within a total size budget. By default, segments rotate every 60 seconds (wall-clock-aligned) or when they exceed `max_file_size`, whichever comes first. Time-based rotation produces clean segment boundaries (thread-local buffers are drained before sealing), so set `max_file_size` large enough that time-based rotation fires first under normal conditions (100 MiB is a good default). Size-based rotation then acts as a safety valve for unexpected data bursts. For quick experiments, use `RotatingWriter::single_file(path)` to skip rotation entirely.
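The "size or time, whichever comes first" rule with wall-clock alignment can be sketched as follows. This assumes boundaries fall on whole multiples of the period since the Unix epoch, which matches the "top of each minute" behavior for a 60 s period; the helper names are hypothetical, and the real `RotatingWriter` logic is not shown in this document.

```rust
/// Next rotation boundary, in seconds since the Unix epoch, strictly
/// after `now_secs`. Boundaries are whole multiples of `period_secs`.
pub fn next_boundary_secs(now_secs: u64, period_secs: u64) -> u64 {
    assert!(period_secs > 0, "rotation period must be non-zero");
    (now_secs / period_secs + 1) * period_secs
}

/// Rotate when the segment hits the size cap or the clock reaches the
/// aligned boundary, whichever happens first.
pub fn should_rotate(
    bytes_written: u64,
    max_file_size: u64,
    now_secs: u64,
    boundary_secs: u64,
) -> bool {
    bytes_written >= max_file_size || now_secs >= boundary_secs
}
```

With a 60 s period, a segment opened at second 125 would seal at second 180 regardless of when it was opened, unless it reaches `max_file_size` earlier.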
417
326
418
-
### Analyzing traces
327
+
## Analyzing traces
419
328
420
329
[`dial9-viewer`](/dial9-viewer) is an interactive trace viewer and S3 browser. Point it at a local directory or an S3 bucket to browse and visualize traces in the browser. [Here's a demo.](https://www.youtube.com/watch?v=zJOzU_6Mf7Q)
421
330
@@ -459,39 +368,34 @@ Only `bucket` and `service_name` are required. See `S3Config` for additional opt
459
368
460
369
```rust,no_run
461
370
# #[cfg(feature = "worker-s3")]
462
-
# fn main() -> std::io::Result<()> {
463
-
use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};
371
+
# mod inner {
372
+
use dial9_tokio_telemetry::Dial9Config;
464
373
use dial9_tokio_telemetry::background_task::s3::S3Config;
465
374
466
-
let trace_path = "/tmp/my_traces/trace.bin";
467
-
let writer = RotatingWriter::builder()
468
-
.base_path(trace_path)
469
-
.max_file_size(100 * 1024 * 1024) // safety valve at 100 MiB per file
470
-
.max_total_size(500 * 1024 * 1024) // keep at most 500 MiB on disk
471
-
.build()?;
472
-
473
-
let s3_config = S3Config::builder()
474
-
.bucket("my-trace-bucket")
475
-
.service_name("my-service")
476
-
.build();
477
-
478
-
let mut builder = tokio::runtime::Builder::new_multi_thread();