All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.3.5 - 2026-04-24
- (viewer) resizable sidebar and slightly improved ux (#290)
- (ci) use single cargo package invocation for cross-crate verification
- lower expected sample threshold in ctimer cpu load test (#292)
- Restore collect_files safety limits lost by merge-queue bug (#276) (#295)
- restore (#286)
- Fix stack overflow on large profiles (#285)
0.3.4 - 2026-04-23
- add CPU profiling fallback for perf-restricted environments. This should enable CPU profiling to work in Fargate. (#250)
- (viewer) replace stack trace popup with right sidebar panel (#274)
- (viewer) Pop-out flamegraph with interactive features (#269)
- fix sort order of polls with cpu samples (#272)
- Make worker lanes scrollable in viewer (#275)
- Fix docs.rs broken links by using absolute GitHub URLs (#277)
0.3.3 - 2026-04-20
- tighten README prose for readability and conciseness (#265)
0.3.2 - 2026-04-20
- crosslink dial9-viewer from the readme (#262)
0.3.1 - 2026-04-19
- Tracing layer (#252):
Dial9TokioLayerrecordstracingspan enter/exit events, including field values, into the trace, showing what happened inside each poll. Enable with thetracing-layerfeature flag.
The viewer (dial9-viewer serve) shows spans in a dedicated panel with filtering, percentile ranking, and click-to-highlight. The agent analysis toolkit (dial9-viewer agents) includes span correlation recipes and automated span checks in the red-flags scan.
use dial9_tokio_telemetry::tracing_layer::Dial9TokioLayer;
use tracing_subscriber::prelude::*;
tracing_subscriber::registry()
.with(Dial9TokioLayer::new().with_filter(
tracing_subscriber::filter::Targets::new()
.with_target("my_app", tracing::Level::TRACE)
.with_default(tracing::Level::ERROR),
))
.init();Tracing support means you can attach a request ID or other context to spans via #[instrument(fields(request_id = %id))] and then search for specific requests in the trace. You can also see what's happening inside long polls: if a single poll contains many small operations without yielding, the span breakdown shows exactly where the time went.
Standard tracing-subscriber filtering rules apply. Without a filter, libraries like the AWS SDK will flood the trace with internal spans. The preceding captures only spans from my_app.
0.3.0 - 2026-04-17
Big release. The setup story is much better, there's support for tracing multiple runtimes, you can emit your own events into the trace, and the viewer is its own crate now.
#[dial9_tokio_telemetry::main] macro (#212)
Drop-in replacement for #[tokio::main]. Point it at a config function and you're done. Unlike #[tokio::main], the macro spawns your function body as a task, so top-level code shows up in traces.
use dial9_tokio_telemetry::config::{Dial9Config, Dial9ConfigBuilder};
use dial9_tokio_telemetry::telemetry::TelemetryHandle;
fn my_config() -> Dial9Config {
Dial9ConfigBuilder::new("trace.bin", 64 * 1024 * 1024, 256 * 1024 * 1024)
.with_runtime(|r| r.with_task_tracking(true))
.build()
}
#[dial9_tokio_telemetry::main(config = my_config)]
async fn main() {
let handle = TelemetryHandle::current();
handle.spawn(async { /* wake events tracked */ }).await.unwrap();
}If you run separate runtimes for request handling and background IO (or thread-per-core, etc.), you can now attach them all to one telemetry session with TelemetryCore. Workers are grouped by runtime name in the viewer.
use dial9_tokio_telemetry::telemetry::{RotatingWriter, TelemetryCore};
let writer = RotatingWriter::builder()
.base_path("/tmp/traces/trace.bin")
.max_file_size(100 * 1024 * 1024)
.max_total_size(500 * 1024 * 1024)
.build()?;
let guard = TelemetryCore::builder().writer(writer).build()?;
guard.enable();
let mut main_builder = tokio::runtime::Builder::new_multi_thread();
main_builder.worker_threads(4).enable_all();
let (main_rt, main_handle) = guard.trace_runtime("main").build(main_builder)?;
let mut io_builder = tokio::runtime::Builder::new_multi_thread();
io_builder.worker_threads(2).enable_all();
let (io_rt, io_handle) = guard.trace_runtime("io").build(io_builder)?;dial9-viewer crate (#177)
The trace viewer is its own crate now: cargo install dial9-viewer. It serves the interactive HTML viewer locally and can browse traces on S3.
You can emit your own events into the trace. Derive TraceEvent, call record_event. They are not currently visible in the viewer although they can be loaded from the trace via the JS parser directly. Repeated string values (HTTP methods, paths, etc.) can be interned to save space on the wire.
use dial9_trace_format::TraceEvent;
use dial9_tokio_telemetry::telemetry::{record_event, clock_monotonic_ns, TelemetryHandle};
#[derive(TraceEvent)]
struct RequestCompleted {
#[traceevent(timestamp)]
timestamp_ns: u64,
status_code: u32,
latency_us: u64,
error_message: Option<String>,
}
record_event(
RequestCompleted {
timestamp_ns: clock_monotonic_ns(),
status_code: 200,
latency_us: 1500,
error_message: None,
},
&handle,
);Trace file concatenation (#134)
Uncompressed trace files can be concatenated (cat trace.0.bin trace.1.bin > combined.bin) and loaded as a single trace. The decoder resets parser state at segment boundaries via reset frames.
- Time-based rotation for
RotatingWriter: segments rotate on wall-clock boundaries (e.g. every 60s), which gives clean S3 key paths (#136, #179) boot_idin default S3 key layout so segments from different process starts don't collide (#225, #237)- Sampling for scheduler events: record 1-in-N context switches via
SchedEventConfig::sampling_interval(#233) - Optional field type modifiers and named field decode in trace format (#216)
- Segments now contain all data for their time range. Previously, thread-local buffers could drain mid-rotation, causing events from one wall-clock period to land in the wrong segment (up to 8s of timestamp overlap between adjacent files). Rotation now coordinates with the flush loop: bump epoch, drain all buffers, flush, then rotate (#224, #186)
- Traces now include monotonic-to-realtime clock sync frames for precise wall-clock alignment (#210, #214)
perf-self-profilecompiles on macOS (#174)- Background worker no longer busy-loops re-processing already gzipped segments (#154, #155)
single_filewriter uses.activesuffix so the background worker can symbolize and gzip sealed segments (#164)- Empty segments are no longer sealed on finalize (#127)
- Blocking thread TIDs are captured directly instead of guessed by name (#120)
- Rate-limited internal tracing/logging to prevent log spam (#209)
- Binary search for CPU sample attachment, faster rendering (#201, #143)
- Relative/absolute time toggle (#146)
- Gzip decompression in JS parser: load
.bin.gzfiles directly (#178) - Escape stack frame names in flamegraph tooltips (#142)
- Handle truncated frames without crashing (#98)
SamplerConfigandCpuProfilingConfignow use builders instead of struct-literal construction, consistent withSchedEventConfig(#244)
- Removed recorder mutex; events encode directly into thread-local buffers (#122, #133, #135)
graceful_shutdownis synchronous now (#151)- Public API cleanup and lints (#175, #129)
- Bencher integration for continuous overhead tracking (#150)
0.2.0 - 2026-03-20
0.2.0 brings two major improvements:
- Support for publishing traces to S3
- Migration to the new trace format (dial9-trace-format). This format is self describing, extremely compact, compressible and fast to write. This will set us up to easily add application level telemetry in the future.
For setting it up in production applications, the new .install(true/false) method makes it easy to have a single instantiation path for your runtime but set install(false) to make dial9 a complete no-op.
- Wire background symbolization into the flush/worker pipeline (#95)
- Improve s3 writer's configuration API (#86)
- add install() builder method for conditional telemetry install (#85)
- Add ProcMaps frame and offline symbolizer for background symbolization (#87)
- offline symbolization cleanups and optimizations (#111)
- write segment metadata at the beginning of the file and add RotatingWriterBuilder (#115)
- Bring back support for locations in offline symbolization (#110)
- stop writing trailing garbage in gzip segments after graceful_shutdown (#104)
- Fix worker spin-loop on gzip-compressed and permanently failing segments (#102)
- (trace_viewer) update format name from TOKIOTRC to D9TF in landing screen (#103)
- (js-decoder) handle truncated frames gracefully, read symbol frames even if >= MAX_EVENTS (#98)
- clarify S3 key layout is the default, not the only option (#89)
- add missing crates.io metadata (#84)
- thread-local buffer not flushing on drop (#54)
- (trace-parser) consolidate per-branch cap checks into early continue (#116)
- fix flaky worker park test (#117)
- Harden flush path with ArrayQueue & emit metrics (#97)
- Update demo trace to have symbols (#105)
- add kptr_restrict guidance for kernel symbol resolution (#99)
- Switch to new trace format (#91)
- Prepare to migrate to dial9-trace-format (#76)
- Add kernel tracepoint support to perf-self-profile (#81)
- document tokio_unstable prerequisite for downstream consumers (#90)
- Support kernel frames in callchains when include_kernel=true (#77)
- Enable perf_event_open tests in CI (#78)
- S3 reporter (#60)
- Viewer UX: hint toasts, help overlay, keyboard accessibility (#69)
- Backport analysis algorithms from trace viewer into core (#35) (#63)
- Add SegmentMetadata event (wire code 11) (#66)
- Remove SimpleBinaryWriter, use RotatingWriter everywhere (#65)
- Track task spawn/terminate events and show active task count in trace… (#48)
- Track blocking pool threads in CPU profiler via ThreadRole (#52)
- Docs improvements & fix ui paper cuts (#57)
0.1.1 - 2026-03-03
- Fix trace viewer crash when loading trace from URL parameter (#42)
- Improve symbolization and include docs.rs links in call frames (#39)
- Add demo trace (#40)
- fix: take_rotated() was inside debug_assert, never ran in release builds (#41)
0.1.0 - 2026-03-01
- Update readme and allow tests to pass on macOS (#22)
- Add Cloudflare Workers configuration (#29)
- Support Compilation on MacOS (#16)
- Enable CPU profiling in metrics service and extract client binary (#12)
- Integrate CPU profiling into Dial9 (#11)
- Initial implementation of tracking task wakes (#4)
- Convert to workspace, move crate into dial9-tokio-telemetry/ (#5)