Extract "TelemetryCore" separate from TracedRuntime by rcoh · Pull Request #193 · dial9-rs/dial9

rcoh · 2026-04-13T18:45:50Z

Extract `TelemetryCore` so that you can start the Dial9 telemetry session separately from building (potentially multiple) Tokio runtimes.

…runtime

Introduces `TelemetryCore::builder().writer(w).build()`, which produces a `TelemetryGuard` without building any Tokio runtime. Runtimes are then attached via `guard.trace_runtime("name", builder)`.

- New `TelemetryCore` struct with `#[bon::builder]` for session config
- New `TelemetryGuard::trace_runtime()` method
- `TracedRuntime::builder()` reimplemented on top of `TelemetryCore`
- `build_with_reuse()` delegates to a shared `attach_runtime()` helper
- Flush loop extracted to the `run_flush_loop()` free function
- `SharedState` now stores `task_tracking_enabled` for use by `trace_runtime`

All 167 existing tests pass unchanged.
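To make the decoupling concrete, here is a std-only mock of the shape described above: the session object is created first, and each runtime attached to it reserves a contiguous block of worker IDs from a shared atomic counter (the `fetch_add` on `next_worker_id` quoted in the review). `TelemetryGuard`, `trace_runtime`, and `SharedState` reuse names from this PR, but their bodies, `AttachedRuntime`, `start()`, and the `num_workers` argument are hypothetical; this is not dial9's real API, which takes a Tokio runtime builder.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical session-wide state shared by every attached runtime.
struct SharedState {
    next_worker_id: AtomicUsize,
}

// Hypothetical guard standing in for the telemetry session handle.
struct TelemetryGuard {
    shared: Arc<SharedState>,
}

// Hypothetical record of one attached runtime and the worker IDs it owns.
#[derive(Debug, PartialEq)]
struct AttachedRuntime {
    name: String,
    worker_ids: Range<usize>,
}

impl TelemetryGuard {
    // Start the session without creating any runtime.
    fn start() -> Self {
        TelemetryGuard {
            shared: Arc::new(SharedState {
                next_worker_id: AtomicUsize::new(0),
            }),
        }
    }

    // Attach a runtime: reserve `num_workers` consecutive IDs so that
    // workers from different runtimes never collide in the shared trace.
    fn trace_runtime(&self, name: &str, num_workers: usize) -> AttachedRuntime {
        let base = self
            .shared
            .next_worker_id
            .fetch_add(num_workers, Ordering::Relaxed);
        AttachedRuntime {
            name: name.to_string(),
            worker_ids: base..base + num_workers,
        }
    }
}
```

Attaching `"main"` with 4 workers and then `"io"` with 2 yields the ID ranges `0..4` and `4..6`. `Relaxed` is enough here because only the atomicity of the counter itself matters, not ordering relative to other memory.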

Demonstrates the decoupled pattern: create the telemetry session first, then attach the coordinator and per-core runtimes via `trace_runtime()`.

…ction

- `multi_runtime` example now uses `TelemetryCore::builder()` + `trace_runtime()`
- Added "Multiple runtimes" section to README with a `TelemetryCore` example

- Switch `TelemetryCore` from `#[builder(finish_fn = build)] fn builder()` to `#[builder] fn new()`: cleaner, same call site
- Rename `build_with_reuse` → `build_and_attach_to_telemetry` for clarity

github-actions · 2026-04-13T18:47:29Z

Bencher Report

Branch	telemetry-core
Testbed	ubuntu-latest


| Benchmark | Latency (µs) | Δ vs. baseline | Baseline (µs) | Upper boundary (µs) | Limit % |
|---|---|---|---|---|---|
| writer_encode/batches/1 | 7.36 | +0.20% | 7.34 | 9.18 | 80.16% |
| writer_encode/batches/10 | 73.71 | -0.45% | 74.04 | 92.55 | 79.64% |
| writer_encode/batches/100 | 755.44 | +1.20% | 746.49 | 933.11 | 80.96% |

🐰 View full continuous benchmarking report in Bencher

github-actions · 2026-04-13T18:51:36Z

Bencher Report

Branch	telemetry-core
Testbed	ubuntu-latest


| Benchmark | Result | Δ vs. baseline | Baseline | Boundary (upper for latency, lower for throughput) | Limit % |
|---|---|---|---|---|---|
| overhead::baseline::mean_lat_ns | 399.88 µs | +4.75% | 381.75 µs | 477.19 µs | 83.80% |
| overhead::baseline::p99_9_lat_ns | 1,711.10 µs | +3.10% | 1,659.65 µs | 2,074.56 µs | 82.48% |
| overhead::baseline::p99_lat_ns | 782.34 µs | +4.77% | 746.75 µs | 933.44 µs | 83.81% |
| overhead::baseline::throughput_rps | 124.99 k ops/s | -6.14% | 133.16 k ops/s | 99.87 k ops/s | 79.90% |
| overhead::noop::mean_lat_ns | 360.26 µs | +2.31% | 352.14 µs | 440.17 µs | 81.85% |
| overhead::noop::p99_9_lat_ns | 977.92 µs | -2.23% | 1,000.19 µs | 1,250.24 µs | 78.22% |
| overhead::noop::p99_lat_ns | 679.93 µs | +2.78% | 661.57 µs | 826.96 µs | 82.22% |
| overhead::noop::throughput_rps | 138.75 k ops/s | -3.06% | 143.13 k ops/s | 107.34 k ops/s | 77.37% |
| overhead::telemetry::mean_lat_ns | 363.00 µs | +3.77% | 349.79 µs | 437.24 µs | 83.02% |
| overhead::telemetry::p99_9_lat_ns | 1,015.29 µs | -4.39% | 1,061.95 µs | 1,327.44 µs | 76.49% |
| overhead::telemetry::p99_lat_ns | 697.34 µs | +3.33% | 674.88 µs | 843.60 µs | 82.66% |
| overhead::telemetry::throughput_rps | 137.69 k ops/s | -4.40% | 144.03 k ops/s | 108.02 k ops/s | 78.45% |
| overhead::telemetry_p99_added_latency_ns | 18,446,744,073,709,464.00 µs | -0.00% | 18,446,744,073,709,480.00 µs | 23,058,430,092,136,848.00 µs | 80.00% |



jlizen

Just some nits

Side note: `telemetry_p99_added_latency_ns` shows 18,446,744,073,709,516.00 µs, which is probably due to underflow?
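The underflow guess fits the magnitude: 2^64 ns is about 18,446,744,073,709,551.6 µs, so a value just below that is what a small negative nanosecond difference looks like after wrapping in a `u64`. A minimal sketch, assuming the metric is computed as an unsigned nanosecond difference; the benchmark's real computation is not shown in this PR, and both helper names are hypothetical:

```rust
// Hypothetical helpers for "added p99 latency" = telemetry p99 - baseline p99.

// Unsigned subtraction wraps when telemetry is faster than the baseline.
// (A plain `-` on u64 panics in debug builds and wraps in release builds.)
fn added_latency_wrapping(telemetry_ns: u64, baseline_ns: u64) -> u64 {
    telemetry_ns.wrapping_sub(baseline_ns)
}

// A signed difference reports "telemetry was faster" as a negative number.
// Widening to i128 avoids overflow for any pair of u64 inputs.
fn added_latency_signed(telemetry_ns: u64, baseline_ns: u64) -> i128 {
    i128::from(telemetry_ns) - i128::from(baseline_ns)
}
```

With illustrative inputs of 697,340 ns and 782,340 ns, the wrapping version returns 18,446,744,073,709,466,616 (2^64 minus 85,000), while the signed version returns -85,000.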

jlizen · 2026-04-13T20:20:42Z

+            // Write final metadata before sealing so single-segment
+            // traces contain runtime→worker mappings.
+            if let Err(e) = event_writer.write_current_segment_metadata() {
+                tracing::warn!("failed to write final segment metadata: {e}");


We don't have pipeline metrics yet?

We do, good call; we can add a metric here.

jlizen · 2026-04-13T20:21:43Z

+        let start_mono_ns = crate::telemetry::events::clock_monotonic_ns();
+        let shared = Arc::new(SharedState::new(start_mono_ns));
+        #[allow(unused_mut)]
+        let mut event_writer = EventWriter::new(Box::new(writer));


A bit unfortunate that we accept a `'static` `TraceWriter` but immediately box it. But the alternatives I see are worse for users, so it's probably fine.

It's `'static` so that we can box it.
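The `'static` bound follows from the boxing: `Box<dyn Trait>` without an explicit lifetime means `Box<dyn Trait + 'static>`, so only writers that do not borrow shorter-lived data can be stored. A std-only sketch; `TraceWriter`, `write_batch`, and this `EventWriter` are hypothetical stand-ins, not dial9's definitions:

```rust
use std::io::{self, Write};

// Hypothetical trace sink trait.
trait TraceWriter {
    fn write_batch(&mut self, bytes: &[u8]) -> io::Result<()>;
}

// In this sketch, any io::Write (a File, a Vec<u8>, ...) is a TraceWriter.
impl<W: Write> TraceWriter for W {
    fn write_batch(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.write_all(bytes)
    }
}

// Hypothetical owner of the type-erased writer.
struct EventWriter {
    // Equivalent to Box<dyn TraceWriter + 'static>.
    sink: Box<dyn TraceWriter>,
}

impl EventWriter {
    // The 'static bound is what lets Box::new(writer) coerce to the field type.
    fn new(writer: impl TraceWriter + 'static) -> Self {
        EventWriter { sink: Box::new(writer) }
    }

    fn write(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.sink.write_batch(bytes)
    }
}
```

An owned `Vec<u8>` or `std::io::sink()` is accepted. A `&mut buf` borrowed from a local `Vec<u8>` satisfies `TraceWriter` but is rejected by `new`, because the borrow is not `'static`; that is the usability trade-off raised above.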

jlizen · 2026-04-13T20:22:28Z

+    let base = shared
+        .next_worker_id
+        .fetch_add(num_workers, Ordering::Relaxed);
+    ctx.metrics_and_base.set((metrics, base)).ok();


This is pre-existing, but should we warn on a double-set?
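A minimal sketch of warning on a double-set instead of discarding the result with `.ok()`, assuming the slot is a `std::sync::OnceLock` (the PR's actual cell type is not shown here). `RuntimeCtx`, `set_metrics_and_base`, and the `String` placeholder for the metrics handle are hypothetical, and `eprintln!` stands in for `tracing::warn!` to keep the sketch std-only:

```rust
use std::sync::OnceLock;

// Hypothetical per-runtime context; the metrics handle is a String placeholder.
struct RuntimeCtx {
    metrics_and_base: OnceLock<(String, usize)>,
}

impl RuntimeCtx {
    // Returns true if this call stored the value, false on a double-set.
    fn set_metrics_and_base(&self, metrics: String, base: usize) -> bool {
        match self.metrics_and_base.set((metrics, base)) {
            Ok(()) => true,
            // OnceLock::set hands back the rejected value; the first one is kept.
            Err((_, rejected_base)) => {
                eprintln!(
                    "metrics_and_base already set; ignoring second set (base {rejected_base})"
                );
                false
            }
        }
    }
}
```

The first value is kept either way; the change only makes the second attempt visible instead of silent.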

jlizen · 2026-04-13T20:23:55Z

+// Both runtimes share a single trace file with unique worker IDs.
+// The trace viewer groups workers by runtime name.
+// Use main_handle.spawn() / io_handle.spawn() for wake-tracked futures.
+# Ok(())


nit: should we mention runtime dropping / graceful shutdown handling?

… on double-set, shutdown docs

- Add `write_metadata_failed` and `finalize_failed` booleans to `FlushMetrics`, emitted on the final flush so failures are observable via metrics
- Restructure exit path: metadata write and finalize happen before the metric is emitted; the metric is always emitted on exit
- Replace silent `.ok()` on `metrics_and_base.set()` with `tracing::warn` on double-set
- Add shutdown guidance to the README "Multiple runtimes" section

Create the FlushMetrics guard up front via append_on_drop, then mutate write_metadata_failed/finalize_failed through DerefMut on the exit path. The guard drops naturally, emitting the final metric entry.
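The guard pattern in this commit can be sketched without the metrics library: create the record up front, mutate it through `DerefMut` on the exit path, and let `Drop` emit it, so the final entry is recorded however the function returns. `AppendOnDrop`, the `Vec` sink, and `final_flush` are hypothetical; the real code uses the library's `append_on_drop`, whose API this sketch does not reproduce:

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Mutex};

// Hypothetical metric record; flag names follow the commit message.
#[derive(Debug, Default)]
struct FlushMetrics {
    event_count: u64,
    write_metadata_failed: bool,
    finalize_failed: bool,
}

// Hypothetical guard: owns one record and appends it to `sink` on drop.
struct AppendOnDrop {
    entry: FlushMetrics,
    sink: Arc<Mutex<Vec<FlushMetrics>>>,
}

impl Deref for AppendOnDrop {
    type Target = FlushMetrics;
    fn deref(&self) -> &FlushMetrics {
        &self.entry
    }
}

impl DerefMut for AppendOnDrop {
    fn deref_mut(&mut self) -> &mut FlushMetrics {
        &mut self.entry
    }
}

impl Drop for AppendOnDrop {
    fn drop(&mut self) {
        // Runs whenever the guard leaves scope, including early returns.
        if let Ok(mut entries) = self.sink.lock() {
            entries.push(std::mem::take(&mut self.entry));
        }
    }
}

// Hypothetical exit path: flags are set through DerefMut; the guard emits at scope end.
fn final_flush(sink: Arc<Mutex<Vec<FlushMetrics>>>, metadata_ok: bool, finalize_ok: bool) {
    let mut metrics = AppendOnDrop { entry: FlushMetrics::default(), sink };
    metrics.event_count = 42; // illustrative value
    if !metadata_ok {
        metrics.write_metadata_failed = true;
    }
    if !finalize_ok {
        metrics.finalize_failed = true;
    }
}
```

Because emission lives in `Drop`, adding another early return to the exit path cannot skip the final metric entry.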

Make FlushStats a #[metrics(subfield)] and flatten it into FlushMetrics, removing the duplicated event_count/dropped_batches/cpu_flush_duration fields. Resolves the TODO on FlushStats.

rcoh added 5 commits April 13, 2026 15:34

example: update thread_per_core to use TelemetryCore API

6b9e86c


example: update multi_runtime to use TelemetryCore API, add README se…

0131d3f


cleanup: TelemetryCore uses #[bon] on new(), rename build_with_reuse

72345ba


Correctly expose builders

16c4034

rcoh requested a review from jlizen April 13, 2026 18:48

rcoh marked this pull request as ready for review April 13, 2026 18:49

rcoh added 2 commits April 13, 2026 18:56

bring back comments

0881481

Merge branch 'main' of github.com:dial9-rs/dial9-tokio-telemetry into…

43b69e2

… telemetry-core

rcoh force-pushed the telemetry-core branch from 4b9beaa to 43b69e2 Compare April 13, 2026 18:56

jlizen approved these changes Apr 13, 2026


rcoh added 3 commits April 14, 2026 00:05

refactor: use append_on_drop guard with mutation for flush metrics

4ca641e


refactor: flatten FlushStats into FlushMetrics via #[metrics(flatten)]

c0d42a3


rcoh enabled auto-merge April 14, 2026 00:20

rcoh added this pull request to the merge queue Apr 14, 2026

Merged via the queue into main with commit 6b9a21e Apr 14, 2026
18 checks passed

rcoh mentioned this pull request Apr 14, 2026

rework multi-runtime API #159

Closed

rcoh merged 10 commits into main from telemetry-core
2 participants