
Extract "TelemetryCore" separate from TracedRuntime #193

Merged: rcoh merged 10 commits into main from telemetry-core on Apr 14, 2026

@rcoh (Contributor) commented Apr 13, 2026

Extract TelemetryCore so that you can spawn Dial9 separately from spawning (potentially multiple) Tokio runtimes.

rcoh added 5 commits April 13, 2026 15:34
…runtime

Introduces `TelemetryCore::builder().writer(w).build()` which produces a
`TelemetryGuard` without building any tokio runtime. Runtimes are then
attached via `guard.trace_runtime("name", builder)`.

- New `TelemetryCore` struct with `#[bon::builder]` for session config
- New `TelemetryGuard::trace_runtime()` method
- `TracedRuntime::builder()` reimplemented on top of TelemetryCore
- `build_with_reuse()` delegates to shared `attach_runtime()` helper
- Flush loop extracted to `run_flush_loop()` free function
- `SharedState` now stores `task_tracking_enabled` for use by `trace_runtime`

All 167 existing tests pass unchanged.

Demonstrates the decoupled pattern: create telemetry session first,
then attach coordinator + per-core runtimes via trace_runtime().

…ction

- multi_runtime example now uses TelemetryCore::builder() + trace_runtime()
- Added "Multiple runtimes" section to README with TelemetryCore example
- Switch TelemetryCore from `#[builder(finish_fn = build)] fn builder()`
  to `#[builder] fn new()` — cleaner, same call site
- Rename `build_with_reuse` → `build_and_attach_to_telemetry` for clarity
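
A minimal, std-only sketch of the decoupled pattern these commits describe: one shared session is created first, then several runtimes are attached to it, each receiving a disjoint block of worker IDs. `Session`, `Attached`, and `attach` are hypothetical stand-ins chosen for illustration, not the crate's `TelemetryCore`, `TelemetryGuard`, or `trace_runtime` API. Only the `fetch_add` allocation mirrors code quoted in the review threads of this PR.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the shared telemetry session state.
#[derive(Default)]
pub struct Session {
    next_worker_id: AtomicUsize,
}

/// Hypothetical record of one runtime attached to the session.
pub struct Attached {
    pub name: String,
    pub worker_ids: Range<usize>,
}

impl Session {
    /// Create the session before any runtime exists.
    pub fn new() -> Self {
        Self::default()
    }

    /// Reserve `num_workers` consecutive worker IDs for a named runtime.
    /// `fetch_add` returns the previous counter value, so each caller
    /// gets a block that cannot overlap with any other caller's block.
    pub fn attach(&self, name: &str, num_workers: usize) -> Attached {
        let base = self.next_worker_id.fetch_add(num_workers, Ordering::Relaxed);
        Attached {
            name: name.to_string(),
            worker_ids: base..base + num_workers,
        }
    }
}
```

Under these assumptions, attaching a 4-worker runtime and then a 2-worker runtime to the same `Session` yields the ID ranges `0..4` and `4..6`, which is what lets both runtimes share one trace without colliding worker IDs.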
@github-actions (Bot, Contributor) commented Apr 13, 2026

🐰 Bencher Report

Branch: telemetry-core
Testbed: ubuntu-latest

| Benchmark | Latency, µs (Result Δ%) | Baseline, µs | Upper Boundary, µs (Limit %) |
| --- | --- | --- | --- |
| writer_encode/batches/1 | 7.36 (+0.20%) | 7.34 | 9.18 (80.16%) |
| writer_encode/batches/10 | 73.71 (-0.45%) | 74.04 | 92.55 (79.64%) |
| writer_encode/batches/100 | 755.44 (+1.20%) | 746.49 | 933.11 (80.96%) |

@rcoh requested a review from @jlizen April 13, 2026 18:48
@rcoh marked this pull request as ready for review April 13, 2026 18:49
@github-actions (Bot, Contributor) commented Apr 13, 2026

🐰 Bencher Report

Branch: telemetry-core
Testbed: ubuntu-latest

| Benchmark | Latency, µs (Result Δ%) | Baseline, µs | Upper Boundary, µs (Limit %) |
| --- | --- | --- | --- |
| overhead::baseline::mean_lat_ns | 399.88 (+4.75%) | 381.75 | 477.19 (83.80%) |
| overhead::baseline::p99_9_lat_ns | 1,711.10 (+3.10%) | 1,659.65 | 2,074.56 (82.48%) |
| overhead::baseline::p99_lat_ns | 782.34 (+4.77%) | 746.75 | 933.44 (83.81%) |
| overhead::noop::mean_lat_ns | 360.26 (+2.31%) | 352.14 | 440.17 (81.85%) |
| overhead::noop::p99_9_lat_ns | 977.92 (-2.23%) | 1,000.19 | 1,250.24 (78.22%) |
| overhead::noop::p99_lat_ns | 679.93 (+2.78%) | 661.57 | 826.96 (82.22%) |
| overhead::telemetry::mean_lat_ns | 363.00 (+3.77%) | 349.79 | 437.24 (83.02%) |
| overhead::telemetry::p99_9_lat_ns | 1,015.29 (-4.39%) | 1,061.95 | 1,327.44 (76.49%) |
| overhead::telemetry::p99_lat_ns | 697.34 (+3.33%) | 674.88 | 843.60 (82.66%) |
| overhead::telemetry_p99_added_latency_ns | 18,446,744,073,709,464.00 (-0.00%) | 18,446,744,073,709,480.00 | 23,058,430,092,136,848.00 (80.00%) |

| Benchmark | Throughput, ops/s x 1e3 (Result Δ%) | Baseline, ops/s x 1e3 | Lower Boundary, ops/s x 1e3 (Limit %) |
| --- | --- | --- | --- |
| overhead::baseline::throughput_rps | 124.99 (-6.14%) | 133.16 | 99.87 (79.90%) |
| overhead::noop::throughput_rps | 138.75 (-3.06%) | 143.13 | 107.34 (77.37%) |
| overhead::telemetry::throughput_rps | 137.69 (-4.40%) | 144.03 | 108.02 (78.45%) |

@jlizen (Member) left a comment

Just some nits

Side note: `telemetry_p99_added_latency_ns` shows 18,446,744,073,709,516.00 µs, which is probably due to underflow?
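
To illustrate the suspected underflow: in the Bencher report, the telemetry p99 (697.34 µs) is lower than the baseline p99 (782.34 µs), so an unsigned subtraction of the two would wrap to a value just below 2^64 ns. The sketch below assumes the metric is computed as a `u64` difference in nanoseconds. That is a guess about the benchmark harness, not something shown in this PR, and the function names are hypothetical.

```rust
/// Unsigned subtraction that wraps around, as a `u64` metric would if the
/// telemetry p99 came out lower than the baseline p99.
pub fn added_latency_wrapping(telemetry_ns: u64, baseline_ns: u64) -> u64 {
    telemetry_ns.wrapping_sub(baseline_ns)
}

/// Signed difference: negative when the telemetry run was faster.
pub fn added_latency_signed(telemetry_ns: u64, baseline_ns: u64) -> i128 {
    i128::from(telemetry_ns) - i128::from(baseline_ns)
}

/// Clamp at zero when only "added" latency is meaningful.
pub fn added_latency_saturating(telemetry_ns: u64, baseline_ns: u64) -> u64 {
    telemetry_ns.saturating_sub(baseline_ns)
}
```

With the reported p99 values converted to nanoseconds (697,340 and 782,340), the signed difference is -85,000 ns, while the wrapping version yields a number on the order of 1.8e19 ns, which matches the magnitude shown in the report.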

// Write final metadata before sealing so single-segment
// traces contain runtime→worker mappings.
if let Err(e) = event_writer.write_current_segment_metadata() {
    tracing::warn!("failed to write final segment metadata: {e}");

@jlizen (Member):

we don't have pipeline metrics yet?

@rcoh (Contributor, Author):

we do, good call, we can add a metric here

let start_mono_ns = crate::telemetry::events::clock_monotonic_ns();
let shared = Arc::new(SharedState::new(start_mono_ns));
#[allow(unused_mut)]
let mut event_writer = EventWriter::new(Box::new(writer));

@jlizen (Member):

A bit unfortunate that we accept 'static TraceWriter, but immediately box it. But, the alternatives I see are worse for users, so probably fine.

@rcoh (Contributor, Author):

It's `'static` so that we can box it.
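
A small sketch of why the bound is needed. `Box<dyn Trait>` stored in a struct field defaults to `Box<dyn Trait + 'static>`, so a generic writer can only be boxed into it if the type parameter is itself `'static`. The trait and types below are simplified, hypothetical stand-ins for the crate's `TraceWriter` and `EventWriter`; their real definitions are not shown in this PR.

```rust
/// Hypothetical, simplified stand-in for the crate's writer trait.
pub trait TraceWriter {
    /// Consume bytes and return the running total written so far.
    fn write(&mut self, bytes: &[u8]) -> usize;
}

/// Stand-in for the type that stores the boxed writer.
pub struct EventWriter {
    // Equivalent to `Box<dyn TraceWriter + 'static>`.
    inner: Box<dyn TraceWriter>,
}

impl EventWriter {
    pub fn new(inner: Box<dyn TraceWriter>) -> Self {
        Self { inner }
    }

    pub fn write(&mut self, bytes: &[u8]) -> usize {
        self.inner.write(bytes)
    }
}

/// Generic entry point: callers pass a concrete writer, which is boxed
/// immediately. Dropping the `'static` bound makes `Box::new(writer)`
/// fail to coerce to `Box<dyn TraceWriter>`.
pub fn event_writer_for<W: TraceWriter + 'static>(writer: W) -> EventWriter {
    EventWriter::new(Box::new(writer))
}

/// Toy writer used only to exercise the sketch.
pub struct CountingWriter {
    pub total: usize,
}

impl TraceWriter for CountingWriter {
    fn write(&mut self, bytes: &[u8]) -> usize {
        self.total += bytes.len();
        self.total
    }
}
```

The generic signature keeps call sites free of `Box::new(...)`, at the cost of one allocation and dynamic dispatch inside the writer.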

let base = shared
    .next_worker_id
    .fetch_add(num_workers, Ordering::Relaxed);
ctx.metrics_and_base.set((metrics, base)).ok();

@jlizen (Member):

This is pre-existing, but should we warn on a double-set?
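
A sketch of what warning on a double-set could look like, assuming `metrics_and_base` is a set-once cell with the same `set` contract as `std::sync::OnceLock` (the actual field type is not visible in the quoted snippet). `RuntimeCtx` and `set_metrics_and_base` are hypothetical, and `eprintln!` stands in for `tracing::warn!` to keep the example std-only.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the per-runtime context.
#[derive(Default)]
pub struct RuntimeCtx {
    pub metrics_and_base: OnceLock<(String, usize)>,
}

/// Store the value once. Returns `true` if it was stored, `false` if the
/// cell was already populated. `OnceLock::set` hands the rejected value
/// back in `Err`, so the second attempt can be reported instead of being
/// silently discarded by `.ok()`.
pub fn set_metrics_and_base(ctx: &RuntimeCtx, metrics: String, base: usize) -> bool {
    match ctx.metrics_and_base.set((metrics, base)) {
        Ok(()) => true,
        Err((_, rejected_base)) => {
            eprintln!("metrics_and_base already set; ignoring base {rejected_base}");
            false
        }
    }
}
```

The first value always wins; the second call only changes whether the collision is visible.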

// Both runtimes share a single trace file with unique worker IDs.
// The trace viewer groups workers by runtime name.
// Use main_handle.spawn() / io_handle.spawn() for wake-tracked futures.
# Ok(())

@jlizen (Member):

nit: should we mention runtime dropping / graceful shutdown handling?

rcoh added 3 commits April 14, 2026 00:05
… on double-set, shutdown docs

- Add write_metadata_failed and finalize_failed booleans to FlushMetrics,
  emitted on the final flush so failures are observable via metrics
- Restructure exit path: metadata write and finalize happen before the
  metric is emitted; metric is always emitted on exit
- Replace silent .ok() on metrics_and_base.set() with tracing::warn on
  double-set
- Add shutdown guidance to README Multiple runtimes section

Create the FlushMetrics guard up front via append_on_drop, then mutate
write_metadata_failed/finalize_failed through DerefMut on the exit path.
The guard drops naturally, emitting the final metric entry.

Make FlushStats a #[metrics(subfield)] and flatten it into FlushMetrics,
removing the duplicated event_count/dropped_batches/cpu_flush_duration
fields. Resolves the TODO on FlushStats.
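
The guard-based exit path described in these commit notes can be sketched with a hand-rolled drop guard. This is an illustration of the pattern only: `AppendOnDrop`, `Sink`, and `finish` are hypothetical and do not reproduce the metrics library's `append_on_drop`. The field names `event_count`, `write_metadata_failed`, and `finalize_failed` are taken from the commit messages.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, Default, PartialEq)]
pub struct FlushMetrics {
    pub event_count: u64,
    pub write_metadata_failed: bool,
    pub finalize_failed: bool,
}

/// Hypothetical sink standing in for wherever metric entries are appended.
pub type Sink = Arc<Mutex<Vec<FlushMetrics>>>;

/// Guard that appends its entry to the sink exactly once, when dropped.
pub struct AppendOnDrop {
    entry: FlushMetrics,
    sink: Sink,
}

impl AppendOnDrop {
    pub fn new(entry: FlushMetrics, sink: Sink) -> Self {
        Self { entry, sink }
    }
}

impl Deref for AppendOnDrop {
    type Target = FlushMetrics;
    fn deref(&self) -> &FlushMetrics {
        &self.entry
    }
}

impl DerefMut for AppendOnDrop {
    fn deref_mut(&mut self) -> &mut FlushMetrics {
        &mut self.entry
    }
}

impl Drop for AppendOnDrop {
    fn drop(&mut self) {
        // Emit whatever state the entry has at the moment of drop.
        if let Ok(mut entries) = self.sink.lock() {
            entries.push(std::mem::take(&mut self.entry));
        }
    }
}

/// Exit path: create the guard up front, record failures by mutating it
/// through `DerefMut`, and let it drop at the end of the scope.
pub fn finish(sink: Sink, metadata_ok: bool, finalize_ok: bool) {
    let mut metrics = AppendOnDrop::new(FlushMetrics::default(), sink);
    if !metadata_ok {
        metrics.write_metadata_failed = true;
    }
    if !finalize_ok {
        metrics.finalize_failed = true;
    }
    // `metrics` is dropped here, so the entry is emitted on every exit.
}
```

Because emission lives in `Drop`, an early return added to `finish` later would still produce exactly one metric entry.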
@rcoh enabled auto-merge April 14, 2026 00:20
@rcoh added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit 6b9a21e Apr 14, 2026
18 checks passed
@rcoh mentioned this pull request Apr 14, 2026