Add dial9-in-prod example by rcoh · Pull Request #335 · dial9-rs/dial9

rcoh · 2026-04-30T19:57:24Z

I think we might hoist this into the README at some point? For now, it's linked in the README.

fixes #310

The previous commit renamed the README heading `## Wake event tracking` to `## Wake event tracking (opt-in)` (to match the sibling `## Tracing span events (opt-in)`). `dial9-viewer/build.rs` matches section headings exactly via `SETUP_SECTIONS` and panics loudly if one is renamed, which broke clippy, ubuntu-nightly build, docs.rs check, and cargo package CI jobs. Update the constant to the new heading.
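To illustrate why a heading rename breaks an exact-match check, here is a minimal sketch. It is not the actual `dial9-viewer/build.rs`: the function `missing_sections` and the inline README text are hypothetical, and the real script panics rather than returning a list.

```rust
/// Hypothetical helper: return every expected heading that does not appear
/// verbatim as a full line of the README text.
fn missing_sections<'a>(readme: &str, expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|heading| !readme.lines().any(|line| line.trim_end() == *heading))
        .collect()
}

fn main() {
    // README after the rename described in the commit message.
    let readme = "# dial9\n\n## Wake event tracking (opt-in)\n\n## Tracing span events (opt-in)\n";
    // A stale constant that still lists the old heading.
    let stale = ["## Wake event tracking", "## Tracing span events (opt-in)"];
    let missing = missing_sections(readme, &stale);
    // A build script in this style would fail the build if `missing` is non-empty.
    println!("missing headings: {missing:?}");
}
```

Because the comparison is on the whole line, `## Wake event tracking (opt-in)` does not satisfy a constant that still says `## Wake event tracking`.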

- Fix env var table: DIAL9_ENABLED only accepts true/false (not 1/yes) since parse_env uses bool::from_str
- Fix README typo: 'read to be polled' → 'ready to be polled'
- Fix doc comment typo: 'an even' → 'an event'
- Fix duplicate 'a' across line break in doc comment
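The first fix follows from how the standard library parses booleans: `bool::from_str` accepts only the exact strings `true` and `false`. A small sketch of that behavior, where `parse_flag` is a hypothetical stand-in and not the example's `parse_env`:

```rust
use std::str::FromStr;

/// Hypothetical helper: parse a flag value the same way `bool::from_str` does.
fn parse_flag(raw: &str) -> Option<bool> {
    bool::from_str(raw).ok()
}

fn main() {
    // Only the exact lowercase strings "true" and "false" are accepted.
    for raw in ["true", "false", "1", "yes", "TRUE"] {
        println!("{raw:?} -> {:?}", parse_flag(raw));
    }
}
```

Values such as `1`, `yes`, or `TRUE` are rejected, which is why the env var table had to stop advertising them.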

jlizen

Content seems fine overall, though might be worth waiting on #256 landing since that opens up some nicer "bad startup" handling. Seems like it's pretty close. No worries if you'd rather get this out tomorrow, I believe @Fluzko is AFK until Monday.

But, I do think we should probably avoid the handrolled env parsing in favor of using Clap. Feel free to disagree, in which case I can stamp this as-is (and up to you about the other small content tweaks).

jlizen · 2026-05-01T02:36:40Z

+//!   install the runtime hooks but they will all be no-ops. You could then set up a background task that reads dynamic configuration to enable dial9 later. This is a much larger surface area of code
+//!   that is enabled, so it is higher risk.
+//!
+//! > Note! dial9 must be created _before_ your async runtime. dial9 relies on installing itself into the runtime telemetry hooks to produce Tokio events.


Should we mention that using #[dial9::main] gets you this for free?

jlizen · 2026-05-01T02:37:12Z

+//!
+//! ### The overhead of running dial9
+//!
+//! 1. Dial9 allocates a 1MB buffer for each thread that record events. If you are recording events from a huge number of threads, this can bloat memory.


Does this risk getting stale if we state the specific size? Should we be more vague?

jlizen · 2026-05-01T02:37:40Z

+//! Dial9 has two types of CPU profiling available:
+//! 1. CPU profiling (this is what you would normally consider CPU profiling): Dial9 is sampling stack traces that are running on the CPU.
+//! 2. Schedule Profiling: dial9 subscribes to the sched-switch linux kernel event and can capture a stack trace when your application is descheduled by the kernel. In order
+//!    to subscribe to these events, dial9 must open one perf fd per worker thread.


formatting seems off here

jlizen · 2026-05-01T02:38:31Z

+//!
+//! Dial9 has many possible components you can enable. The more components, the more data you will produce and the more overhead your application will have.
+//!
+//! #### CPU Profiling


i would have expected some mention of permissions here (the implications of the available paranoid settings needed to access these)
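For context on the permissions point: on Linux the relevant setting is exposed at `/proc/sys/kernel/perf_event_paranoid`, and higher values restrict unprivileged access to perf events more tightly. The sketch below only reads and parses that value; the function names are hypothetical, and which level each dial9 profiling mode requires is not stated in this thread.

```rust
use std::fs;

/// Parse the contents of /proc/sys/kernel/perf_event_paranoid (a single integer).
fn parse_paranoid(contents: &str) -> Option<i32> {
    contents.trim().parse().ok()
}

/// Read the current level. Returns None if the file is missing (for example
/// on non-Linux hosts) or cannot be parsed.
fn read_paranoid() -> Option<i32> {
    fs::read_to_string("/proc/sys/kernel/perf_event_paranoid")
        .ok()
        .and_then(|s| parse_paranoid(&s))
}

fn main() {
    match read_paranoid() {
        Some(level) => println!("perf_event_paranoid = {level}"),
        None => println!("perf_event_paranoid unavailable on this host"),
    }
}
```

A service could log this value at startup so that a missing CPU profile can be traced back to host configuration.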

jlizen · 2026-05-01T02:39:22Z

+//!
+//! #### Tracing
+//! Dial9 can capture Tracing spans via the `TracingLayer`. On the scale of tracing, this is fairly low overhead, however, if you have a large amount of deeply nested spans, this can produce a huge amount
+//! of data. We recommend using a very fine-grained filter.


would it be worth including a sample filter (perhaps filtering aws sdk stuff), since we don't have the telemetry in the example?

jlizen · 2026-05-01T02:41:31Z

+
+    let base_path = format!("{}/trace.bin", opts.trace_dir.trim_end_matches('/'));
+
+    let max_file_size = (opts.max_disk_usage_bytes / 4).max(16 * 1024 * 1024);


Looking forward to getting rid of this
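To make the two expressions in this hunk concrete, here is a standalone sketch. The field types (`u64` for the byte budget, `&str` for the directory) and the function names are assumptions; the diff does not show the definition of `opts`.

```rust
/// Floor applied to the per-file size: 16 MiB.
const MIN_FILE_SIZE: u64 = 16 * 1024 * 1024;

/// A quarter of the disk budget, but never less than 16 MiB.
fn max_file_size(max_disk_usage_bytes: u64) -> u64 {
    (max_disk_usage_bytes / 4).max(MIN_FILE_SIZE)
}

/// Join the trace directory and file name without doubling the slash.
fn base_path(trace_dir: &str) -> String {
    format!("{}/trace.bin", trace_dir.trim_end_matches('/'))
}

fn main() {
    let one_gib: u64 = 1024 * 1024 * 1024;
    let small: u64 = 32 * 1024 * 1024;
    println!("1 GiB budget  -> {} bytes per file", max_file_size(one_gib));
    println!("32 MiB budget -> {} bytes per file", max_file_size(small));
    println!("{}", base_path("/var/log/dial9/"));
}
```

With a 1 GiB budget each file is capped at 256 MiB. With a 32 MiB budget the floor takes over, so a single file may use half of the budget instead of a quarter.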

jlizen · 2026-05-01T02:42:26Z

+        return Dial9ConfigBuilder::disabled().build();
+    }
+
+    // Make sure the trace directory exists; the writer errors out otherwise.


This is needed just because build() will panic on failure currently?
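The step that the code comment describes can be done with the standard library alone. This is a sketch under assumptions: `ensure_trace_dir` is a hypothetical name, and the PR's actual code for this step is not shown in the hunk.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the trace directory and any missing parents.
/// Succeeds without error if the directory already exists.
fn ensure_trace_dir(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("dial9-sketch").join("traces");
    ensure_trace_dir(&dir)?;
    println!("trace directory ready: {}", dir.display());
    Ok(())
}
```

Returning the `io::Result` lets the caller decide whether a failure should disable tracing or abort startup.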

jlizen · 2026-05-01T02:47:33Z

+    /// Parse environment variables into a [`Dial9Opts`].
+    ///
+    /// Pure — no side effects, no panics, no logging.
+    fn from_env() -> Result<Self, Dial9EnvError> {


The sheer volume of parsing code makes this example noisier than it needs to be. In practice I would probably use Clap with its env support for this... Pretty sure even the linux thing can be supported inline with `default_value_t`, else a simple enough option + `unwrap_or`.

WDYT about cutting over to stay focused on the business logic and opinions?

yeah that seems reasonable. Or we can just "leave it as an exercise to the reader"

I'm thinking this would actually work nicely as a _guide module instead of an example

jlizen · 2026-05-01T20:05:14Z

@rcoh #256 is hitting main which adds the build_or_disabled() options and the nicer TelemetryHandle::current() api, ping me when you are ready for a fresh review

Resolve README.md conflicts by taking main's restructured layout and adding only the production_use example pointer paragraph.

…builder API Drop `try_current` and `parse_or_fallback` now that `build_or_disabled` returns a pass-through config on writer failures and `TelemetryHandle::current` is inert when disabled, so application code runs unchanged whether dial9 is recording or not. Fold the four cfg-gated apply_* helpers into a single `with_runtime` closure.

rcoh added 4 commits April 30, 2026 19:52

Add dial9-in-prod example

0e69b73

docs: add to readme

91d3dfa

jlizen reviewed May 1, 2026


rcoh added 2 commits May 1, 2026 15:52

Add a 'getting-useful-data' section

e050b63

Refine prod guidance

115e907

rcoh requested a review from jlizen May 1, 2026 17:59

jlizen added 2 commits May 2, 2026 00:11

Merge branch 'main' into prod-guidance

50ca169

jlizen approved these changes May 2, 2026


jlizen enabled auto-merge May 2, 2026 00:29

jlizen disabled auto-merge May 2, 2026 09:33

jlizen enabled auto-merge May 2, 2026 09:33

jlizen added this pull request to the merge queue May 2, 2026

Merged via the queue into main with commit cc4374e May 2, 2026
30 of 34 checks passed


jlizen merged 8 commits into main from prod-guidance
