feat!: Include CPU id in CPU profile samples #338
Merged
perf-self-profile (breaking change):
- Sample.cpu is now Option<u32> rather than u32. Perf sampling always returns Some(cpu) via PERF_SAMPLE_CPU; the ctimer fallback returns None when SYS_getcpu fails instead of silently reporting cpu 0.
- SampleData/DrainedSample/SlotWriter::write in the lock-free ring buffer thread Option<u32> through the signal-handler-safe path.

dial9-tokio-telemetry:
- CpuSampleData.cpu: Option<u32> (in-memory).
- CpuSampleEvent.cpu: Option<u64> on the wire. Widened to u64 so the field encodes as OptionalVarint: 1 byte when absent, 2 bytes total for typical small CPU ids. Narrowed back to Option<u32> on decode.
- TelemetryEvent::CpuSample gains cpu: Option<u32>. Older traces without the field decode as None (backward-compatible).
- RawCpuSample.cpu threaded through CpuProfiler::drain and SchedProfiler::drain.

JS parser: CpuSample objects now expose cpu: number|null.

Tests:
- perf-self-profile: multithread.rs asserts every perf sample carries Some(cpu) (Linux-only).
- dial9-tokio-telemetry: two round-trip tests in buffer.rs exercising encode+decode with cpu=Some(7) and cpu=None through ThreadLocalBuffer.

Drive-by: narrow cfg(test) on SchedStat.fd to cfg(all(test, target_os = "linux")) to match its only reader, fixing a dead_code warning on non-Linux builds.

Demo trace not yet regenerated (pending Linux environment).
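The "1 byte when absent, 2 bytes total for typical small CPU ids" sizing can be illustrated with a minimal sketch. It assumes a presence byte followed by a LEB128-style varint; the actual `OptionalVarint` layout used by dial9-tokio-telemetry is not shown in this PR and may differ. The function names `encode_optional_varint` and `decode_optional_varint` are hypothetical.

```rust
/// Hypothetical optional-varint layout (an assumption, not the crate's real format):
/// one presence byte (0 = absent, 1 = present), then a LEB128 varint if present.
fn encode_optional_varint(value: Option<u64>, out: &mut Vec<u8>) {
    match value {
        None => out.push(0),
        Some(mut v) => {
            out.push(1);
            loop {
                let byte = (v & 0x7f) as u8;
                v >>= 7;
                if v == 0 {
                    out.push(byte); // last byte: continuation bit clear
                    break;
                }
                out.push(byte | 0x80); // more bytes follow
            }
        }
    }
}

/// Returns the decoded value and the number of bytes consumed,
/// or `None` if the input is truncated or malformed.
fn decode_optional_varint(input: &[u8]) -> Option<(Option<u64>, usize)> {
    match *input.first()? {
        0 => Some((None, 1)),
        1 => {
            let mut value = 0u64;
            // A u64 needs at most 10 varint bytes.
            for (i, &byte) in input[1..].iter().enumerate().take(10) {
                value |= u64::from(byte & 0x7f) << (7 * i);
                if byte & 0x80 == 0 {
                    return Some((Some(value), i + 2)); // presence byte + (i + 1) varint bytes
                }
            }
            None
        }
        _ => None,
    }
}

fn main() {
    let mut absent = Vec::new();
    encode_optional_varint(None, &mut absent);
    assert_eq!(absent.len(), 1); // absent: presence byte only

    let mut present = Vec::new();
    encode_optional_varint(Some(7), &mut present);
    assert_eq!(present.len(), 2); // CPU ids below 128: presence byte + one varint byte
    assert_eq!(decode_optional_varint(&present), Some((Some(7), 2)));
}
```

Under this assumed layout, any CPU id below 128 costs two bytes, which matches the sizing described in the commit message.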
Verified 7411 / 7411 CpuSample events carry a cpu id (10 distinct CPU ids observed: 0-9). Generated via scripts/regenerate_demo_trace_docker.sh on the aarch64 docker builder.
jlizen (Member) approved these changes on May 2, 2026 and left a comment:
++ to the sentinel -> Option, good breaking change
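The sentinel problem the reviewer refers to can be sketched as follows. This is an illustration only: `cpu_with_sentinel` and `cpu_as_option` are hypothetical helpers, and `ret` stands in for the result of a `getcpu` call (0 on success, nonzero on failure) rather than the crate's actual fallback code.

```rust
/// Old style (hypothetical): a failed lookup collapses to CPU 0,
/// which is indistinguishable from a sample genuinely taken on CPU 0.
fn cpu_with_sentinel(ret: i64, cpu: u32) -> u32 {
    if ret == 0 { cpu } else { 0 }
}

/// New style (hypothetical): a failed lookup is reported as `None`.
fn cpu_as_option(ret: i64, cpu: u32) -> Option<u32> {
    (ret == 0).then_some(cpu)
}

fn main() {
    // Success on CPU 0 and a failed call look identical with the sentinel...
    assert_eq!(cpu_with_sentinel(0, 0), cpu_with_sentinel(-1, 0));
    // ...but stay distinguishable with Option.
    assert_eq!(cpu_as_option(0, 0), Some(0));
    assert_eq!(cpu_as_option(-1, 0), None);
}
```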
    pub callchain: InternedStackFrames,
    /// CPU the sample was taken on, if the backend could determine it.
    ///
    /// Widened to `u64` on the wire so the field encodes as `OptionalVarint`:

        .expect("stack pool entry must exist for CpuSample callchain")
        .to_vec(),
    // CPU id is varint-encoded as u64 on the wire; real CPU ids fit in u32.
    cpu: e.cpu.map(|v| v as u32),
nit: better to be explicit about bad wire data with `u32::try_from(v).ok()` instead of a silently truncating `as` cast
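A minimal sketch of the difference the nit points at. The helper names `narrow_lossy` and `narrow_checked` are hypothetical, and mapping an out-of-range wire value to `None` is one possible reading of the suggestion, not necessarily what was merged.

```rust
/// Mirrors the reviewed line: `as` silently truncates values above u32::MAX.
fn narrow_lossy(wire: Option<u64>) -> Option<u32> {
    wire.map(|v| v as u32)
}

/// Reviewer's suggestion (as interpreted here): out-of-range values become `None`.
fn narrow_checked(wire: Option<u64>) -> Option<u32> {
    wire.and_then(|v| u32::try_from(v).ok())
}

fn main() {
    // A corrupt wire value just past the u32 range.
    let bad = Some(u64::from(u32::MAX) + 8);

    // The cast wraps around and reports a plausible-looking CPU 7.
    assert_eq!(narrow_lossy(bad), Some(7));
    // The checked conversion refuses to invent a CPU id.
    assert_eq!(narrow_checked(bad), None);

    // Valid ids are unaffected by either approach.
    assert_eq!(narrow_lossy(Some(7)), Some(7));
    assert_eq!(narrow_checked(Some(7)), Some(7));
}
```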
Summary
Adds the CPU id that a CPU profile sample was collected on to `CpuSampleEvent`, so the viewer can show which physical CPU each sample came from.

Breaking changes
This is a breaking change to `perf-self-profile`, which will be bumped to 0.4 in the next release.

Changes
`perf-self-profile` (breaking):
- `Sample.cpu: u32` → `Sample.cpu: Option<u32>`. The perf backend already gets `Option<u32>` from the kernel via `PERF_SAMPLE_CPU`; previously we were `unwrap_or(0)`-ing away the missing case. The ctimer fallback now returns `None` when `SYS_getcpu` fails instead of silently reporting CPU 0 (ambiguous with real CPU 0).
- `SampleData`/`DrainedSample`/`SlotWriter` thread `Option<u32>` through the lock-free signal-handler-safe ring buffer.

`dial9-tokio-telemetry`:
- `CpuSampleData.cpu: Option<u32>` in memory.
- `CpuSampleEvent.cpu: Option<u64>` on the wire, widened to `u64` so it encodes as `OptionalVarint`: 1 byte when absent, 2 bytes total for typical small CPU ids. Narrowed back to `Option<u32>` on decode.
- `TelemetryEvent::CpuSample` gains `cpu: Option<u32>`. Older traces recorded before this field existed decode cleanly as `cpu: None`.
- `RawCpuSample.cpu` threaded through `CpuProfiler::drain` and `SchedProfiler::drain`.

JS parser:
- `CpuSample` objects now expose `cpu: number | null`.

Tests
- `perf-self-profile/tests/multithread.rs`: asserts every perf sample carries `Some(cpu)` (Linux-only).
- `dial9-tokio-telemetry/src/telemetry/buffer.rs`: two round-trip tests for `CpuSample`, one with `cpu: Some(7)` and one with `cpu: None`, encode through `ThreadLocalBuffer` and decode via `decode_events`.

All 422 tests pass locally, 2 stress iterations clean. `cargo fmt --check` and `cargo clippy --all-targets --all-features` clean.

Drive-by
Narrow `cfg(test)` on `SchedStat.fd` to `cfg(all(test, target_os = "linux"))` to match its only reader, fixing a pre-existing `dead_code` warning on non-Linux builds.

Follow-ups
- Demo trace (`dial9-viewer/ui/demo-trace.bin`) not yet regenerated; will do in a follow-up commit on this branch before merge.
- [...] `WorkerUnpark` so we can see scheduler migrations between park and unpark.