[IPC Protocol] Add specification to configure a user_events eventpipe session #5454

mdh1418 · 2025-04-21T23:53:20Z

In .NET 10, the EventPipe infrastructure will be leveraged to support user_events.

This PR documents the protocol for enabling a user_events-based EventPipe Session through the Diagnostics IPC protocol, where a new EventPipe Command ID CollectTracing5 will accept necessary tracepoint configuration.

As the user_events EventPipe session is not streaming based, the payload is expected to first encode a uint output_format to denote the session format (streaming vs user_events). Afterwards, only the session configuration options relevant to that format are encoded; these are outlined at the top of the EventPipe Commands section, in the Streaming Session, User_events Session, and Session Providers sections.
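To make the ordering concrete, here is a minimal sketch of how a client might lay out the start of a user_events payload, using the field order quoted later in this thread from the spec diff (`output_format`, `rundownKeyword`, `providers`). It assumes little-endian encoding, a 4-byte `uint`, an 8-byte `ulong`, and arrays prefixed with a `uint` element count; the function name and the pre-encoded provider blobs are hypothetical, and the layout of `user_events_provider_config` is deliberately left opaque.

```python
import struct

# Value taken from the spec diff quoted in this thread ("uint output_format: 1").
OUTPUT_FORMAT_USER_EVENTS = 1


def encode_user_events_payload(rundown_keyword: int, encoded_providers: list[bytes]) -> bytes:
    """Hypothetical helper: encode output_format, rundownKeyword, then providers.

    Assumptions (not confirmed by this thread): little-endian integers and
    a uint element count in front of the provider array.
    """
    out = struct.pack("<I", OUTPUT_FORMAT_USER_EVENTS)   # uint output_format
    out += struct.pack("<Q", rundown_keyword)            # ulong rundownKeyword
    out += struct.pack("<I", len(encoded_providers))     # array length prefix
    for provider in encoded_providers:                   # opaque, pre-encoded entries
        out += provider
    return out


payload = encode_user_events_payload(0x80020138, [])
```

With no providers, the sketch produces a 16-byte prefix: 4 bytes of format, 8 bytes of keyword, and a 4-byte zero count.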

For user_events EventPipe sessions, an additional tracepoint_config is to be encoded, to map Event IDs to tracepoints.
This protocol expects the Client to have access to the user_events_data file in order to configure a user_events EventPipe session, and expects the user_events_data file descriptor to be sent via SCM_RIGHTS in the continuation stream.
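As an illustration of the SCM_RIGHTS step, the sketch below passes a file descriptor across a Unix domain socket pair with Python's `socket.send_fds` and `socket.recv_fds` (available since Python 3.9). A temporary file stands in for user_events_data, and the single placeholder byte sent alongside the descriptor is an assumption, not something this PR specifies.

```python
import os
import socket
import tempfile


def send_fd(sock: socket.socket, fd: int) -> None:
    # SCM_RIGHTS needs at least one byte of ordinary data to ride along with.
    socket.send_fds(sock, [b"\x00"], [fd])


def recv_fd(sock: socket.socket) -> int:
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def demo() -> bytes:
    client, runtime = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as stand_in:   # stand-in for user_events_data
            stand_in.write(b"stand-in")
            stand_in.flush()
            send_fd(client, stand_in.fileno())
            received = recv_fd(runtime)
        # The received descriptor stays valid after the sender closes its copy.
        os.lseek(received, 0, os.SEEK_SET)
        content = os.read(received, 64)
        os.close(received)
        return content
    finally:
        client.close()
        runtime.close()
```

The receiving side ends up with its own descriptor for the same open file, which is why the client needs access to user_events_data but the runtime does not have to open it itself.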

Additionally, as user_events does not support callbacks, a new event_filter config is expected to be encoded in CollectTracing5, to act as an allow/deny list of Event IDs that is applied after the keyword/level filter.
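A rough sketch of the filtering order described above: the keyword/level check runs first, and the Event ID allow/deny list is consulted only for events that pass it. The `EventFilter` shape (an `allow` flag plus a set of IDs) and the simplified keyword/level rule are assumptions made for illustration, not the encoded format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventFilter:
    allow: bool                        # True: allow list, False: deny list (hypothetical)
    event_ids: frozenset = frozenset()


def passes_keyword_level(event_keywords: int, event_level: int,
                         session_keywords: int, session_level: int) -> bool:
    # Simplified rule assumed for this sketch: an event with no keywords always
    # matches, and lower numeric levels are more severe.
    keyword_ok = event_keywords == 0 or (event_keywords & session_keywords) != 0
    return keyword_ok and event_level <= session_level


def should_write(event_id: int, event_keywords: int, event_level: int,
                 session_keywords: int, session_level: int,
                 event_filter: Optional[EventFilter]) -> bool:
    if not passes_keyword_level(event_keywords, event_level,
                                session_keywords, session_level):
        return False
    if event_filter is None:           # no filter configured: nothing more to check
        return True
    listed = event_id in event_filter.event_ids
    return listed if event_filter.allow else not listed


allow_1_and_2 = EventFilter(allow=True, event_ids=frozenset({1, 2}))
deny_1 = EventFilter(allow=False, event_ids=frozenset({1}))
```

Note that a filter can only narrow what the keyword/level check already admits; an allow-listed ID is still dropped if its keywords or level do not match the session.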

documentation/design-docs/ipc-protocol.md

Update CommandSets Command IDs

Move EventPipe StopTracing to beginning

Fix sample payload serialization

Clarify Header Size and NetTrace format version

Clarify that filter_data can be 0 length to avoid confusion with optional meaning that encoding can be skipped

brianrob

@mdh1418 thanks for updating this doc and for your patience with me reviewing it. A couple of questions, but otherwise, looks good.

lateralusX · 2025-05-23T09:17:56Z

documentation/design-docs/ipc-protocol.md

+
+The `version` is the version of the tracepoint format, which in this case is [version 1](#tracepoint-format-v1).
+
+The `event_id` is the ID of the event, defined by the EventSource/native manifest.


Events are also versioned, so the event ID alone won't be enough to know how to parse the payload. In EventPipe and the nettrace format, each serialized event has an associated metadata ID that is unique to the session stream and appears before the event in the stream. The metadata record carries additional data about the event, such as the provider name and the event version. How will this work for user events? I see that we have a metadata field, and from the description below it sounds like we will include metadata for the first event of that type per session, similar to the nettrace metadata event. But since each event instance doesn't carry a unique metadata ID, how will we look up metadata for potentially different versions of an event?

The metadata should be sent once per event, per EventPipe session. (process ID, tracepoint) should be sufficient to resolve a session, and (tracepoint, event ID) is sufficient to resolve an event. The combination of process ID + event ID + tracepoint name should then have precision equivalent to the metadata ID used in the nettrace format. If we ever supported multiple EventSources with the same name and the same event ID but different parameter signatures in the same process, it would cause problems and we'd have to address it. No plans to do that though.

OK, so that means the tracepoint would need to be different for different providers: event IDs are only unique within a specific provider, so mixing events from different providers into the same tracepoint will cause event ID collisions. How will we handle different versions of the same event?

How will we handle different versions of the same event?

In the same process we don't support different versions of the same event. In separate processes you have the PID to distinguish and each process will emit its own metadata.

OK, so that means the tracepoint would need to be different for different providers: event IDs are only unique within a specific provider, so mixing events from different providers into the same tracepoint will cause event ID collisions.

yep! #5454 (comment)

OK, events over EventPipe can technically handle this, since each unique event instance is identified by provider + event_id + event_version. Do we enforce this in any way, or is it just a convention that we should only support one version of the same event in the same process? Making sure that a specific event fired by the runtime uses the same version is probably controllable, but since we generate all versions in clretwallmain.h, it is still possible to use both V1 and V2 of the same event in different parts of the runtime. For EventSource it's probably more complicated to enforce, since we have less control over usage.

Are there events where multiple of its versions are actually in use at the same time? I've only just started looking through.

There might be multiple ways to discern the EventPipe Event's version. One would be through the EventPipe Event Metadata, but I'm trying to validate that the metadata will always encode the EventPipeEvent's version. One oddity (at least at a glance it looks like an inconsistency) is https://github.com/dotnet/runtime/blob/9703a2ed912485bdb888ac3e3a023564ec9e3528/src/native/eventpipe/ep-event-source.c#L132-L154, where the metadata encodes version 1 but the actual event being added has version 0

Another way to discern the EventPipe Event's version works if and only if different versions of the same event have unique sizes. Since the size is encoded in the __rel_loc, it can be used to infer which version of the event is being seen. I don't know whether we have that invariant, i.e. that newer versions of the same event are strictly larger than previous versions. I'd have to investigate.
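For illustration only, the size-based idea could look like the sketch below. It relies on the Linux tracing convention that a `__rel_loc` value is 32 bits wide, with the offset in the lower 16 bits and the length in the upper 16 bits. The size-to-version table is entirely hypothetical, and the approach only works if every version of an event has a distinct payload size, which is exactly the invariant in question here.

```python
def decode_rel_loc(value: int) -> tuple[int, int]:
    """Split a 32-bit __rel_loc value into (offset, size)."""
    offset = value & 0xFFFF
    size = (value >> 16) & 0xFFFF
    return offset, size


def infer_version(size: int, size_to_version: dict[int, int]):
    """Return the event version for a payload size, or None if it is unknown."""
    return size_to_version.get(size)


# Hypothetical sizes for two versions of one event; not taken from any manifest.
EXAMPLE_SIZES = {24: 1, 32: 2}

offset, size = decode_rel_loc(0x00200004)
version = infer_version(size, EXAMPLE_SIZES)
```

If two versions ever shared a size, the lookup table could not be built, so this would remain a fallback to metadata-based versioning rather than a replacement for it.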

Do we enforce this in any way, or is it just a convention that we should only support one version of the same event in the same process?

EventSource enforces it and it is documented:

"Each method has a body that calls WriteEvent passing it an ID (a numeric value that represents the event) and the arguments of the event method. The ID needs to be unique within the EventSource. The ID is explicitly assigned using the System.Diagnostics.Tracing.EventAttribute"

If you try to define an EventSource with multiple events whose IDs match (in order to give them different versions) then internally EventSource will have an error like this:

Exception thrown: 'System.ArgumentException' in System.Private.CoreLib.dll
Exception thrown: 'System.ArgumentException' in System.Private.CoreLib.dll
EventSource Error: ERROR: Exception in Command Processing for EventSource MyEventSource: Event Foo has ID 1 which is already in use.

That exception gets caught internally to avoid having instrumentation break user code, but it should prevent the event from working.

Likewise if you try to define multiple EventSources with the same provider ID in order to define their events differently, that also hits a different error check.

OK, good, that should take care of only emitting one version of an event in managed code. I don't believe we enforce this rule for native runtime events, so there it will be a convention not to emit multiple versions of the same event through runtime code. Maybe we could implement some validation code to enforce this.

So, based on the discussion above, the following must hold in a given user_events trace for parsers to parse events correctly:

* All events from a given provider must go into that provider's own tracepoint so that event IDs are unique. Providers can't be mixed in a single tracepoint, since there would be event ID collisions.
* The first instance of an event inside a tracepoint needs to carry the metadata for that event ID.
* Only one version of a given event can appear inside the same tracepoint.
* If multiple processes write into the same tracepoint, parsers must distinguish events by PID.
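From a parser's point of view, these requirements amount to keying metadata on (PID, tracepoint, event ID). The sketch below is a hypothetical parser-side cache, not code from PerfView or the runtime: it remembers the metadata carried by the first instance of an event and reuses it for later instances that arrive without any.

```python
from typing import Optional


class MetadataCache:
    """Hypothetical parser-side cache keyed by (pid, tracepoint, event_id)."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[int, str, int], bytes] = {}

    def resolve(self, pid: int, tracepoint: str, event_id: int,
                metadata: Optional[bytes] = None) -> Optional[bytes]:
        key = (pid, tracepoint, event_id)
        if metadata is not None:
            # Keep the first metadata seen; one version per event is assumed.
            self._by_key.setdefault(key, metadata)
        return self._by_key.get(key)


cache = MetadataCache()
first = cache.resolve(100, "provider_a_tp", 1, b"meta-v1")   # first instance carries metadata
later = cache.resolve(100, "provider_a_tp", 1)               # later instance reuses it
other_pid = cache.resolve(200, "provider_a_tp", 1)           # different process: not yet known
```

Because the PID is part of the key, two processes writing the same event ID into the same tracepoint never share metadata, which matches the last requirement in the list.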

There might be multiple ways to discern the EventPipe Event's version. One would be through the EventPipe Event Metadata, but I'm trying to validate that the metadata will always encode the EventPipeEvent's version. One oddity (at least at a glance it looks like an inconsistency) is https://github.com/dotnet/runtime/blob/9703a2ed912485bdb888ac3e3a023564ec9e3528/src/native/eventpipe/ep-event-source.c#L132-L154, where the metadata encodes version 1 but the actual event being added has version 0

@mdh1418 that represents the process info event emitted once per session. We should probably correct that inconsistency. Looking at the PerfView implementation, it doesn't look like it does any structured parsing of this event: https://github.com/brianrob/perfview/blob/a1c8b66ac9358dc4a7bbae41825e917c130f7ca3/src/TraceEvent/TraceLog.cs#L2091. Since we already introduced version 1 in the metadata, we should move the event definition up to version 1 as well.

lateralusX · 2025-05-28T10:31:57Z

documentation/design-docs/ipc-protocol.md

+#### User_events Session Payload:
+* `uint output_format`: 1
+* `ulong rundownKeyword`: Indicates the keyword for the rundown provider
+* `array<user_events_provider_config> providers`: The providers to turn on for the session


I see that we drop the requestStackWalk bool for user events, and I guess it's intentional. Reading up on other discussions around stack traces, I'm under the impression that we will rely on OS stack walking capabilities for user events, correct?

If that is the case, then I assume we either won't trigger the code path in ep_buffer_manager_write_event that checks whether the session has stack walks enabled (based on requestStackWalk) and whether the event requests stack walks (part of the event configuration), or we will make sure ep_session_get_enable_stackwalk (session) is always false for user_events sessions.

There is one exception to this: the sample profiler provider. That provider does stack walks as part of sampling and passes the stack directly to ep_write_sample_profile_event. Will we support the sample profiler for user events, or will we use some other mechanism provided by the OS to sample threads?

For a user_events EventPipe session, I think the aim is to not use any ep_buffer_manager* logic. Right now in the runtime PR draft, stackwalk_requested is forcibly set to false if it's a user_events session.

Will we support the sample profiler for user events, or will we use some other mechanism provided by OS to sample threads?

That was one of the things we were discussing offline, where I need to thoroughly investigate what is needed to make stackwalking possible for user_events, but I haven't gotten to that yet.

OK, one thing to keep an eye on is what requirements the OS unwinder puts on frames. AFAIK the CoreCLR JIT doesn't provide native platform-specific unwind information (DWARF) that gets registered with the OS unwinder; this is only done when using NativeAOT (as part of the binary). The CoreCLR JIT uses Windows unwind opcodes on all platforms and relies on its internal unwinder to correctly unwind stacks, including managed frames. If this observation still holds, then the OS unwinder will only be able to walk managed frames if they are compiled with frame pointers, something the JIT optimizer can omit unless instructed to disable that optimization.

documentation/design-docs/ipc-protocol.md

Parallel renaming to the runtime counterpart PR. Having the serialization format and the output format side by side was deemed confusing, so `output_format` was renamed to `session_type` to more clearly represent its usage.

mdh1418 · 2025-06-04T16:30:45Z

@lateralusX @noahfalk @beaubelgrave @brianrob @AaronRobinsonMSFT
If I'm not mistaken, all blocking concerns have been addressed, and right now the unresolved comments are discussions for understanding/clarity that can be addressed in a separate PR?

The runtime counterpart is nearly mergeable, and I haven't discovered any additional critical modifications to the spec for the initial iteration. I'm planning to merge this and the runtime PR soon (end of this week or early next week) and then more incremental follow-up PRs will be made to address additional nice-to-haves + clarity.

mdh1418 requested a review from a team as a code owner April 21, 2025 23:53

mdh1418 force-pushed the add_ipc_message_protocol_for_user_events_ep_session branch from 722afa6 to 4094fe9 Compare April 21, 2025 23:54

noahfalk reviewed Apr 22, 2025

beaubelgrave reviewed Apr 23, 2025

beaubelgrave reviewed Apr 23, 2025

beaubelgrave reviewed Apr 23, 2025

noahfalk reviewed Apr 23, 2025

mdh1418 added 2 commits April 25, 2025 12:19

[Docs][IPC Protocol] Add CollectTracing5

8b5bf46

mdh1418 force-pushed the add_ipc_message_protocol_for_user_events_ep_session branch from c30b773 to 8b5bf46 Compare April 25, 2025 16:26

noahfalk reviewed Apr 25, 2025

noahfalk reviewed Apr 25, 2025

noahfalk reviewed Apr 25, 2025

mdh1418 added 2 commits April 28, 2025 11:05

Confine CollectTracing5 details to its section

efe387a

[Docs][IPC Protocol] Detail user_events format

87902f0

mdh1418 force-pushed the add_ipc_message_protocol_for_user_events_ep_session branch from 1ba9499 to 87902f0 Compare May 1, 2025 00:56

beaubelgrave reviewed May 1, 2025

mdh1418 mentioned this pull request May 3, 2025

[Linux][EventPipe][UserEvents] Add user events eventpipe support dotnet/runtime#115265

Merged

Add User_events format string

870e337

mdh1418 requested a review from brianrob May 6, 2025 14:46

beaubelgrave reviewed May 6, 2025

mdh1418 requested a review from agocke May 6, 2025 20:51

mdh1418 added 3 commits May 7, 2025 11:36

Encode user_events payload as u8

55f2877

Update tracepoint format

c2cb3db

Update event_filter example and Add bool encoding

2b9bce0

brianrob reviewed May 15, 2025

steveisok requested a review from lateralusX May 21, 2025 23:47

lateralusX reviewed May 22, 2025

lateralusX reviewed May 22, 2025

lateralusX reviewed May 22, 2025

lateralusX reviewed May 22, 2025

lateralusX reviewed May 22, 2025

lateralusX reviewed May 23, 2025

mdh1418 added 3 commits May 27, 2025 12:34

Clarify streaming specific fields

d5d2222

Rename and clarify field

c9e8f96

Rename for consistency

3f310de

lateralusX reviewed May 28, 2025

steveisok requested a review from AaronRobinsonMSFT June 2, 2025 12:44

AaronRobinsonMSFT reviewed Jun 3, 2025

mdh1418 added 2 commits June 4, 2025 12:16

Add bool to general payload encoding spec and clarify description

eda74d6

Rename output_format to session_type

ee73c08

Move metadata into extensions and document format

432f6b8

mdh1418 requested review from lateralusX and noahfalk June 18, 2025 15:13

noahfalk approved these changes Jun 18, 2025

brianrob approved these changes Jun 18, 2025

noahfalk merged commit ff0733e into dotnet:main Jun 18, 2025
2 checks passed

mdh1418 deleted the add_ipc_message_protocol_for_user_events_ep_session branch June 18, 2025 21:54

