[IPC Protocol] Add specification to configure a user_events eventpipe session #5454
- Update CommandSets Command IDs
- Move EventPipe StopTracing to beginning
- Fix sample payload serialization
- Clarify Header Size and NetTrace format version
- Clarify that filter_data can be 0 length, to avoid confusion with "optional" meaning that encoding can be skipped
@mdh1418 thanks for updating this doc and for your patience with me reviewing it. A couple of questions, but otherwise, looks good.
The `version` is the version of the tracepoint format, which in this case is [version 1](#tracepoint-format-v1).
The `event_id` is the ID of the event, defined by the EventSource/native manifest.
Events are also versioned, so just having the event id won't be enough to know how to parse the payload. In EventPipe and the nettrace format, each serialized event has an associated metadata id that is unique to the session stream, and the metadata record appears before the event in the stream. The metadata record carries additional data about the event, such as the provider name and the event version. How will this work for user events? I see that we have a metadata field, and looking at the description below, it sounds like we will include metadata for the first event of that type per session, similar to the nettrace metadata event. But since each event instance doesn't carry a unique metadata id, how will we look up metadata for potentially different versions of an event?
The metadata should be sent once per event, per EventPipe session. (process id, tracepoint) should be sufficient to resolve a session, and (tracepoint, event id) is sufficient to resolve an event. The combination of process id + event id + tracepoint name should then have precision equivalent to the metadata id used in the nettrace format. If we ever supported multiple EventSources with the same name and same event id but different parameter signatures in the same process, it would cause problems and we'd have to address it. No plans to do that though.
OK, so that means we would need the tracepoint to be different for different providers, since event ids are only unique within a specific provider; mixing events from different providers into the same tracepoint will cause event id collisions. How will we handle different versions of the same event?
> How will we handle different versions of the same event?
In the same process we don't support different versions of the same event. In separate processes you have the PID to distinguish and each process will emit its own metadata.
> ok, so that means we would need the tracepoint to be different for different providers, since event ids are only unique within a specific provider; mixing events from different providers into the same tracepoint will cause event id collisions
yep! #5454 (comment)
> How will we handle different versions of the same event?
> In the same process we don't support different versions of the same event. In separate processes you have the PID to distinguish and each process will emit its own metadata.
OK, events over EventPipe can technically handle this, since each unique event is identified by provider + event_id + event_version. Do we enforce this in any way, or is it just a convention that we should only support one version of the same event in the same process? Making sure that a specific event fired by the runtime uses the same version is probably controllable, but since we generate all versions in clretwallmain.h, it is still possible to use both V1 and V2 of the same event in different parts of the runtime. For EventSource it's probably more complicated to enforce, since we have less control over usage.
Are there events where multiple versions are actually in use at the same time? I've only just started looking through.
There might be multiple ways to discern the EventPipe Event's version. One would be through the EventPipe Event Metadata, but I'm trying to validate that the metadata will always encode the EventPipeEvent's version. One oddity (at least at a glance it looks like an inconsistency) is https://github.com/dotnet/runtime/blob/9703a2ed912485bdb888ac3e3a023564ec9e3528/src/native/eventpipe/ep-event-source.c#L132-L154, where the metadata encodes version 1 but the actual event being added has version 0.
Another way to discern the EventPipe Event's version works if and only if different versions of the same event have a unique size. Since the size is encoded in the `__rel_loc`, it can be used to infer which version of the event is being seen. I don't know if we have that invariant, that newer versions of the same event are strictly larger than previous versions. I'd have to investigate.
> Do we enforce this in any way, or is it just a convention that we should only support one version of the same event in the same process?
EventSource enforces it and it is documented:
"Each method has a body that calls WriteEvent passing it an ID (a numeric value that represents the event) and the arguments of the event method. The ID needs to be unique within the EventSource. The ID is explicitly assigned using the System.Diagnostics.Tracing.EventAttribute"
If you try to define an EventSource with multiple events whose IDs match (in order to give them different versions) then internally EventSource will have an error like this:
Exception thrown: 'System.ArgumentException' in System.Private.CoreLib.dll
Exception thrown: 'System.ArgumentException' in System.Private.CoreLib.dll
EventSource Error: ERROR: Exception in Command Processing for EventSource MyEventSource: Event Foo has ID 1 which is already in use.
That exception gets caught internally to avoid having instrumentation break user code, but it should prevent the event from working.
Likewise if you try to define multiple EventSources with the same provider ID in order to define their events differently, that also hits a different error check.
OK, good, that should take care of only emitting one version of an event in managed code. I don't believe we enforce this rule for native runtime events so then it will become a convention to not emit multiple versions of the same event through runtime code. Maybe we could implement some validation code to enforce this.
So, based on the above discussion, for parsers to correctly parse events in a given user_events trace, the following needs to hold:
- All events from a given provider must go into that provider's own tracepoint to make sure event ids are unique. Providers can't be mixed in a single tracepoint since there will be event id collisions.
- The first instance of an event inside a tracepoint needs to carry metadata for the event id.
- Only one version of a given event can appear inside the same tracepoint.
- If multiple processes write into the same tracepoint, parsers must parse events based on pid.
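A minimal sketch of how a parser might apply the rules above, assuming metadata is keyed by (pid, tracepoint name, event id) as described earlier in this thread. The `EventMetadata` and `MetadataTable` names and their fields are hypothetical and only illustrate the lookup; they are not part of the spec or of any existing parser.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EventMetadata:
    """Hypothetical parsed metadata; the real record carries more fields."""
    provider: str
    event_id: int
    version: int


# (pid, tracepoint name, event id) stands in for the nettrace metadata id.
Key = Tuple[int, str, int]


class MetadataTable:
    def __init__(self) -> None:
        self._by_key: Dict[Key, EventMetadata] = {}

    def observe(self, pid: int, tracepoint: str, event_id: int,
                metadata: Optional[EventMetadata]) -> EventMetadata:
        """Return the metadata to use for one event instance.

        `metadata` is non-None only when the event instance carries it
        (expected for the first instance per event, per session).
        """
        key = (pid, tracepoint, event_id)
        known = self._by_key.get(key)
        if metadata is not None:
            # Assumption: conflicting metadata for the same key means two
            # versions or providers were mixed into one tracepoint.
            if known is not None and known != metadata:
                raise ValueError(f"conflicting metadata for {key}")
            self._by_key[key] = metadata
            return metadata
        if known is None:
            raise KeyError(f"no metadata seen yet for {key}")
        return known
```

Keying on the pid keeps two processes that write different versions of the same event into one tracepoint from colliding, while a conflict inside a single process is reported instead of silently mis-parsed.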
> There might be multiple ways to discern the EventPipe Event's version. One would be through the EventPipe Event Metadata, but I'm trying to validate that the metadata will always encode the EventPipeEvent's version. One oddity (at least at a glance it looks like an inconsistency) is https://github.com/dotnet/runtime/blob/9703a2ed912485bdb888ac3e3a023564ec9e3528/src/native/eventpipe/ep-event-source.c#L132-L154, where the metadata encodes version 1 but the actual event being added has version 0
@mdh1418 that represents the process info event emitted once per session. We should probably correct that inconsistency. Looking at the PerfView implementation, it doesn't look like it does any structured parsing of this event: https://github.com/brianrob/perfview/blob/a1c8b66ac9358dc4a7bbae41825e917c130f7ca3/src/TraceEvent/TraceLog.cs#L2091. Since we already introduced version 1 in the metadata, we should move the event definition up to version 1 as well.
#### User_events Session Payload:
* `uint output_format`: 1
* `ulong rundownKeyword`: Indicates the keyword for the rundown provider
* `array<user_events_provider_config> providers`: The providers to turn on for the session
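As a rough illustration of the payload layout quoted above, the following sketch packs the three fields in order. It assumes little-endian encoding and that `array<T>` is serialized as a `uint` element count followed by the elements, as elsewhere in the Diagnostics IPC protocol. The layout of `user_events_provider_config` is not shown in this excerpt, so each provider is passed in as already-encoded bytes; the function name is hypothetical.

```python
import struct
from typing import List

# Value shown in the quoted spec for a user_events session.
USER_EVENTS_OUTPUT_FORMAT = 1


def encode_user_events_session_payload(rundown_keyword: int,
                                       encoded_providers: List[bytes]) -> bytes:
    """Pack output_format, rundownKeyword, and the providers array."""
    payload = struct.pack("<I", USER_EVENTS_OUTPUT_FORMAT)  # uint output_format
    payload += struct.pack("<Q", rundown_keyword)           # ulong rundownKeyword
    # Assumption: arrays are prefixed with a uint element count.
    payload += struct.pack("<I", len(encoded_providers))
    for provider in encoded_providers:
        payload += provider  # opaque, pre-encoded user_events_provider_config
    return payload
```

With no providers the payload is 16 bytes: 4 for `output_format`, 8 for `rundownKeyword`, and 4 for the empty array's count.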
I see that we drop the requestStackWalk bool for user events, and I guess it's intentional. Reading up on other discussions around stack traces, I'm under the impression that we will rely on OS stack walking capabilities for user events, correct?
If that is the case, then I assume we either won't trigger the codepath in ep_buffer_manager_write_event where we check whether the session has stack walks enabled (based on requestStackWalk) and whether the event requests stack walks (part of the event configuration), or we will make sure ep_session_get_enable_stackwalk (session) is always false for user event sessions.
There is one exception to this: the sample profiler provider. That provider does stack walks as part of sampling and passes the stack directly to ep_write_sample_profile_event. Will we support the sample profiler for user events, or will we use some other mechanism provided by the OS to sample threads?
For a user_event eventpipe session, I think the aim is to not use any ep_buffer_manager* logic. Right now in the runtime PR draft, the stackwalk_requested is forcefully set to false if it's a user_events session.
> Will we support the sample profiler for user events, or will we use some other mechanism provided by OS to sample threads?
That was one of the things we were discussing offline, where I need to thoroughly investigate what is needed to make stackwalking possible for user_events, but I haven't gotten to that yet.
OK, one thing to keep an eye on is what requirements the OS unwinder puts on frames. AFAIK the CoreCLR JIT doesn't provide native platform-specific unwind information (DWARF) that gets registered with the OS unwinder; this is only done when using NativeAOT (as part of the binary). The CoreCLR JIT uses Windows unwind op codes on all platforms and relies on an internal unwinder to correctly unwind stacks including managed frames. If this observation still holds, then the OS unwinder will only be able to stackwalk managed frames if they are compiled with frame pointers, something the JIT optimizer can omit unless instructed to disable that optimization.
Parallel renaming to Runtime counterpart PR. The serialization format and output format were deemed confusing to have side by side, so renamed the `output format` to more clearly represent its usage
@lateralusX @noahfalk @beaubelgrave @brianrob @AaronRobinsonMSFT The runtime counterpart is nearly mergeable, and I haven't discovered any additional critical modifications to the spec for the initial iteration. I'm planning to merge this and the runtime PR soon (end of this week or early next week) and then more incremental follow-up PRs will be made to address additional
In .NET 10, the EventPipe infrastructure will be leveraged to support user_events.
This PR documents the protocol for enabling a user_events-based EventPipe Session through the Diagnostics IPC protocol, where a new EventPipe Command ID `CollectTracing5` will accept necessary tracepoint configuration.
As the user_events EventPipe session is not streaming based, the payload is expected to first encode a `uint output_format` to denote the session format (streaming vs user_events). Afterwards, only relevant session configuration options are to be encoded, outlined at the top of the `EventPipe Commands` section in the `Streaming Session` section, `User_events Session` section, and `Session Providers` section.
For user_events EventPipe sessions, an additional tracepoint_config is to be encoded, to map Event IDs to tracepoints.
This protocol expects the Client to have access to the `user_events_data` file in order to enable configuring a `user_event` EventPipe session, and expects the SCM_RIGHTS to the `user_events_data` file descriptor to be sent in the continuation stream.
Additionally, as `user_events` does not support callbacks, a new event_filter config is expected to be encoded in `CollectTracing5`, to act as an allow/deny list of Event IDs that will apply post keyword/level filter.
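To make the ordering of the two filters concrete, here is a minimal sketch of an allow/deny check applied after the keyword/level check. The `EventFilter` type, the function names, and the exact keyword/level semantics (a keyword mask of 0 always matches; lower level values are more severe) are assumptions for illustration, not taken from the spec. How an empty list should be treated is also not stated in this description, so the sketch applies the set logic literally.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EventFilter:
    """Hypothetical in-memory form of the event_filter config."""
    allow: bool                 # True: allow list, False: deny list
    event_ids: FrozenSet[int]   # Event IDs the list applies to


def passes_keyword_level(event_keywords: int, event_level: int,
                         session_keywords: int, session_level: int) -> bool:
    # Assumed semantics: events with no keywords always match the mask,
    # and an event is enabled when its level is within the session's level.
    keyword_ok = event_keywords == 0 or (event_keywords & session_keywords) != 0
    level_ok = event_level <= session_level
    return keyword_ok and level_ok


def is_event_enabled(event_id: int, event_keywords: int, event_level: int,
                     session_keywords: int, session_level: int,
                     event_filter: EventFilter) -> bool:
    # The Event ID filter only applies after the keyword/level filter passes.
    if not passes_keyword_level(event_keywords, event_level,
                                session_keywords, session_level):
        return False
    listed = event_id in event_filter.event_ids
    return listed if event_filter.allow else not listed
```

For example, with an allow list of `{1, 2}`, event 3 is dropped even when its keywords and level match the session, and event 1 is still dropped when its keywords do not match.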