Expose setting CPU SamplingProfiler rate #83635
Conversation
```
@@ -56,6 +56,17 @@ static
 void
 sample_profiler_enable (void);

+static
+void
+ep_sample_event_pipe_callback(
```
Separate prototype isn't needed.
```
while (offset < filter_data_size) {
	candidateKey = filter_data_char + offset;
	if (strcmp(candidateKey, sampleProfilerIntervalMSKey) == 0) {
		ep_sample_profiler_set_sampling_rate(strtoull(candidateKey + 25, NULL, 10) * 1000000);
```
We should check that strtoull returns a value greater than 0. If someone passes in a bad value that cannot be parsed, strtoull returns 0, and the sampling thread would then sample continuously and busy-lock the app.
We should probably also add logic to make sure the value falls within some accepted limits: maybe between 1000 samples/second (the current default; I'm not sure we want to go beyond that with the current sampling solution) and some lower limit that makes sense. You might want to sample quite infrequently in some cases, so I'm not sure what the lower limit should be; maybe we should just cap it at some sane value, like 1 sample/second, since it's still a sample profiler that you're configuring. If you want even more infrequent samples, that should perhaps be handled by a custom provider rather than the sample profiler.
Agreed that keeping a sane range is a good idea. For example, if I recall correctly, ETW will not allow you to sample more frequently than every 0.125 seconds. I'm not super concerned about the other side of the range, because there isn't a risk of affecting the availability of the app; there is only a risk of not getting data. I think it's probably safe to allow any large value, though at a certain point it doesn't really provide much value for the scenarios we are currently tracking.
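A minimal sketch of the parse-and-clamp logic the reviewers describe might look like the following; the bounds and constant names are assumptions for illustration, not values from the PR:

```c
#include <stdint.h>
#include <stdlib.h>

// The real setter in the PR; declared here only to keep the sketch self-contained.
extern void ep_sample_profiler_set_sampling_rate (uint64_t rate_ns);

// Illustrative bounds, per the discussion above: no faster than the current
// 1000 samples/sec default, no slower than 1 sample/sec.
#define MIN_SAMPLING_INTERVAL_NS ((uint64_t)1000000)    // 1ms -> 1000 samples/sec
#define MAX_SAMPLING_INTERVAL_NS ((uint64_t)1000000000) // 1s  -> 1 sample/sec

static void
set_sampling_rate_checked (const char *value)
{
	uint64_t interval_ms = strtoull (value, NULL, 10);
	if (interval_ms == 0)
		return; // unparsable or zero: ignore rather than sampling continuously

	uint64_t interval_ns = interval_ms * 1000000;
	if (interval_ns < MIN_SAMPLING_INTERVAL_NS)
		interval_ns = MIN_SAMPLING_INTERVAL_NS;
	else if (interval_ns > MAX_SAMPLING_INTERVAL_NS)
		interval_ns = MAX_SAMPLING_INTERVAL_NS;

	ep_sample_profiler_set_sampling_rate (interval_ns);
}
```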
Thanks for opening this, @pedrobsaila! Sorry it went unnoticed for so long.
@pedrobsaila thanks for the contribution! A couple of thoughts:
Agree with @lateralusX's comments. One thing on how to express the value: there is prior art here in both directions. ETW uses a sampling interval (e.g. every 1ms), and perf uses Hz. From my perspective either is fine, but I think the current APIs support the former.
```
	void* callback_data)
{
	if (filter_data) {
		ep_char8_t *filter_data_char = (ep_char8_t *)((uintptr_t)(filter_data->ptr));
```
Do I read correctly that this is implemented as filter data passed to the provider? If so, I have a bit of a concern about how to de-duplicate requests when there are multiple sessions. Right now, this is a single value. If we are going to allow it to be specified on provider enable, we should probably have some behavior that de-conflicts requests. I'm not sure if that means taking the most frequent rate of all session requests, or something else. Another option is to have an API outside of enablement that sets this, at which point it can remain a global, and that is expressed at the API level.
This was a significant reason why I didn't make the current implementation configurable. We tried to do configurable polling intervals for EventCounters, and I don't feel like it has worked out that well. Things are OK as long as only one tool is doing work, but as soon as a second one shows up with different expectations about the interval, things get very messy. Sorry I missed the discussion on #82939 when it happened earlier, or I would have interjected at that point.

How would folks feel about a different approach to solving this problem that won't cause multiple sessions to compete over setting a global sample frequency? Today we have Informational-level events that are emitted at 1000Hz; what if we added a couple of new events at a lower severity level, filtered with keywords, for emitting samples at 1Hz, 10Hz, and 100Hz? For example, a tool that wants to subscribe to 10Hz CPU samples would set level=critical,keywords=10HzCpuSamples. Internally we can dial the sampling rate up or down based on the highest requested frequency, but each session would always get exactly the frequency of events it asked for.
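To make the keyword idea concrete, here is a hypothetical sketch of what such keyword definitions could look like; the names and bit values are illustrative only, not part of any actual EventPipe provider:

```c
#include <stdint.h>

// Hypothetical keyword bits for the multi-rate proposal above.
#define SAMPLE_PROFILER_KEYWORD_DEFAULT       ((uint64_t)0x1) // existing 1000Hz samples
#define SAMPLE_PROFILER_KEYWORD_100HZ_SAMPLES ((uint64_t)0x2)
#define SAMPLE_PROFILER_KEYWORD_10HZ_SAMPLES  ((uint64_t)0x4)
#define SAMPLE_PROFILER_KEYWORD_1HZ_SAMPLES   ((uint64_t)0x8)

// The runtime would dial its internal timer to the highest frequency any
// session requested, while each session receives only the events matching
// the keywords it enabled.
static uint64_t
highest_requested_rate_hz (uint64_t enabled_keywords)
{
	if (enabled_keywords & SAMPLE_PROFILER_KEYWORD_DEFAULT)       return 1000;
	if (enabled_keywords & SAMPLE_PROFILER_KEYWORD_100HZ_SAMPLES) return 100;
	if (enabled_keywords & SAMPLE_PROFILER_KEYWORD_10HZ_SAMPLES)  return 10;
	if (enabled_keywords & SAMPLE_PROFILER_KEYWORD_1HZ_SAMPLES)   return 1;
	return 0;
}
```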
How often do we expect to have multiple sessions all requesting samples? I was thinking CPU sampling is fairly high overhead and you wouldn't normally run two profilers at once. When I think about the objections profilers would have to the interval being set to something other than what they specified, it seems like the perf impact on the process is the main concern, and how many events they get is a lesser one. Even if we do fancy things to make sure everybody gets samples at the right rate, it will still have a big impact on the process if we are sampling at a high frequency. I'm willing to go with the community vote if other people disagree, but it doesn't seem that bad to have it globally configurable, and if we had to do something more intricate I would vote for highest frequency winning. Our policy could be that setting the sample rate means we will sample at least at that interval, though it could be more often.
@noahfalk, I think your proposal of multiple keywords is interesting. At the same time, I think it's fine to just keep a global sampling rate. The key is to change how it's set, so that it's clear it's not a per-session thing: it's a global.
So, to summarize the discussions above:
CPU sampling at 1000Hz is high overhead, but at 1Hz or 10Hz it's probably negligible. I agree that today it would be uncommon to have multiple sessions because of the overhead, but once we give users a lower-overhead option I don't see why they wouldn't change their behavior in response. I'd imagine a major use case is long-lived, low-overhead profiling tools that monitor in the background. I expect those tools will mostly run in addition to other tools rather than replacing them. So any scenario where there is one profiler attached today might in the future be two profilers with different desired sampling rates.

I think the number of events is going to be important to them for a few reasons:

Assuming we do extra work to notify profilers what sample rate they are actually getting, then yes, it is possible for them to cope by doing further down-sampling, but that is extra complexity in every tool. The need to re-sample won't be apparent from testing in isolation, so I think there is a pretty good chance profiling tool authors will miss the requirement entirely.

I'm not sure if you are proposing a global that is updated based on session-specific requests, as the current PR has it, or a global configured via some other mechanism independent of EventPipe sessions. Either way, it's not yet clear to me how a profiling tool author can build a robust tool when the sampling rate may differ from the rate they requested or expected, given the current design. I do think we could keep adjusting the design until we made it work, but I worry that once all those adjustments happen, the final solution will be more complex, more work, and more error-prone than offering a few alternate fixed-rate events. These are some of the questions that come to mind:

I don't think we've reached consensus on the direction yet, which would be needed to answer these finer-grained implementation questions, but I do expect the discussion will get us there. Thanks all!
A couple of thoughts:
Thanks @lateralusX!
I'd be happy with that type of plan.
Yeah, I was attempting to simplify the issue of sampling intervals that aren't multiples of one another by restricting the set of allowed values. I was guessing that multiples of 10 provide enough flexibility for tools without adding an unnecessarily large set of options. I'm not inherently opposed to other finer-grained options if folks believe they are needed, as long as the design addresses how we implement it, how we keep track of the rate in the nettrace file, and how we prevent multiple sessions from adversely affecting each other. Multiples have the nice property that the number of samples is always the max of the session sampling rates, but even with a looser bound where number_of_samples = max(1000, sum_of_sample_rate_in_all_sessions), that still feels OK. It should mean that, at worst, starting a new sampling session with rate X might increase overhead by X additional samples/sec.

I'm not sure if you are proposing these extra wake-ups that take no samples to make the implementation simpler, or to create a more normalized load over time? It wouldn't be my preference, but as long as the wake-up frequency doesn't exceed 1000Hz I could take comfort that it's not a regression from the status quo. My ideal solution for wake-ups would be to do as few as necessary while still delivering the requested samples. In the general case, I assume that means calculating the next wake-up time for each session and then sleeping for the min() of those times.

It makes me nervous if nothing constrains the usage. The most likely use I see for low-overhead sampling is a monitoring tool that wants to do continuous profiling in production over long periods of time (a completely reasonable goal). The moment such an env var exists, nothing stops those monitoring tools from setting it at startup and treating it as a permanent solution. Eventually devs might try to use other tracing tools just as they do today, but now those tools either don't work at all or give very misleading results. In a simple world there would be no confusion, because users would always set the env var with full knowledge and acceptance of the consequences, and no one would ever be surprised later. However, the scenarios I anticipate in practice for non-trivial projects involve different engineers and tools, each with limited knowledge of the actions of the others, where one person or tool owner can easily make a choice without understanding or properly accounting for its impact. Once the impact is eventually discovered, it takes effort to root-cause the issue, and even once understood, the decision can be difficult to back out because people have started depending on the ongoing monitoring. So from my perspective the quick solution comes with substantial risks, and I am advocating that it's not a good tradeoff. @pedrobsaila - Sorry that sorting out the design is taking longer than I had hoped and the discussion is backtracking from what presumably looked like a more settled plan earlier. Do you have preferences for how this design should go, or a specific goal you are aiming for with this work?
Yes, the extra wake-ups were just to check whether there is a need to sample; if no session hit its frequency, nothing is done and the sample profiler thread can go back to sleep. As part of that, it could look through all sessions' frequencies and compute the minimum time to sleep before it next needs to wake up, making sure every running session's sample frequency is correctly handled. Adding/removing sampling sessions probably needs to wake up the sample profiler thread, since it will need to recalculate its sleep time, meaning it probably needs to wait on an event instead of just sleeping as it currently does.
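For illustration, a sketch of the wake-up loop described above might look like this; the type and function names are hypothetical stand-ins, not the actual EventPipe implementation:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define MAX_SESSIONS 64

// Hypothetical per-session state; the real EventPipe code has its own types.
typedef struct {
	bool active;
	uint64_t rate_ns;        // requested sampling interval in nanoseconds
	uint64_t next_sample_ns; // absolute time of this session's next sample
} SessionSampleState;

// Stubs standing in for the runtime's clock, event wait, and sampler.
extern uint64_t current_time_ns (void);
extern void wait_on_event_with_timeout_ns (uint64_t timeout_ns);
extern void sample_all_threads (uint64_t session_mask);

static SessionSampleState session_states [MAX_SESSIONS];
static volatile bool sampling_enabled = true;

static void
sample_profiler_loop (void)
{
	while (sampling_enabled) {
		uint64_t now_ns = current_time_ns ();
		uint64_t next_wakeup_ns = UINT64_MAX;
		uint64_t due_mask = 0;

		for (size_t i = 0; i < MAX_SESSIONS; ++i) {
			SessionSampleState *state = &session_states [i];
			if (!state->active)
				continue;
			if (now_ns >= state->next_sample_ns) {
				due_mask |= ((uint64_t)1 << i); // session is due this pass
				state->next_sample_ns = now_ns + state->rate_ns;
			}
			if (state->next_sample_ns < next_wakeup_ns)
				next_wakeup_ns = state->next_sample_ns;
		}

		if (due_mask)
			sample_all_threads (due_mask); // emit events only to due sessions

		// Wait on an event rather than sleeping, so that enabling or
		// disabling a session can wake the thread early to recompute
		// its sleep time.
		if (next_wakeup_ns != UINT64_MAX)
			wait_on_event_with_timeout_ns (next_wakeup_ns - now_ns);
	}
}
```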
Looks like we currently have a field in the nettrace file header that has the sampling rate expressed in nanoseconds. In ep_file_alloc we do:
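(The quoted snippet is elided here; based on the description, it is presumably along these lines, with approximate field and accessor names:)

```c
// Approximate reconstruction, not a verbatim quote: ep_file_alloc stamps the
// nettrace header field with the profiler's current rate in nanoseconds.
file->sampling_rate_in_ns = (uint32_t)ep_sample_profiler_get_sampling_rate ();
```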
So we should be able to set that field to the session's real sampling frequency, and tools should be able to determine what sampling frequency the current nettrace file used.
No worries. It's better to take the time to refine the solution well than to rush into something bad for users.
I don't have a strong opinion about the design, because this is the first time I've worked on EventPipe and I lack experience in this area. For the time being, I would rather follow the interesting discussion ^^. If we settle on a solution at the end, I would be happy to contribute to it.
I chatted offline with @brianrob and @davmason; together with @lateralusX's suggestions above, I think we are all converging towards a similar design. Let me describe it here, and then folks can either confirm it sounds good or raise issues/questions/suggestions:
I looked at the code a bit to offer some suggestions on how to implement this design, but @pedrobsaila, if anything doesn't make sense or you think there is a better way to do it, I'm happy to chat. It's entirely possible I overlooked things or made mistakes. To track session sampling rates:
The wait loop:
To write sampling events to selected sessions rather than all sessions:
Does this sound good?
I think we can just extend
to also look at the passed-in bitmask to make sure the session should be written into for this specific call.
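For illustration, the bitmask filter in the write path could look like the following sketch; the function and type names are hypothetical stand-ins, not the actual EventPipe signatures referenced above:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_SESSIONS 64

typedef struct SampleEvent SampleEvent; // opaque stand-in for the sample payload

// Stub standing in for the existing per-session write routine.
extern void write_event_to_session (size_t session_index, const SampleEvent *sample);

// Write a sample only to the sessions whose bit is set in the mask that the
// sampler thread computed for this pass.
static void
write_sample_to_sessions (uint64_t session_write_mask, const SampleEvent *sample)
{
	for (size_t i = 0; i < MAX_SESSIONS; ++i) {
		if ((session_write_mask & ((uint64_t)1 << i)) == 0)
			continue; // this session did not hit its sampling interval
		write_event_to_session (i, sample);
	}
}
```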
The plan sounds reasonable to me. One thing to keep in mind is that we don't want to take locks in the sample profiler loop to sync changes, so there will probably be an inherent race between enabling/disabling profiling sessions and the sample profiler's calculations. We might still need to issue memory barriers to make sure we don't see changed memory out of order.

There is always the underlying filter on the event inside EventPipe, so that will take care of the case where a new session gets created on an old profiling session slot just after the sample profiler has calculated its bitmask. There is also the case where a new profiling session with a different sample rate ends up on the same slot as an old profiling session, again just after the sample profiler has calculated its bitmask, and probably a couple of other scenarios. As long as we can tolerate potentially small frequency variations in the first sample events sent to a profiling session, it will work. The current sample profiler implementation is "best effort" anyway, since the frequency sets the sleep time of the sampling thread but doesn't account for the time it takes to sample.
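As a small illustration of the lock-free publication this comment describes (using C11 atomics here as an assumption; the real EventPipe code uses its own portable atomics and barriers):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_SESSIONS 64

// One slot per session; 0 means "no active session in this slot".
static _Atomic uint64_t session_rate_ns [MAX_SESSIONS];

// Enable/disable path: publish the new rate with release semantics so the
// sampler thread never observes the update out of order.
void
publish_session_rate (size_t slot, uint64_t rate_ns)
{
	atomic_store_explicit (&session_rate_ns [slot], rate_ns, memory_order_release);
}

// Sampler loop: read with acquire semantics. A momentarily stale value only
// causes a small frequency variation on a session's first samples, which the
// discussion above deems tolerable.
uint64_t
read_session_rate (size_t slot)
{
	return atomic_load_explicit (&session_rate_ns [slot], memory_order_acquire);
}
```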
Yeah, I'm not expecting that level of error to cause any practical problems. In the unlikely case that it became a real issue, we could adjust the implementation to handle those corner cases more rigorously.
Hello @pedrobsaila! Thanks for the community contribution, and I hope the feedback has been helpful in steering it in the right direction. It's been a while since there has been activity on this PR. Would you like to change the status to "Draft" while work is being done on it, or do you feel we are close to addressing the feedback?
Sorry, I've been quite busy this last month. I've just resumed working on it. I'll make it a draft until it's ready for review.
Great to hear!
This draft pull request was automatically closed after 30 days of inactivity. Please let us know if you'd like to reopen it.
Fixes #82939