Continuous profiler #3196
Conversation
Force-pushed from e291bd5 to dfcadbc (commit messages: "lack of call for memory samples in rare cases", "to avoid any exceptions").
About Continuous Profiling - Thread sampling

Thread sampling can be enabled by a custom plugin.

How does the thread sampler work?

The profiler leverages the .NET profiling API to perform periodic call stack sampling. For every sampling period, the runtime is suspended, the samples for all managed threads are saved into a buffer, and then the runtime resumes. A separate managed thread processes the data from the buffer and exports it in the way defined by the plugin. To make the process more efficient, the sampler uses two independent buffers that store samples alternately (see the sketch below).
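To make the double-buffer idea concrete, here is a minimal, hypothetical C# sketch. It is not the actual ContinuousProfilerProcessor code; all names (DoubleBufferSketch, Store, Drain, the exporter delegate) are invented for illustration, and in the real implementation the buffers live on the native side.

using System;
using System.Collections.Generic;

// Hypothetical sketch of the double-buffer scheme described above.
// The real implementation keeps the buffers in native code; this is managed-only.
internal sealed class DoubleBufferSketch
{
    private readonly List<byte[]> _bufferA = new();
    private readonly List<byte[]> _bufferB = new();
    private readonly object _sync = new();
    private bool _writeToA = true;

    // Called once per sampling period, while the runtime is suspended:
    // append one batch of serialized thread samples to the active buffer.
    public void Store(byte[] samples)
    {
        lock (_sync)
        {
            (_writeToA ? _bufferA : _bufferB).Add(samples);
        }
    }

    // Called from a dedicated managed thread: swap the buffers and hand the
    // previously active one to the exporter supplied by the plugin.
    public void Drain(Action<IReadOnlyList<byte[]>> export)
    {
        List<byte[]> full;
        lock (_sync)
        {
            full = _writeToA ? _bufferA : _bufferB;
            _writeToA = !_writeToA; // the sampler now writes to the other buffer
        }

        if (full.Count > 0)
        {
            export(full); // plugin-defined export step
            full.Clear();
        }
    }
}

The point of the two buffers is that sampling, which happens while the runtime is suspended, never waits for the exporter: while one buffer is being drained, the sampler keeps writing into the other.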
Requirements

- Enable the profiler.
- Implement a custom plugin.

Configuration settings are provided by the plugin (a hedged sketch of a possible plugin follows below).
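The plugin contract itself is not reproduced in this thread, so the following is only a hedged sketch of what such a plugin might look like; the method name, the configuration properties, and the exporter type are assumptions, not the actual API (see PluginManager.ContinuousProfiler.cs in this PR for the real contract).

using System;

// Hypothetical plugin shape - all names here are illustrative only.
public class MyProfilerPlugin
{
    // Assumed convention: the auto-instrumentation discovers this method
    // on the registered plugin and reads the continuous-profiler settings from it.
    public object GetContinuousProfilerConfiguration()
    {
        return new
        {
            ThreadSamplingEnabled = true,
            ThreadSamplingInterval = TimeSpan.FromSeconds(1), // sampling period
            AllocationSamplingEnabled = true,
            Exporter = new ConsoleSampleExporter()            // plugin-defined exporter
        };
    }
}

// Minimal exporter used by the sketch above: writes sample sizes to stdout.
public class ConsoleSampleExporter
{
    public void ExportThreadSamples(byte[] buffer, int read)
        => Console.WriteLine($"thread samples: {read} bytes");

    public void ExportAllocationSamples(byte[] buffer, int read)
        => Console.WriteLine($"allocation samples: {read} bytes");
}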
Escape hatch

The profiler limits its own behavior when both buffers used to store sampled data are full. This scenario might happen when the data processing thread is not able to keep up with the sampling rate. The thread sampler resumes when either of the buffers is empty.

Troubleshooting the .NET profiler

How do I know if it's working?

At startup, the OpenTelemetry .NET Automatic Instrumentation logs a message indicating that the thread sampler has started. You can grep for this in the native logs for the instrumentation to see something like this:
How can I see the Continuous Profiling configuration?

The OpenTelemetry .NET Automatic Instrumentation logs the profiling configuration.

What does the escape hatch do?

The escape hatch automatically discards profiling data while both buffers are full. If the escape hatch activates, it logs the following message:
You can also look for:
If you see these log messages, check the exporter implementation.

What if I'm on an unsupported .NET version?

None of the .NET Framework versions is supported. You have to switch to a supported .NET version.

Can I tell the sampler to ignore some threads?

There is no such functionality. All managed threads are captured by the profiler.
About Continuous memory profiling for .NET

The profiler samples allocations, captures the call stack state of the .NET thread that triggered the allocation, and exports it in an appropriate format. Use the memory allocation data, together with the stack traces and .NET runtime metrics, to investigate memory leaks and unusual consumption patterns in Continuous Profiling.

How does the memory profiler work?

The profiler leverages the .NET profiling API to sample allocations and store them in a buffer. The managed thread shared with the CPU profiler processes the data from the buffer and exports it in the way defined by the plugin (see the sketch below).
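To illustrate what the exported data conceptually contains, here is a hedged C# sketch of a single allocation sample and a trivial plugin-side export step. The record shape and field names are invented for illustration; the real samples are produced by the native side in its own wire format.

using System;
using System.Collections.Generic;

// Hypothetical shape of one allocation sample - field names are invented.
public sealed record AllocationSampleSketch(
    long AllocationSizeBytes,          // size of the sampled allocation
    string TypeName,                   // type that was allocated
    IReadOnlyList<string> Frames,      // call stack of the allocating thread, top frame first
    string? SpanId);                   // active span, if any, for correlation with traces

public static class AllocationExporterSketch
{
    // A plugin-defined exporter could aggregate or forward samples like this.
    public static void Export(IEnumerable<AllocationSampleSketch> samples)
    {
        foreach (var sample in samples)
        {
            var topFrame = sample.Frames.Count > 0 ? sample.Frames[0] : "<unknown>";
            Console.WriteLine($"{sample.TypeName}: {sample.AllocationSizeBytes} B allocated at {topFrame}");
        }
    }
}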
Requirements

- Enable the profiler.
- Implement a custom plugin.

Configuration settings are provided by the plugin.
Escape hatch

The profiler limits its own behavior when the buffer used to store allocation samples is full. The current maximum size of the buffer is 200 KiB. This scenario might happen when the data processing thread is not able to keep up with the rate of captured allocations.

Troubleshooting the .NET profiler

How do I know if it's working?

At startup, the OpenTelemetry .NET Automatic Instrumentation logs a message indicating that allocation sampling has started. You can grep for this in the native logs for the instrumentation.
How can I see the Continuous Profiling configuration?

The OpenTelemetry .NET Automatic Instrumentation logs the profiling configuration.

What does the escape hatch do?

The escape hatch automatically discards captured allocation data while the buffer is full. If the escape hatch activates, it logs the following message:
If you see these log messages, check the configuration and the communication layer.

What if I'm on an unsupported .NET version?

None of the .NET Framework versions is supported. You have to switch to a supported .NET version.
nit: I recommend placing a launchSettings.json file with the environment variables.
Done in 2b957d1
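For reference, a launchSettings.json along these lines could carry the profiler environment variables for local debugging of a test application. This is only a sketch: the CORECLR_* variables are the standard CLR profiler switches, while the profile name, the placeholder values, and the use of OTEL_DOTNET_AUTO_PLUGINS for registering the plugin are assumptions to be checked against the repository documentation.

{
  "profiles": {
    "TestApplication.ContinuousProfiler": {
      "commandName": "Project",
      "environmentVariables": {
        "CORECLR_ENABLE_PROFILING": "1",
        "CORECLR_PROFILER": "<profiler CLSID>",
        "CORECLR_PROFILER_PATH": "<path to the OpenTelemetry.AutoInstrumentation native library>",
        "OTEL_DOTNET_AUTO_PLUGINS": "<assembly-qualified name of the plugin type>"
      }
    }
  }
}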
LGTM
nit: we may need to recheck the native formatting rules; there are loads of missing lines, different styles, etc. Too much to deal with in this branch.
    return;
}

pdvEventsLow |= COR_PRF_MONITOR_THREADS | COR_PRF_ENABLE_STACK_SNAPSHOT;
Are these flags needed if only allocation sampling is enabled?
COR_PRF_MONITOR_THREADS - needed to get information about thread events and to correlate stack traces with Traces/Spans. Needed for both allocation and thread sampling.
COR_PRF_ENABLE_STACK_SNAPSHOT - enables calls to the DoStackSnapshot method. DoStackSnapshot is executed for both thread and allocation sampling.
Wouldn't it be worth adding a comment in code?
@@ -137,6 +167,12 @@ public static void Initialize()
        Logger.Information("Initialized lazily-loaded metric instrumentations without initializing sdk.");
    }
}
#if NET6_0_OR_GREATER
    if (profilerEnabled && (threadSamplingEnabled || allocationSamplingEnabled))
Is there a reason this is done last? Could this be moved to the earlier conditional blocks?
Technically yes, but for traces and metrics we would like to capture all possible data (at least for the AlwaysOn sampler).
For the profiler, we are heavily dropping data anyway, sampling once per second(s), so dropping data is acceptable both at the beginning and at the end of the process.
Overall LGTM
Co-authored-by: John Bley <[email protected]>
I am merging this PR as we agreed yesterday. Please let me know if you have any other comments.
Why
Towards #3074
What
Implementation of an extensible continuous profiler for thread and allocation sampling. It is a donation of Splunk code from the SignalFx repository, with some dedicated adjustments.
Separate comments describe:
As there is no common OTel way to export this data, I would like to keep this documentation only in the PR.
The changelog is also omitted. There is a plan to address it in the future, when open-telemetry/oteps#239 or open-telemetry/oteps#237 is ready and merged.
Tests
CI + testing with the Splunk exporter implemented by a plugin: signalfx/splunk-otel-dotnet#393
Checklist
[ ] CHANGELOG.md is updated. (Changelog is intentionally omitted; see Notes in the What section.)
[ ] Documentation is updated. (Documentation is included only in this PR.)
How to review
As the PR is huge, my recommendation is to start with the plugin implementation and the exporter itself (TestApplication folder). Then check which code is calling it. The last part is verification of the native code, which does most of the tricks.
If you need, I will be happy to do a real-time peer review. I have already done it with @pellared and we found a couple of things to improve.
Notes
While merging, please add the following comment: