Issue #2689: Unsatisfactory behavior of snapshot selector#2697
Issue #2689: Unsatisfactory behavior of snapshot selector#2697PeterF778 wants to merge 4 commits intosignalfx:mainfrom
Conversation
Change the algorithm for snapshot profiling selection to be exclusively based on trace-id. Removing the concepts of snapshot Volume, SnapshotVolumePropagator, and ProbabilisticSnapshotSelector. Updating unit tests.
|
All contributors have signed the CLA ✍️ ✅ |
|
recheck |
breedx-splk
left a comment
There was a problem hiding this comment.
Wow thanks, this is sooooo much nicer! 🏆
...r/src/main/java/com/splunk/opentelemetry/profiler/snapshot/TraceIdBasedSnapshotSelector.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/com/splunk/opentelemetry/profiler/snapshot/TraceIdBasedSnapshotSelector.java
Show resolved
Hide resolved
...r/src/main/java/com/splunk/opentelemetry/profiler/snapshot/TraceIdBasedSnapshotSelector.java
Outdated
Show resolved
Hide resolved
…hot/TraceIdBasedSnapshotSelector.java Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
…hot/TraceIdBasedSnapshotSelector.java Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
| * capable agents can make the same snapshotting decision, if necessary. | ||
| */ | ||
| @Override | ||
| public <C> Context extract(Context context, C carrier, TextMapGetter<C> getter) { |
There was a problem hiding this comment.
My understanding is that previously the decision whether to profile or not was propagated from upstream service using the volume baggage entry. So the first called service decided in some way whether to profile or not an other services followed that decision.
After these changes every service will decide independently whether to profile or not. If they are using the same algorithm with the same probability then they'll reach the same decision. If they use a different probability then they may reach a different decision. Do we need to confirm with someone that removing the behavior that the first service decides whether to profile or not is ok?
Secondly if we remove it now and replace the selection algorithm then services running old and new code will not reach the same profiling decision. Is this ok?
There was a problem hiding this comment.
My understanding is that previously the decision whether to profile or not was propagated from upstream service using the volume baggage entry. So the first called service decided in some way whether to profile or not an other services followed that decision. After these changes every service will decide independently whether to profile or not. If they are using the same algorithm with the same probability then they'll reach the same decision. If they use a different probability then they may reach a different decision. Do we need to confirm with someone that removing the behavior that the first service decides whether to profile or not is ok? Secondly if we remove it now and replace the selection algorithm then services running old and new code will not reach the same profiling decision. Is this ok?
One important detail is that while the original intention of the design was to propagate the profiling decision from upstream, it actually did not work. See the results of my testing that are quoted in the ticket. So I think that we do not really have to worry about "breaking" that behavior, because it had been already broken.
Furthermore, I do not think that the intention of the old design was appropriate. I do not see a use case for profiling downstream services because upstream service said so. It looks to me that that design was a copycat from AppD agent which does send downstream a correlation token asking for taking a snapshot. But that was different. The reason for taking such snapshots in AppD were to preserve information about a particular transaction instance (request) - normally the AppD agent sends summaries only. In OTel, there's no need for that, as this functionality comes free with every request/trace.
There was a problem hiding this comment.
Thanks for the explaination.
…ector # Conflicts: # profiler/src/test/java/com/splunk/opentelemetry/profiler/snapshot/SnapshotProfilingConfigurationCustomizerProviderTest.java # profiler/src/test/java/com/splunk/opentelemetry/profiler/snapshot/SnapshotVolumePropagatorComponentProviderTest.java
|
This PR cannot be merged because one of the commits is not signed. |
Change the algorithm for snapshot profiling selection to be exclusively based on trace-id. Removing the concepts of snapshot Volume, SnapshotVolumePropagator, and ProbabilisticSnapshotSelector. Updating unit tests.