Collect Callstacks For Threads Processing Traces#2207
Collect Callstacks For Threads Processing Traces#2207breedx-splk merged 20 commits intosignalfx:mainfrom
Conversation
…stomizer to insulate individual tests from additional fields being added to the customizer.
There was a problem hiding this comment.
Temporary solution. One of the very next PRs will include the stack trace data export and this will be replaced.
|
|
||
| @Override | ||
| public void startSampling(String traceId, long threadId) { | ||
| ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(); |
There was a problem hiding this comment.
The current implementation uses one profiling thread per trace selected for snapshotting. Should possibly consider a single profiling thread. Could be a followup change.
| } | ||
| } | ||
|
|
||
| static class TogglableTraceRegistry extends TraceRegistry { |
There was a problem hiding this comment.
Extracted to a standalone class so it can be used in other tests.
There was a problem hiding this comment.
This class is using a combination of the Object Mother and the Builder patterns to make it easier for tests to create instances of the profiling SDK customizer, plugging in different implementations of of the customizer's dependencies as needed. It also insulates tests from needed to respond directly to every dependency change made in the customizer.
The basic use will look like Profiling.customizer().build() and there are examples in the PR where tests plug in different TraceRegistry and StackTraceSampler implementations.
| public void onEnd(ReadableSpan span) { | ||
| if (isEntry(span)) { | ||
| registry.unregister(span.getSpanContext()); | ||
| sampler.stopSampling(Thread.currentThread().getId()); |
There was a problem hiding this comment.
Unfortunately the assumption that server span is ended by the same thread that started it isn't true. For example async servlet requests could end after the thread that started the span and called the servlet's service method has exited from the servlet code.
There was a problem hiding this comment.
Yes, lack of support for traces spanning multiple threads is a known limitation. We've intentionally not attempted to solve that problem just yet.
There was a problem hiding this comment.
You should add a comment stating this limitation and if possible elaborate how this is mitigated and why it does not have an adverse effect. Currently it looks like a resource leak where element is never moved from a map and sampling is never stopped for the affected thread.
There was a problem hiding this comment.
I was unaware that instrumentation for async frameworks started and ended spans on different threads. This will indeed lead to a resource leak and the continual sampling of the original thread. I'll add a test for this scenario and update the implementation to consider trace ID rather than thread ID so the profiling can be stopped appropriately.
It still won't support traces that delegate work to background processes or have multiple concurrent requests within the same trace.
| @@ -61,6 +64,7 @@ public boolean isStartRequired() { | |||
| public void onEnd(ReadableSpan span) { | |||
| if (isEntry(span)) { | |||
| registry.unregister(span.getSpanContext()); | |||
There was a problem hiding this comment.
Idk if this is an issue but the same service could have multiple requests concurrent requests from the same trace. For example imagine a reset service that gets called by some other service multiple times.
There was a problem hiding this comment.
Yes, that scenario wouldn't be fully supported. The current implementation assumes everything is happening in the same thread.
There was a problem hiding this comment.
Could you add a comment describing that limitation.
| StackTrace stackTrace = StackTrace.from(now, threadInfo); | ||
| stagingArea.stage(threadId, stackTrace); | ||
| } catch (Exception e) { | ||
| LOGGER.severe(e::getMessage); |
There was a problem hiding this comment.
Considering this is a user visible error it might be best to change the log message to a sentence describing the nature of failure and log the full stack trace along with the exception message. If this ever happens there is a chance that someone will need to figure out what went wrong.
| import java.util.logging.Logger; | ||
|
|
||
| class ScheduledExecutorStackTraceSampler implements StackTraceSampler { | ||
| private static final Logger LOGGER = |
There was a problem hiding this comment.
the convention in this project is to use lowercase logger
…ting callstacks from profilied threads.
…ntually track traces, not only thread IDs.
…reads managing a spans lifecycle (e.g. async web frameworks) and avoid a resource leak and runaway profiling thread.
…op trace profiling.
| @Override | ||
| public void start(SpanContext spanContext) { | ||
| ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(); | ||
| samplers.put(spanContext.getTraceId(), scheduler); |
There was a problem hiding this comment.
I think it would be a good idea to either not add a new scheduler when given trace is already in the map (multiple concurrent request for the same trace) or stop the scheduler that was already in the map or allow multiple schedulers for given trace. Otherwise you'll leak the ScheduledExecutorService that was already in the map.
| * <b>Implementation Note</b><br> | ||
| * Only single-threaded traces are currently supported. Behavior of the extension in a multithreaded | ||
| * trace environment is unspecified and explicitly not supported. |
There was a problem hiding this comment.
I think you should rephrase the comment in simpler terms. Perhaps something along the lines of Profiling multiple concurrent traces with the same trace id is not supported. To me the meaning of single-threaded and multithreaded is not obvious here.
There was a problem hiding this comment.
I'll think about how to make it more clear.
There are two primary use cases that are not supported: 1) a service is called multiple times concurrently within the same trace and 2) a service delegates some of its work to a background process within the same trace. Neither is explicitly supported by this initial implementation.
This PR adds the trace profiling ability to the snapshot profiling extension. Profiling is activated with the span processor when an entry span is encountered that has been selected for snapshotting, and profiling is stopped when that same entry span is ended.
At the moment this PR does not export the collected callstacks as logs. That ability will closely follow the merging of this PR.