Skip to content

Make AccumulatingStagingArea Thread Safe#2288

Merged
laurit merged 2 commits intosignalfx:mainfrom
tduncan:thread-safe-accumulating-staging-area
Apr 25, 2025
Merged

Make AccumulatingStagingArea Thread Safe#2288
laurit merged 2 commits intosignalfx:mainfrom
tduncan:thread-safe-accumulating-staging-area

Conversation

@tduncan
Copy link
Copy Markdown
Contributor

@tduncan tduncan commented Apr 23, 2025

Stores staged StackTrace instances in a thread-safe Queue while exporting a "view" of that Queue when empty is called.

Queue<StackTrace> stackTraces = this.stackTraces.remove(traceId);
if (stackTraces != null) {
exporter.get().export(stackTraces);
exporter.get().export(new ArrayList<>(stackTraces));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you made the StackTraceExporter accept a Collection than you could skip the copy.

(id, stackTraces) -> {
if (stackTraces == null) {
stackTraces = new ArrayList<>();
stackTraces = new ConcurrentLinkedQueue<>();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When empty is called before the last stage from the schedule then you will end up leaking a map entry.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurit yes, though not a thread-safety issue and not introduced by this PR.

Copy link
Copy Markdown

@lo-jason lo-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two issues with this class. Agree with Lauri that the leaking of map objects is an issue, as is needing to access 2 concurrent datastructures to add samples. Wouldn't we be better off using ScheduledExecutorStackTraceSampler as a stagingArea since that is already per Thread and would solve above two issues?

If there is a concern in future we move to multi-threaded stack traces then the solution should be implemented in a way that won't make calling stage so expensive.

@tduncan
Copy link
Copy Markdown
Contributor Author

tduncan commented Apr 24, 2025

I see two issues with this class. Agree with Lauri that the leaking of map objects is an issue, as is needing to access 2 concurrent datastructures to add samples. Wouldn't we be better off using ScheduledExecutorStackTraceSampler as a stagingArea since that is already per Thread and would solve above two issues?

If there is a concern in future we move to multi-threaded stack traces then the solution should be implemented in a way that won't make calling stage so expensive.

@lo-jason The primary purpose of this PR is to both address concerns raised in #2277 and #2262 without the scope of those PRs growing beyond the specific goal of each. In both of those PRs the thread-safety problems of AccumulatingStagingArea were brought up, which I believe is addressed.

Further #2262 is preventing the replacement of AccumulatingStagingArea with an implementation that groups all StackTraces regardless of associated trace ID and periodically exports them as one batch (see here). That implementation is ready to submit as a PR except for a single test that fails sporadically because the StagingArea isn't closed (hence being blocked by #2262).

I'm reluctant to expend too much effort perfecting this implementation when I intend to immediately replace it (pending the merging of #2262)

@laurit laurit enabled auto-merge (squash) April 25, 2025 15:04
@laurit laurit merged commit 48bb5ba into signalfx:main Apr 25, 2025
26 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 25, 2025
@tduncan tduncan deleted the thread-safe-accumulating-staging-area branch April 25, 2025 15:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants