fix(sdk/metric): avoid deadlock when registering instruments in callb…#8054
Open
garyatsierra wants to merge 2 commits intoopen-telemetry:mainfrom
Open
Conversation
One or more co-authors of this pull request were not found. You must specify co-authors in commit message trailer via: Supported
Please update your commit message(s) by doing |
…acks produce() held the pipeline mutex while executing observable callbacks. If a callback created a new instrument (e.g. a counter), the registration path called addSync() which tried to acquire the same mutex, causing a deadlock. Fix by snapshotting the callbacks under the lock, releasing it before executing them, then re-acquiring for the aggregation phase. This is safe because callbacks operate on internally-synchronized aggregation primitives and don't need the pipeline lock for execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
855c99b to
06835cf
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
pipeline.produce()that occurs when an observable callback registers a new instrument (e.g. creates acounter via
meter.Int64Counter())produce()held the pipeline mutex while executing callbacks; if a callback calledaddSync()(via instrument creation),it tried to re-acquire the same mutex → deadlock
TestPipelineNoDeadlockOnInstrumentCreationDuringCallbackwith a 5s timeout to catch deadlocksDetails
We ran into this and we were a bit surprised by the behavior of being able to deadlock from using sdk apis. The test shows the scenario doing which is on measure we incr a counter if it met a certain threshold and would deadlock only if it wasn't cached.
Looking at the doc we do see:
In our case we weren't acquiring and outside mutex but rather the pipeline itself was trying to reaquire the pipeline deadlock
The deadlock call chain was:
Executing callbacks without the lock is safe because:
aggregate.Measurefunctions which use atomic operations internallycallbacksandmultiCallbacksdata structures are snapshotted under the lock before releaseaddCallback/addMultiCallbackcalls are still properly synchronized