Conversation

agagniere commented Jan 6, 2026

Hi,
I encountered a deadlock in a situation where a callback, called by the metric pipeline's produce function, was trying to acquire a mutex held by another goroutine that was itself stuck waiting to acquire the pipeline's mutex.

It would seem to me that the callbacks, having no access to the pipeline's members, do not need to hold its mutex while they run.
However, a counterargument is that not holding the mutex allows multiCallbacks to be unregistered concurrently, so my PR now allows a callback to be invoked after it has been unregistered (if it was unregistered after the produce function started executing but before the callback is actually called).
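
To make the cycle concrete, here is a minimal sketch of the two goroutines involved (all names are invented; pipeLock stands in for the pipeline's mutex, userMu for the mutex external to this repo):

```go
package main

import "sync"

var (
	pipeLock sync.Mutex // stands in for the pipeline's mutex
	userMu   sync.Mutex // stands in for a mutex external to this repo
)

// Goroutine 1: produce holds the pipeline's mutex while running callbacks.
func produce() {
	pipeLock.Lock()
	defer pipeLock.Unlock()
	callback() // blocks on userMu, which goroutine 2 holds
}

func callback() {
	userMu.Lock()
	defer userMu.Unlock()
	// ... observe some value ...
}

// Goroutine 2: holds userMu and then needs the pipeline's mutex
// (for example to unregister a callback), closing the cycle.
func other() {
	userMu.Lock()
	defer userMu.Unlock()
	pipeLock.Lock() // blocks: produce still holds it -> deadlock
	defer pipeLock.Unlock()
}

func main() {
	go other()
	produce()
}
```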

Because I saw the open issue #3034, I decided to try to kill two birds with one stone and made this first attempt.

Feedback is welcome; please tell me if a different approach is preferred.


linux-foundation-easycla bot commented Jan 6, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.


codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.2%. Comparing base (3dc4ccc) to head (b83e01d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
sdk/metric/pipeline.go | 93.5% | 2 Missing ⚠️
Additional details and impacted files

@@          Coverage Diff          @@
##            main   #7755   +/-   ##
=====================================
  Coverage   86.2%   86.2%           
=====================================
  Files        302     302           
  Lines      21991   22011   +20     
=====================================
+ Hits       18968   18986   +18     
- Misses      2642    2645    +3     
+ Partials     381     380    -1     
Files with missing lines | Coverage Δ
sdk/metric/pipeline.go | 90.7% <93.5%> (+0.6%) ⬆️

... and 3 files with indirect coverage changes


dashpole (Contributor) commented Jan 6, 2026

It looks like this probably introduced a race. Take a look at test-race and test-concurrent-safe, and let me know if you need help.

To make sure I understand the issue:

I encountered a deadlock in a situation where a callback, called by the metric pipeline's produce function, was trying to acquire a mutex held by another goroutine that was itself stuck waiting to acquire the pipeline's mutex.

Is this something that depends on particular user behavior (e.g. writing a callback that tries to acquire a mutex)? Or is this something that can simply happen with a "normal" callback implementation? If it requires users to do something, can you provide a reproduction? If it is non-trivial, it might be best to put that, and the description of the problem into an issue.

agagniere (Author)

Is this something that depends on particular user behavior (e.g. writing a callback that tries to acquire a mutex)?

Precisely, it was a situation where a callback wants to acquire some mutex (external to this repo).
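
A rough sketch of what a reproduction could look like; the channel choreography is only there to force the interleaving, and it assumes that Unregister needs the pipeline's mutex while produce holds it:

```go
package main

import (
	"context"
	"sync"

	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

func main() {
	var userMu sync.Mutex // the mutex external to this repo

	reader := sdkmetric.NewManualReader()
	meter := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader)).Meter("repro")

	gauge, _ := meter.Int64ObservableGauge("gauge")

	holdingUserMu := make(chan struct{})
	inCallback := make(chan struct{})

	reg, _ := meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		close(inCallback) // produce now holds the pipeline's mutex
		userMu.Lock()     // blocks: the goroutine below holds userMu
		defer userMu.Unlock()
		o.ObserveInt64(gauge, 1)
		return nil
	}, gauge)

	go func() {
		userMu.Lock()
		defer userMu.Unlock()
		close(holdingUserMu)
		<-inCallback
		_ = reg.Unregister() // blocks waiting for the pipeline's mutex -> deadlock
	}()

	<-holdingUserMu
	var rm metricdata.ResourceMetrics
	_ = reader.Collect(context.Background(), &rm) // never returns
}
```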

dmathieu (Member) commented Jan 7, 2026

Precisely, it was a situation where a callback wants to acquire some mutex (external to this repo).

Could you provide a small reproduction then?

agagniere (Author) commented Jan 7, 2026

@dashpole

It looks like this probably introduced a race. Take a look at test-race and test-concurrent-safe, and let me know if you need help.

Indeed, and you even anticipated it; the offending code is:

	// Access to r.pipe.int64Measures is already guarded b a lock in pipeline.produce.
	// TODO (#5946): Refactor pipeline and observable measures.
	measures := r.pipe.int64Measures[oImpl.observableID]

which was introduced in #5900 (relevant discussion)

So I guess I will modify ObserveFloat64 to acquire the pipeline's lock before accessing its members? Or do you have another recommendation?
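
For illustration, the idea would be something like the following, with invented types (the real observer and pipeline types in sdk/metric look different, and whether the measures should also be invoked under the lock is an open question):

```go
package metricsketch

import "sync"

type observableID int

type measure func(value int64)

type pipeline struct {
	sync.Mutex
	int64Measures map[observableID][]measure
}

type observer struct {
	pipe *pipeline
	id   observableID
}

func (o *observer) ObserveInt64(v int64) {
	// Take the pipeline's mutex for the map read, since callbacks would
	// no longer run under produce's lock with this PR.
	o.pipe.Lock()
	measures := o.pipe.int64Measures[o.id]
	o.pipe.Unlock()

	// Record outside the critical section.
	for _, m := range measures {
		m(v)
	}
}
```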

pellared (Member) commented Jan 7, 2026

run callbacks asynchronously

@open-telemetry/go-maintainers
Wouldn't this be a behavioral change that may be breaking for some users who assume they run synchronously?
Shouldn't this be opt-in (configurable) behavior?

I also posted a comment here: #3034 (comment)

flc1125 (Member) commented Jan 7, 2026

I tend to address the deadlock problem and the asynchrony separately, as they are two different issues.

For the deadlock part, I think we might first copy the data guarded by the lock into a temporary variable, then immediately unlock, and finally handle the relevant callback processing.

agagniere (Author)

@flc1125

For the deadlock part, I think we might first copy the data guarded by the lock into a temporary variable, then immediately unlock, and finally handle the relevant callback processing.

Yes, this is exactly the approach I went with (a rough sketch follows the list):

  • acquire the mutex
  • copy the list of callbacks
  • release the mutex
  • run the callbacks (concurrently or not, it doesn't matter to me)
  • re-acquire the mutex
  • fill the scope metrics
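
A rough sketch of that sequence, with invented types (the real produce also iterates instruments and multiCallbacks):

```go
package metricsketch

import (
	"context"
	"slices"
	"sync"
)

type pipeline struct {
	sync.Mutex
	callbacks []func(context.Context) error
}

func (p *pipeline) produce(ctx context.Context) error {
	// Acquire the mutex and copy the list of callbacks.
	p.Lock()
	callbacks := slices.Clone(p.callbacks)
	p.Unlock()

	// Run the callbacks without holding the pipeline's mutex, so a
	// callback blocking on an external mutex cannot deadlock with a
	// goroutine waiting for the pipeline's mutex.
	var firstErr error
	for _, cb := range callbacks {
		if err := cb(ctx); err != nil && firstErr == nil {
			firstErr = err
		}
	}

	// Re-acquire the mutex and fill the scope metrics.
	p.Lock()
	defer p.Unlock()
	// ... aggregate measurements into metricdata.ScopeMetrics here ...

	return firstErr
}
```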

I tend to address the deadlock problem and the asynchrony separately, as they are two different issues.

Sure, it seemed from #3034 that there was demand for asynchrony, but if opinions diverge let's just focus on the locking part and leave asynchrony for a later PR.

- do not own the mutex when calling callbacks
- at the end, return the first callback error if any
