-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Prevent deadlock and run callbacks asynchronously #7755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7755 +/- ##
=====================================
Coverage 86.2% 86.2%
=====================================
Files 302 302
Lines 21991 22011 +20
=====================================
+ Hits 18968 18986 +18
- Misses 2642 2645 +3
+ Partials 381 380 -1
🚀 New features to boost your workflow:
|
|
It looks like this probably introduced a race. Take a look at test-race and test-concurrent-safe, and let me know if you need help. To make sure I understand the issue:
Is this something that depends on particular user behavior (e.g. writing a callback that tries to acquire a mutex)? Or is this something that can simply happen with a "normal" callback implementation? If it requires users to do something, can you provide a reproduction? If it is non-trivial, it might be best to put that, and the description of the problem into an issue. |
Precisely, it was a situation where a callback wants to acquire some mutex (external to this repo). |
Could you provide a small reproduction then? |
Indeed, and it was even anticipated by yourself, offending code is : // Access to r.pipe.int64Measures is already guarded b a lock in pipeline.produce.
// TODO (#5946): Refactor pipeline and observable measures.
measures := r.pipe.int64Measures[oImpl.observableID]which was introduced in #5900 (relevant discussion) So I guess I will modify |
@open-telemetry/go-maintainers I also posted a comment here: #3034 (comment) |
|
I tend to solve the deadlock problem and asynchronization separately, as they are two different issues. For the deadlock part, I think we might first temporarily assign the data of the critical lock to a new temporary variable, then immediately unlock it, and finally handle the relevant callback processing. |
b83e01d to
40c4cbe
Compare
Yes this is exactly the approach I went with:
Sure yes, it seemed from #3034 that there was a demand for asynchrony, but if there are diverging opininions let's just focus on the locking part and leave the asynchrony for an ulterior PR |
- do not own the mutex when calling callbacks - at the end, return the first callback error if any
…sures from observe functions
40c4cbe to
aee7b6e
Compare
Hi,
I encountered a deadlock in a situation where a callback, called by the metric pipeline's
producefunction, was trying to acquire a mutex owned by another goroutine, that was itself stuck waiting to acquire the pipeline's mutex.It would seem to me that the callbacks having no access to the pipeline's members, do not need to hold its mutex when ran.
However a counter argument is that not holding the mutex allows multiCallbacks to be unregistered, so my PR now allows the call of a callback after it was unregistered (if it was unregistered after the
producefunction started executing but before the callback is effectively called).Because I saw the open issue #3034 I decided to try to shoot two birds with one stone, and made this first attempt.
Feedback is welcomed, please tell me if a different approach is preferred.