Skip to content

Conversation

@ChayimFriedman2
Copy link
Contributor

This PR can create deadlocks - there is nothing synchronizing the mutex/condvar with the Arc of the Zalsa, therefore code may wake up by the Condvar but not see an updated refcount, go to sleep again, and never wake up.

This PR can create deadlocks - there is nothing synchronizing the mutex/condvar with the `Arc` of the `Zalsa`, therefore code may wake up by the `Condvar` but not see an updated refcount, go to sleep again, and never wake up.
@netlify
Copy link

netlify bot commented Dec 16, 2025

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 0f3b30a
🔍 Latest deploy log https://app.netlify.com/projects/salsa-rs/deploys/6941aeef9be06a00084a41df

@ChayimFriedman2 ChayimFriedman2 marked this pull request as ready for review December 16, 2025 19:12
@codspeed-hq
Copy link

codspeed-hq bot commented Dec 16, 2025

CodSpeed Performance Report

Merging #1039 will improve performances by 4.12%

Comparing ChayimFriedman2:deadlock (0f3b30a) with master (0a3eec6)

Summary

⚡ 1 improvement
✅ 12 untouched

Benchmarks breakdown

Benchmark BASE HEAD Change
amortized[Input] 2.2 µs 2.1 µs +4.12%

@MichaReiser MichaReiser requested a review from Veykril December 16, 2025 19:14
@Veykril
Copy link
Member

Veykril commented Dec 17, 2025

Are you saying we might notify in a drop, see the current strong count, then decrement due to drop finishing, and the loop is already in a waiting state again?

@ChayimFriedman2
Copy link
Contributor Author

@Veykril Not exactly, rather this is a cache coherency thing. There is no happens-before relationship, so the CPU does not guarantee that the thread waking up due to the notify will see the refcount decrement.

@Veykril
Copy link
Member

Veykril commented Dec 17, 2025

Right, strong_count does a relaxed read only (there is an unstable is_unique which has the desired ordering). I do think my point is also valid though (the ordering of ops is just plain wrong as well).

Either way you are correct that change was invalid. (odd this hasn't blown up more)

@Veykril Veykril added this pull request to the merge queue Dec 17, 2025
@ChayimFriedman2
Copy link
Contributor Author

Even with an Acquire load for it (we can use get_mut(), in fact this was the first thing I tried) it does not help, because the Acquire/Release are on different atomic variables so they don't synchronize.

Merged via the queue into salsa-rs:master with commit 0731080 Dec 17, 2025
12 checks passed
@github-actions github-actions bot mentioned this pull request Dec 17, 2025
@ChayimFriedman2 ChayimFriedman2 deleted the deadlock branch December 17, 2025 08:52
@ibraheemdev
Copy link
Member

There is actually a happens-before relationship here because of drop order (though we do need Acquire on the load of the strong count), the problem is that the lock isn't held when decrementing the count, which can lead to missed wakeups. The code doesn't seem very performance sensitive though, so the obvious implementation seems fine.

@ChayimFriedman2
Copy link
Contributor Author

@ibraheemdev There isn't, or rather, not strong enough: you're right that (assuming we'd use Arc::get_mut() that has Acquire) the refcount decrement will synchronize with the get_mut(), but only if they happen each before other. A possible sequence of events is:

  1. Arc::get_mut() called on thread A.
  2. Arc refcount decrement to 1 on thread B.
  3. cvar.notify_all() on thread B, wakes no-one.
  4. cvar.wait() on thread A, waits forever.

There is a reason Condvar is accompanied with a Mutex - it's the only way to synchronize between the notify and the data.

@ibraheemdev
Copy link
Member

ibraheemdev commented Dec 18, 2025

@ChayimFriedman2 yes, that is what I meant by missed wakeups, and needing to hold the lock while decrementing the count.

therefore code may wake up by the Condvar but not see an updated refcount, go to sleep again, and never wake up.

I only meant to say that the comment here is not exactly true, because if there is a true wakeup, you are guaranteed to see the new count (even without Acquire on the strong count load — but you need that to access the data in case the synchronization did not happen).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants