-
Notifications
You must be signed in to change notification settings - Fork 117
RSDK-10618: Fix data race when modules crashes during reconfiguration #4975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
module/modmanager/manager.go
Outdated
// There is a circular dependency that causes a deadlock if a module dies | ||
// while being reconfigured. Break it here by giving up on the restart if we | ||
// cannot lock the mananger. | ||
if locked := mgr.mu.TryLock(); !locked { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this right, if a module crashes at the same time the mod manager has the lock for an unrelated reason (perhaps adding a resource to a different module), we'll give up on restarting the module? Is there some other means that would restart it?
And I suppose the problem (based on the comment) from just using the mod.inRecoveryLock
here is that module reconfiguration does not take the in recovery lock, but would need to (for the purpose of serializing with restarts -- which it definitely does from a logic perspective)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll give up on restarting the module
Correct
Is there some other means that would restart it?
Not that I'm aware of
And I suppose the problem (based on the comment) from just using the mod.inRecoveryLock here is that module reconfiguration does not take the in recovery lock...
Correct, that lock was only ever taken by the restart callback. I actually removed it (and that weird atomic bool) since taking the modmananger lock made it unnecessary.
This has a matching change in viamrobotics/goutils#432 that actually fixes the race condition. Unfortunately the way
modmanager
is written meant that fixing the race caused a deadlock, which this PR addresses.