Skip to content

RSDK-10618: Fix race contition in managedProcess #432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 12, 2025

Conversation

jmatth
Copy link
Member

@jmatth jmatth commented May 7, 2025

This PR fixes an issue where a race could occur between calls to managedProcess.Stop and parts of managedProcess.manage if the process crashed just before the call to Stop. This will have a matching change on the rdk side (viamrobotics/rdk#4975) where the race caused duplicate module processes to be created during reconfiguration.

One thing worth noting: this change has the potential to create a deadlock if calling code tries to hold the same lock goroutines calling Stop and in their restart callback. This was an issue I fixed in my rdk modmanager changes but there may be other places this code is used that need to be audited for potential deadlocks.

@jmatth jmatth requested a review from dgottlieb May 7, 2025 16:37
@CLAassistant
Copy link

CLAassistant commented May 7, 2025

CLA assistant check
All committers have signed the CLA.

@viambot viambot removed the safe to test Mark as safe to test label May 9, 2025
@jmatth
Copy link
Member Author

jmatth commented May 9, 2025

@dgottlieb I'm still testing things on the rdk side but I think this PR can be looked at now. The most important update from when we last went over these changes is manage now drops the lock before calling the unexpected exit handler like you recommended. That does mean that it's the responsibility of whatever code is using pexec to prevent races between the callback and any other code that calls Stop etc, which is what I'm working on over in the rdk modmanager.

@jmatth jmatth added the safe to test Mark as safe to test label May 9, 2025
@viambot viambot removed the safe to test Mark as safe to test label May 9, 2025
@dgottlieb
Copy link
Member

This all looks great to me. Very nice job

Copy link

For security reasons, this PR must be labeled with safe to test in order for tests to run.

@jmatth jmatth merged commit 0198bbf into viamrobotics:main May 12, 2025
4 checks passed
@jmatth jmatth deleted the RSDK-10618 branch May 12, 2025 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants