-
Notifications
You must be signed in to change notification settings - Fork 92
avoid solver race conditions with a lock #569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ideally we would solve this after parsing the command line and before entering tests, but unfortunately this approach does not work because solvers can be overridden at the test level. So we could analyze the config when it's created at the different levels (command line, contract, function...) but this seems overly complex, and we may also run contracts and tests concurrently in the future so this may break. Instead let's use a simple lock on `ensure_solver_available`, captured in the Java-style `synchronized` decorator. This time the flow is that `ensure_solver_available` can only be entered by one thread at a time, therefore: - the first time we find a missing solver, we start the download - other concurrent queries are waiting - once the download is finished and the command is resolved, the lock is released - the concurrent queries get the resolved command, without having to trigger the download again fixes #568
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses race conditions in solver availability checking by introducing thread synchronization to prevent multiple concurrent downloads of the same solver binary. The fix uses a Java-style synchronized decorator to ensure only one thread can execute the solver availability check at a time.
- Adds a
synchronizeddecorator using threading locks to prevent race conditions - Applies synchronization to the
ensure_solver_availablefunction to prevent concurrent solver downloads
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/halmos/utils.py | Adds a synchronized decorator implementation using threading.RLock() |
| src/halmos/solvers.py | Applies the @synchronized() decorator to ensure_solver_available function |
| return None | ||
|
|
||
|
|
||
| @synchronized() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess the performance impact of introducing a lock here shouldn't be a concern, because find_solver_binary should be quick, so the lock wait time for other processes would be negligible, right? (i also ran performance tests with 9 solver threads, and didn't see any meaningful regression.)
that said, i'm wondering if we could move the synchronized decorator to install_solver() to minimize any potential performance overhead.
while this function is always executed, install_solver is typically not. so if we add a check at the beginning of install_solver (possibly by calling find_solver_binary again) to see whether the solver is already installed, we could safely apply the synchronized decorator there. this way, locking would be avoided in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this but it's pretty negligible:
python -m timeit --setup 'import threading; lock = threading.RLock()' 'with lock: pass'
2000000 loops, best of 5: 199 nsec per loop
200ns to acquire and release the lock, so close to no overhead in most cases. I don't expect it to register when profiling, but we can adjust if it does
ideally we would solve this after parsing the command line and before entering tests, but unfortunately this approach does not work because solvers can be overridden at the test level. So we could analyze the config when it's created at the different levels (command line, contract, function...) but this seems overly complex, and we may also run contracts and tests concurrently in the future so this may break.
Instead let's use a simple lock on
ensure_solver_available, captured in the Java-stylesynchronizeddecorator. This time the flow is thatensure_solver_availablecan only be entered by one thread at a time, therefore:fixes #568