Fix deadlocks and crashes seen on free-threaded Python #1133
Conversation
So it looks like there's still a possible deadlock on Windows that needs to be tracked down - this run is deadlocked on …

I'm also not sure if the flaky test failures I'm seeing are "real", given that current …
I'm also seeing failures due to the global fixture that tries to check for thread leaks (lines 23 to 32 in 4bc8f79).

I'm not sure if what the fixture is trying to do makes any sense in the free-threaded build. In particular, just calling …
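For context, a fixture of this kind usually looks roughly like the following sketch (a hypothetical reconstruction, not the actual fixture at those lines):

```python
import threading

import pytest


@pytest.fixture(autouse=True)
def no_thread_leaks():
    # Snapshot the threads that are alive before the test runs...
    before = set(threading.enumerate())
    yield
    # ...and fail if the test left any extra threads running afterwards.
    leaked = [t for t in threading.enumerate() if t not in before]
    assert not leaked, f"leaked threads: {leaked}"
```

A check like this can race with threads that are still winding down when the test ends, which is one reason it may report false positives, on the free-threaded build or otherwise.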
I don't see this fixing the issue where .run() may be called after its thread has already been signaled to stop, but that is not checked for. Am I missing it?
I was expecting to see that added to the BaseThread class.
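For concreteness, the kind of check being described might look roughly like this sketch of a BaseThread-like class (illustrative only, not the actual implementation):

```python
import threading


class BaseThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self._stopped_event = threading.Event()

    def stop(self):
        self._stopped_event.set()

    def run(self):
        # Hypothetical guard: if stop() was called before the thread actually
        # got scheduled, bail out instead of entering the work loop.
        if self._stopped_event.is_set():
            return
        while not self._stopped_event.is_set():
            ...  # do one unit of work
```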
src/watchdog/observers/api.py
Outdated
    # To allow unschedule/stop and safe removal of event handlers
    # within event handlers itself, check if the handler is still
    # registered after every dispatch.
    for handler in self._handlers[watch].copy():
Is .copy() thread-safe? Otherwise it should likely be guarded by a lock as well.
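As an illustration of the suggestion, a sketch of taking the snapshot under the observer lock (assuming `self._lock` is the same lock that guards additions and removals of handlers):

```python
# Take the snapshot of handlers while holding the lock, so a concurrent
# add/remove of handlers cannot mutate the set mid-copy.
with self._lock:
    handlers = list(self._handlers[watch])

# Dispatch outside the lock so slow handlers don't block other threads.
for handler in handlers:
    handler.dispatch(event)
```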
I rearranged things so that the lock is acquired in the for loop and only gets released during blocking calls, to avoid deadlocks.
Actually, I checked again and it looks like I can leave the locking as it was before, so never mind.
src/watchdog/observers/api.py
Outdated
    with self._lock:
        if handler not in self._handlers[watch]:
            continue
    handler.dispatch(event)
There may still be a race condition here: At this point the handler could have been removed since the check in the previous line.
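One way to close that window would be to keep the lock held across both the check and the dispatch, as in the sketch below. This is only an illustration of the race, not what the PR ends up doing; it also assumes handlers don't call back into methods that take the same lock, and it serializes handler execution while the lock is held.

```python
def dispatch_event(self, event, watch):
    # Illustrative only: holding the lock across the membership check and
    # the dispatch means a concurrent handler removal cannot slip in
    # between them.
    with self._lock:
        for handler in self._handlers[watch].copy():
            if handler not in self._handlers[watch]:
                continue
            handler.dispatch(event)
```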
    whandle = self._whandle
    if whandle:
        self._whandle = None
I'm not sure I understand what race this is guarding against, but it seems like it would be better to have a lock around this.
This is guarding against the file handle re-use crash described here. Let me see if locking also works, since that's a lot more explicit.
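For reference, the hunk above is a swap-then-close pattern along these lines (a sketch with assumed names; `CloseHandle` stands in for whatever ctypes wrapper actually closes the watch handle):

```python
def close(self):
    # Move the handle into a local and clear the attribute first, so any
    # other thread that reads self._whandle afterwards sees None instead
    # of a handle value the OS may already have recycled for a new file.
    whandle = self._whandle
    if whandle:
        self._whandle = None
        CloseHandle(whandle)
```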
I tried adding explicit locking, but that deadlocks in test_emitter.py::test_delete_self. The main thread tries to join the emitter thread, but that is blocked on the observer closing the emitter's file handle.
Adding an early return to …
Fix deadlock3
I take this back. It does help with the deadlocks, but we still need the other early returns. I think this is ready for review again now.
The Windows 3.14t job is deadlocking. I hadn't seen that deadlock locally, but I managed to reproduce it after running the deadlocking test 10 or so times. Here are the tracebacks from all the hung threads: https://gist.github.com/ngoldbaum/b32563967ab629ecb0a84c88efc94a16

Is it possible that the Windows kernel APIs we're calling via ctypes need some kind of global lock to avoid simultaneous calls like this?
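If it did turn out that those calls need to be serialized, the simplest illustration of the idea would be a module-level lock wrapped around the ctypes entry points, as in the sketch below (hypothetical names; blocking calls such as the directory-change wait would have to stay outside any such lock, or everything would stall behind them):

```python
import functools
import threading

# Hypothetical: a single lock shared by the kernel32 wrappers so that two
# emitter threads never enter them at the same time.
_kernel32_lock = threading.Lock()


def serialized(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _kernel32_lock:
            return func(*args, **kwargs)
    return wrapper
```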
src/watchdog/utils/__init__.py
Outdated
    if self._stopped_event.is_set():
        # stop was called while we were doing setup,
        # so don't actually spawn a thread
        self.on_thread_stop()
stop() already calls this, so I think it's not necessary here.
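In other words, the suggestion is that the guard can shrink to a bare early return, since stop() already takes care of calling on_thread_stop(). A sketch of that shape, using the names from the hunk above (not the final code):

```python
def start(self):
    # If stop() already ran, it has already called on_thread_stop(), so
    # the only thing left to do is avoid spawning the thread at all.
    if self._stopped_event.is_set():
        return
    threading.Thread.start(self)
```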
I think I might understand what's happening here. The docs for …

I'm not sure how far I should go touching the Windows ctypes wrappers, so I think I'll stop here and wait for further code review.
Fix deadlock4
Closes #1132.
I got to this state after a lot of trial and error and I'd really appreciate it if @taleinat could perhaps look this over, since he also thought about this problem recently.
As I suspected in #1132 (comment), the issue is that `BaseObserver.start` didn't do any locking at all, so if someone engineers a situation where `stop` gets called while `start` is still running, you get a deadlock.

Adding locking in `start` wasn't quite enough: you also need to check in `dispatch_events`, before dispatching an event, whether another thread called `stop` or removed a watch while `dispatch_events` is running.

I also applied @colesbury's fix for the file handle re-use issue that leads to a Python crash; see #1132 (comment) for more on that.
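Roughly, the two pieces described above fit together like this (a sketch of the approach under the assumptions stated in the description, using the class and attribute names from the discussion; not the exact diff):

```python
class BaseObserver(EventDispatcher):
    def start(self):
        # Hold the observer lock for the whole start-up sequence so that a
        # concurrent stop() cannot interleave with starting the emitters.
        with self._lock:
            for emitter in self._emitters.copy():
                emitter.start()
        super().start()

    def dispatch_events(self, event_queue):
        event, watch = event_queue.get(block=True)
        # Another thread may have called stop() or removed the watch while
        # we were blocked on the queue, so re-check before dispatching.
        if self._stopped_event.is_set():
            return
        with self._lock:
            handlers = list(self._handlers.get(watch, ()))
        for handler in handlers:
            handler.dispatch(event)
        event_queue.task_done()
```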