Skip to content

Conversation

freakboy3742
Copy link
Member

We saw an odd failure in a dependabot CI run; this test only failed on macOS-15 Python 3.14, and passed on a re-run. As far as I can make out, it's a very niche race condition that is almost impossible to reproduce programmatically.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@freakboy3742 freakboy3742 requested a review from mhsmith October 20, 2025 00:20
@mhsmith
Copy link
Member

mhsmith commented Oct 20, 2025

From the CI log linked above:

_____________________ AsyncSubprocessTests.test_subprocess _____________________

self = <tests.test_async.AsyncSubprocessTests testMethod=test_subprocess>

    def tearDown(self):
        if sys.version_info < (3, 14):
            asyncio.set_event_loop_policy(None)
>       self.loop.close()

tests/test_async.py:167: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.tox/py/lib/python3.14/site-packages/rubicon/objc/eventloop.py:646: in close
    handler.cancel()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <CFTimerHandle cancelled when=782601318.778928>

    def cancel(self):
        """Cancel the Timer handle."""
        super().cancel()
        libcf.CFRunLoopRemoveTimer(
>           self._loop._cfrunloop, self._timer, kCFRunLoopCommonModes
                                   ^^^^^^^^^^^
        )
E       AttributeError: 'CFTimerHandle' object has no attribute '_timer'

@mhsmith
Copy link
Member

mhsmith commented Oct 20, 2025

What actually is the race condition? Is the loop is closing down in the top half of CFTimerHandle.__init__, before _timer is assigned, and if so, how could that happen? The only line that directly interacts with the loop is self._loop._timers.add(self), but I don't think it could be that, because _loop._timers is just a normal set.

@freakboy3742
Copy link
Member Author

What actually is the race condition? Is the loop is closing down in the top half of CFTimerHandle.__init__, before _timer is assigned, and if so, how could that happen? The only line that directly interacts with the loop is self._loop._timers.add(self), but I don't think it could be that, because _loop._timers is just a normal set.

The honest truth is that I'm not 100% certain myself.

My best guess is that the CFTimerHandle actually failed during construction, so the Python object existed enough to be an object, but not enough to be fully instantiated with a valid _timer. However, asyncio execution handling is a bit of a mess, that exception was eaten, and the cleanup process then raised an error because of the incomplete object.

The fix I've proposed here accounts for the messy cleanup process; I'm not sure if, had this change been in place, we might have gotten a more complete report of the underlying problem that caused the timer to fail.

@mhsmith
Copy link
Member

mhsmith commented Oct 21, 2025

Yes, that makes sense: if CFRunLoopTimerCreate fails, then a reference to the incompletely-constructed object would still be kept in self._loop._timers.

Could we prevent this by moving self._loop._timers.add to the bottom of __init__? Then cancel wouldn't need to change, and it might avoid other problems as well.

Or if that isn't possible for some reason, we could wrap the bottom half of __init__ with a try/except which removes it from self._loop._timers if construction fails.

@freakboy3742
Copy link
Member Author

Yes, that makes sense: if CFRunLoopTimerCreate fails, then a reference to the incompletely-constructed object would still be kept in self._loop._timers.

It's not the value in self._loop._timers that is the issue, though. discard() is a no-op if a set doesn't contain a member:

>>> s = {1,2,3}
>>> s.discard(4)
>>>

The issue is with the CFRunLoopRemoveTimer call, and the fact that self._timer (not self._loop._timers) doesn't exist.

Could we prevent this by moving self._loop._timers.add to the bottom of __init__? Then cancel wouldn't need to change, and it might avoid other problems as well.

I'm not sure I follow what you're suggesting here.

Or if that isn't possible for some reason, we could wrap the bottom half of __init__ with a try/except which removes it from self._loop._timers if construction fails.

Again, it's not self._loop._timers that is the problem; it's the existence of self._timers as an attribute. Catching an exception in the call to CFRunLoopTimeCreate would allow settings self._timers = None, but that only changes whether we need to use getattr() or a straight attribute lookup - the check for a None value would still exist.

@mhsmith
Copy link
Member

mhsmith commented Oct 22, 2025

As discussed in person: the stack trace shows that the crash is happening while iterating over self._loop._timers during loop shutdown. If the invalid timer was never added to the list (or was removed as soon as it became invalid), then the problem could have been avoided closer to the source.

Any exception from CFRunLoopTimeCreate (if that is indeed the cause) should still be allowed to propagate: this wouldn't fix whatever is stopping it from being reported, but that's probably a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants