Fix race condition with py::make_key_iterator in free threading #5971
Merged
+27
−11
13 commits
bfe55ed Fix race condition with py::make_key_iterator in free threading (colesbury)
fa03953 style: pre-commit fixes (pre-commit-ci[bot])
70c2ffc Use PyCriticalSection_BeginMutex instead of recursive mutex (colesbury)
8690dd7 style: pre-commit fixes (pre-commit-ci[bot])
be01d1e Make pycritical_section non-copyable and non-movable (rwgk)
61e032e Drop Python 3.13t support from CI (rwgk)
29840ce Add Python 3.13 (default) replacement jobs for removed 3.13t jobs (rwgk)
91189c9 ci: run in free-threading mode a bit more on 3.14 (henryiii)
603b25a Merge branch 'master' into iterator-nogil-threadsafety (rwgk)
dc8e35c Merge branch 'master' into colesbury→iterator-nogil-threadsafety (rwgk)
f3197de Revert "ci: run in free-threading mode a bit more on 3.14" (rwgk)
07e0360 Reapply "ci: run in free-threading mode a bit more on 3.14" (rwgk)
5505b4d [skip ci] Merge branch 'master' into colesbury→iterator-nogil-threads… (rwgk)
@henryiii @rwgk - I could use some advice here. I probably should have used some sort of recursive mutex here from the start -- it's pretty difficult to do the locking for `make_iterator_impl` without it.

I think changing this would require bumping `PYBIND11_INTERNALS_VERSION`, at least for `Py_GIL_DISABLED` builds. Is that acceptable?

Alternatively, maybe we should use something like `Py_BEGIN_CRITICAL_SECTION_MUTEX`, which supports recursion and eliminates some potential lock ordering deadlocks if you call into Python. The downside is that it is 3.14+ only.
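To make the recursion point above concrete, here is a minimal sketch of a check-then-register pattern guarded by a re-entrant lock. It is not pybind11's code: `TypeRegistry`, `get_or_register`, and `register_type` are hypothetical names, and `std::recursive_mutex` stands in for whatever lock the internals actually use.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a registry of bound types; not pybind11's internals.
class TypeRegistry {
public:
    // The lookup and the registration happen under one lock, so two callers
    // cannot both observe "missing" and both register the same name.
    int get_or_register(const std::string &name) {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        auto it = ids_.find(name);
        if (it != ids_.end()) {
            return it->second;
        }
        // register_type() locks mutex_ again on the same thread. With a plain
        // std::mutex that second lock would be undefined behavior, typically
        // a deadlock; a recursive mutex allows it.
        return register_type(name);
    }

    // Callable on its own as well, which is why it takes the lock itself.
    int register_type(const std::string &name) {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        const int id = static_cast<int>(ids_.size());
        // emplace() keeps an existing entry, so a repeat call returns the old id.
        return ids_.emplace(name, id).first->second;
    }

    int size() const {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        return static_cast<int>(ids_.size());
    }

private:
    mutable std::recursive_mutex mutex_;
    std::map<std::string, int> ids_;
};

// Small usage example: repeated lookups return the same id.
inline void registry_example() {
    TypeRegistry registry;
    const int first = registry.get_or_register("key_iterator");
    assert(registry.get_or_register("key_iterator") == first);
}
```

The sketch only illustrates re-entrancy. It does not model the lock-ordering benefit that the comment attributes to critical sections when calling into Python.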
After the v3.0.2 release, yes. We already have three other PRs that need an internals version bump.
I was planning to release today (PR #5969), but there was a show stopper. We're waiting for a fix for the segfault.
My thinking: the race isn't new, therefore it would seem reasonable to defer fixing it until after the v3.0.2 release, when we have a window of opportunity to bump the internals version.
Caveat: I don't have enough background to decide between the internals version bump and the critical section alternative.
`Py_BEGIN_CRITICAL_SECTION_MUTEX` sounds fine, I'd be okay to deprecate and remove Python 3.13t support. Maybe we could just fix this bug on 3.14t, and then drop 3.13t in 3.1?

We are thinking about dropping 3.13t in cibuildwheel too.
Also fine to make the lock recursive in 3.1, we have several ABI bumps coming up.
Great, I think dropping 3.13t makes sense. I switched to `PyCriticalSection_BeginMutex`.

I saw there's also a `py::scoped_critical_section`. Not sure if you'd want to combine them.
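For readers unfamiliar with the guard pattern behind the "non-copyable and non-movable" commit, the following is a minimal sketch of a scoped guard with copy and move disabled. It is an illustration only: `scoped_section` and `locked_increment` are hypothetical names, and `std::mutex` stands in for the critical-section calls used in the PR.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical RAII guard; std::mutex stands in for the real critical section.
class scoped_section {
public:
    explicit scoped_section(std::mutex &m) : mutex_(m) { mutex_.lock(); }
    ~scoped_section() { mutex_.unlock(); }

    // A copied or moved guard would release the same lock twice,
    // so both operations are disabled.
    scoped_section(const scoped_section &) = delete;
    scoped_section &operator=(const scoped_section &) = delete;
    scoped_section(scoped_section &&) = delete;
    scoped_section &operator=(scoped_section &&) = delete;

private:
    std::mutex &mutex_;
};

static_assert(!std::is_copy_constructible<scoped_section>::value,
              "guard must not be copyable");
static_assert(!std::is_move_constructible<scoped_section>::value,
              "guard must not be movable");

// Increments the counter while holding the lock; the guard releases it on return.
inline int locked_increment(std::mutex &m, int &counter) {
    scoped_section guard(m);
    return ++counter;
}

// Small usage example: the lock is free again once the guard goes out of scope.
inline void guard_example() {
    std::mutex m;
    int counter = 0;
    locked_increment(m, counter);
    assert(counter == 1);
    assert(m.try_lock());
    m.unlock();
}
```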
Could we have a guard with `#if defined(Py_GIL_DISABLED) && defined(PY_VERSION_HEX) && PY_VERSION_HEX >= ...` here? It is causing a compilation error with Python 3.13t. In addition, the function `PyCriticalSection_BeginMutex` was added in Python 3.14.0rc1 and Python 3.15.0a1.
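The comment leaves the threshold in the guard elided, and the thread does not state the value that was eventually used. As a worked example of how `PY_VERSION_HEX` comparisons behave, the sketch below packs version numbers the way CPython documents the macro (major, minor, micro, release level, serial). `version_hex` and `has_begin_mutex` are hypothetical helpers, and using 3.14.0rc1 as the cutoff is an assumption taken from the comment's statement about when the function was added.

```cpp
#include <cassert>
#include <cstdint>

// Release-level nibbles used by CPython's PY_VERSION_HEX encoding.
enum : std::uint32_t { kAlpha = 0xA, kBeta = 0xB, kCandidate = 0xC, kFinal = 0xF };

// Packs a version like PY_VERSION_HEX:
// major (8 bits), minor (8), micro (8), release level (4), serial (4).
constexpr std::uint32_t version_hex(std::uint32_t major, std::uint32_t minor,
                                    std::uint32_t micro, std::uint32_t level,
                                    std::uint32_t serial) {
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial;
}

static_assert(version_hex(3, 14, 0, kCandidate, 1) == 0x030E00C1, "3.14.0rc1");
static_assert(version_hex(3, 15, 0, kAlpha, 1) == 0x030F00A1, "3.15.0a1");

// Assumption: 3.14.0rc1 is the cutoff, per the comment above. Hypothetical helper.
constexpr bool has_begin_mutex(std::uint32_t hex) {
    return hex >= version_hex(3, 14, 0, kCandidate, 1);
}

// Small usage example: every 3.13 release falls below the assumed cutoff.
inline void version_example() {
    assert(!has_begin_mutex(version_hex(3, 13, 0, kFinal, 0)));
    assert(has_begin_mutex(version_hex(3, 14, 0, kFinal, 0)));
}
```

Because 3.15.0a1 encodes to a larger number than 3.14.0rc1, a single `>=` comparison against the lower value would cover both releases named in the comment.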
Ah, looks like we were too efficient dropping 3.13t. We should keep it working at the level before this PR for the v3.0.2 release, then fully drop 3.13t support after the release. @XuehaiPan could you help by starting a PR for that? Just the code changes, to be sure it works for you. I can help by adding back one 3.13t job.
I opened #5981