Skip to content

gh-133465: Allow PyErr_CheckSignals to be called without holding the GIL. #133466

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zackw
Copy link

@zackw zackw commented May 5, 2025

Addresses #133465. See there, or the commit message for the first commit in this PR, for rationale.

There are two commits: the first actually implements the change, and the second demonstrates the motivation for it by pulling a lot of uses of Py_(BEGIN|END)_ALLOW_THREADS within Python's stdlib out of loops.

This has been tested (lightly - just the built in testsuite) both with and without --disable-gil; however, I did not test --enable-optimizations nor --enable-experimental-jit.


📚 Documentation preview 📚: https://cpython-previews--133466.org.readthedocs.build/

@python-cla-bot
Copy link

python-cla-bot bot commented May 5, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link

bedevere-app bot commented May 5, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Comment on lines +1 to +7
:c:func:`PyErr_CheckSignals` has been changed to acquire the global
interpreter lock (GIL) itself, only when necessary (i.e. when it has work to
do). This means that modules that perform lengthy computations with the GIL
released may now call :c:func:`PyErr_CheckSignals` during those computations
without re-acquiring the GIL first. (However, it must be *safe to* acquire
the GIL at each point where :c:func:`PyErr_CheckSignals` is called. Also,
keep in mind that it can run arbitrary Python code before returning to you.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A NEWS entry should be more concise, users can refer to docs for in depth explanations.

Copy link
Author

@zackw zackw May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this better?

:c:func:`PyErr_CheckSignals` has been made safe to call without holding the GIL.
It will acquire the GIL itself when it needs it.

@zackw
Copy link
Author

zackw commented May 5, 2025

I am unable to reproduce the failure of test_queue, seen in https://github.com/python/cpython/actions/runs/14844414631/job/41674826787?pr=133466, even after running the test repeatedly with high levels of --parallel-threads. I will update the branch with a fix for the doc issue shortly.

zackw added 2 commits May 5, 2025 17:29
Compiled-code modules that implement time-consuming operations that
don’t require manipulating Python objects, are supposed to call
PyErr_CheckSignals frequently throughout each such operation, so that
if the user interrupts the operation with control-C, it is cancelled
promptly.  In the normal case where no signals are pending,
PyErr_CheckSignals is cheap; however, callers must hold the GIL,
and compiled-code modules that implement time-consuming operations
are also supposed to release the GIL during each such operation.
The overhead of reclaiming the GIL in order to call PyErr_CheckSignals,
and then releasing it again, sufficiently often for reasonable user
responsiveness, can be substantial.

If my understanding of the thread-state rules is correct,
PyErr_CheckSignals only *needs* the GIL if it has work to do.
*Checking* whether there is a pending signal, or a pending request
to run the cycle collector, requires only a couple of atomic loads.
Therefore:  Reorganize the logic of PyErr_CheckSignals and its close
relatives (_PyErr_CheckSignals and _PyErr_CheckSignalsTstate) so that
all the “do we have anything to do” checks are done in a batch before
anything that needs the GIL.  If any of them are true, acquire the GIL,
repeat the check (because another thread could have stolen the event
while we were waiting for the GIL), and then actually do the work,
enabling callers to *not* hold the GIL.

(There are some fine details here that I’d really appreciate a second
pair of eyes on — see the comments in the new functions
_PyErr_CheckSignalsHoldingGIL and _PyErr_CheckSignalsNoGIL.)
The source tree contains dozens of loops of this form:

    int res;
    do {
        Py_BEGIN_ALLOW_THREADS
        res = some_system_call(arguments...);
        Py_END_ALLOW_THREADS
    } while (res < 0 && errno == EINTR && !PyErr_CheckSignals());

Now that it is possible to call PyErr_CheckSignals without holding the
GIL, the locking operations can be moved out of the loop:

    Py_BEGIN_ALLOW_THREADS
    do {
        res = some_system_call(arguments...);
    } while (res < 0 && errno == EINTR && !PyErr_CheckSignals());
    Py_END_ALLOW_THREADS

This demonstrates the motivation for making it possible to call
PyErr_CheckSignals without holding the GIL.  It shouldn’t make any
measurable difference performance-wise for _these_ loops, which almost
never actually cycle; but for loops that do cycle many times it’s very
much desirable to not take and release the GIL every time through.

In some cases I also moved uses of _Py_(BEGIN|END)_SUPPRESS_IPH, which
is often paired with Py_(BEGIN|END)_ALLOW_THREADS, to keep the pairing
intact.  It was already considered safe to call PyErr_CheckSignals
from both inside and outside an IPH suppression region.

More could be done in this vein: I didn’t change any loops where the
inside of the loop was more complicated than a single system call,
_except_ that I did refactor py_getentropy and py_getrandom (in
bootstrap_hash.c) to make it possible to move the unlock and lock
outside the loop, demonstrating a more complicated case.
@zackw zackw force-pushed the check-signals-without-gil branch from 40097b8 to 3e25a93 Compare May 5, 2025 21:29
Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully decided how I feel about this yet. I agree with the motivation, but PyGILState_Ensure is evil. We might be able to sidestep most of the issues here, though (for one, signal handling isn't done in subinterpreters, so we don't have to worry about interpreter-guessing issues).

My main concern is that we're changing something that's in the stable ABI. That's generally a big no-no, because those are supposed to have a "frozen" interface. We might want this in a new API (e.g., something like PyErr_CheckSignalsFast).

Comment on lines +1900 to +1901
/* FIXME: Given that we already have 'tstate', is there a more efficient
way to do this? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you want PyThreadState_Swap(tstate).

Determine whether there is actually any work needing to be done.
If so, acquire the GIL if necessary, and do that work. */
static int
_PyErr_CheckSignalsNoGIL(PyThreadState *tstate, bool cycle_collect)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding: let's call this "NoTstate", because you'll still need a thread state on free-threaded builds.

/* If this thread does not have a thread state at all, then it has
never been associated with the Python runtime, so it should not
attempt to handle signals or run the cycle collector. */
if (!tstate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should ignore this. This seems like blatant misuse--either return a failure or emit a fatal error.

@ZeroIntensity
Copy link
Member

TSan failure looks unrelated. I've restarted the job for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants