
Consider doing something about -fzero-call-used-regs and the tail-calling interpreter #130961

Open
@nelhage


I'm not certain any action is appropriate here, but I'm cross-posting to the Python bugtracker at @Fidget-Spinner's request, and for visibility. Much of this text is copied from my issue on LLVM.

GCC and clang support the -fzero-call-used-regs option / zero_call_used_regs attribute. This option causes the compiler to zero out certain registers on exit from a function, to mitigate ROP and partially protect against certain architectural side-channels.
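As a minimal sketch of the per-function form, the attribute can be attached to individual functions. The macro and function names below (`DEMO_ZERO_USED_GPR`, `DEMO_SKIP_ZEROING`, `demo_sum_hardened`, `demo_sum_unhardened`) are hypothetical and not taken from CPython. The `__has_attribute` guard assumes a GCC or clang recent enough to provide it, and the macros expand to nothing elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros: expand to the attribute only when the
 * compiler reports support for it, and to nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define DEMO_ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#    define DEMO_SKIP_ZEROING  __attribute__((zero_call_used_regs("skip")))
#  endif
#endif
#ifndef DEMO_ZERO_USED_GPR
#  define DEMO_ZERO_USED_GPR
#  define DEMO_SKIP_ZEROING
#endif

/* "used-gpr": on return, zero the call-used general-purpose registers
 * that this function actually used. */
DEMO_ZERO_USED_GPR int demo_sum_hardened(const int *values, size_t count) {
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* "skip": opt this one function out of zeroing, even if the whole
 * translation unit is built with -fzero-call-used-regs. */
DEMO_SKIP_ZEROING int demo_sum_unhardened(const int *values, size_t count) {
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Both functions compute the same result. On targets where the attribute is implemented, comparing their assembly (for example with `-S`) should show the extra zeroing instructions only in the first.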

This option is not widely used at present, but some distributions, such as NixOS, enable it by default.

The tail-call interpreter emits a separate function per opcode implementation. This means that, when compiled using clang and -fzero-call-used-regs, each opcode function gains some additional xor instructions that zero out certain registers before the tail dispatch. The computed goto interpreter does not get these instructions, even though it emits very similar indirect jumps, which could potentially be used for similar JOP gadgets.
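To make the "one function per opcode" shape concrete, here is a toy sketch of tail-call dispatch. It is not CPython's interpreter: the opcodes, the handler signature, and every `demo_` / `DEMO_` name are invented for illustration. `musttail` is only requested under clang (an assumption made to keep the sketch portable); with other compilers the call is an ordinary call that the optimizer may or may not turn into a jump.

```c
#include <assert.h>

/* Request a guaranteed tail call only where clang reports support. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(musttail)
#    define DEMO_MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef DEMO_MUSTTAIL
#  define DEMO_MUSTTAIL /* no guarantee: plain call */
#endif

/* Hypothetical three-instruction bytecode. */
enum { DEMO_OP_INC, DEMO_OP_DOUBLE, DEMO_OP_HALT, DEMO_OP_COUNT };

/* Every handler shares one signature so each can tail-call the next. */
typedef long (*demo_handler)(const unsigned char *pc, long acc);

static long demo_op_inc(const unsigned char *pc, long acc);
static long demo_op_double(const unsigned char *pc, long acc);
static long demo_op_halt(const unsigned char *pc, long acc);

static const demo_handler demo_table[DEMO_OP_COUNT] = {
    demo_op_inc, demo_op_double, demo_op_halt,
};

/* Each handler ends by jumping to the handler of the next opcode.
 * This function exit is the point where a compiler that zeroes
 * registers before tail calls would insert its zeroing instructions. */
#define DEMO_DISPATCH() \
    do { \
        assert(pc[0] < DEMO_OP_COUNT); \
        DEMO_MUSTTAIL return demo_table[pc[0]](pc + 1, acc); \
    } while (0)

static long demo_op_inc(const unsigned char *pc, long acc) {
    acc += 1;
    DEMO_DISPATCH();
}

static long demo_op_double(const unsigned char *pc, long acc) {
    acc *= 2;
    DEMO_DISPATCH();
}

static long demo_op_halt(const unsigned char *pc, long acc) {
    (void)pc; /* nothing left to dispatch */
    return acc;
}

/* Run a program that must end with DEMO_OP_HALT; the accumulator starts at 0. */
long demo_run(const unsigned char *code) {
    assert(code[0] < DEMO_OP_COUNT);
    return demo_table[code[0]](code + 1, 0);
}
```

A computed-goto interpreter would keep all of this inside a single function and use `goto *label` between opcode bodies, so there is no per-opcode function exit for the compiler to instrument.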

In my experiments, I see a ~2% hit from -fzero-call-used-regs with computed gotos, and ~3% with tail-calls. However, the tail-call hit can be substantially reduced (to ~1%, in my testing) by skipping the zeroing for just the bytecode functions, using __attribute__((zero_call_used_regs("skip"))). (Note that I don't trust my benchmark setup to be stable at the ~1% scale, so take these exact numbers with a grain of salt.) I tested using this patch:

diff --git i/Python/ceval_macros.h w/Python/ceval_macros.h
index 1bef2b845d0..cddad845fea 100644
--- i/Python/ceval_macros.h
+++ w/Python/ceval_macros.h
@@ -77,9 +77,10 @@
     // Note: [[clang::musttail]] works for GCC 15, but not __attribute__((musttail)) at the moment.
 #   define Py_MUSTTAIL [[clang::musttail]]
 #   define Py_PRESERVE_NONE_CC __attribute__((preserve_none))
+#   define Py_SKIP_ZERO_USED_REGS __attribute__((zero_call_used_regs("skip")))
     Py_PRESERVE_NONE_CC typedef PyObject* (*py_tail_call_funcptr)(TAIL_CALL_PARAMS);

-#   define TARGET(op) Py_PRESERVE_NONE_CC PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)
+#   define TARGET(op) Py_PRESERVE_NONE_CC Py_SKIP_ZERO_USED_REGS PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)
 #   define DISPATCH_GOTO() \
         do { \

I also note that GCC, at least at present, does not zero registers when a function exits via a tail-call. My LLVM issue (llvm/llvm-project#129764) asks for LLVM to mirror that behavior, although note that there is currently also an issue on the GCC bugtracker asking for GCC to change that behavior.


On reflection, my current opinion is that there's probably nothing to be done in the CPython codebase, and this issue is more for visibility to the CPython devs (and potentially CPython packagers) about a minor confounder when thinking about performance and hardening. But ultimately, of course, that's for you all to decide.


Labels: build, interpreter-core, performance, type-feature
