Description
I'm not certain any action is appropriate here, but I'm cross-posting to the Python bugtracker at @Fidget-Spinner's request, and for visibility. Much of this text is copied from my issue on LLVM.
GCC and clang support the -fzero-call-used-regs option / zero_call_used_regs attribute. This option causes the compiler to zero out certain registers on exit from a function, to mitigate ROP and partially protect against certain architectural side-channels.
This option is not widely used at present, but some distributions enable it by default, such as NixOS.
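As a rough illustration of the attribute form, here is a minimal sketch that hardens one function and opts another out. The macro names ZERO_REGS and SKIP_ZERO_REGS and both function names are hypothetical, chosen only for this example. It assumes GCC 11+ or clang 15+ on x86-64 or AArch64; on compilers without the attribute, the macros expand to nothing.

```c
#include <stdint.h>

/* Fall back to "attribute not available" on compilers without __has_attribute. */
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

/* Hypothetical helper macros for this sketch (not part of CPython). */
#if __has_attribute(zero_call_used_regs)
#  define ZERO_REGS      __attribute__((zero_call_used_regs("used-gpr")))
#  define SKIP_ZERO_REGS __attribute__((zero_call_used_regs("skip")))
#else
#  define ZERO_REGS
#  define SKIP_ZERO_REGS
#endif

/* Hardened: call-used general-purpose registers that this function
 * touched are zeroed before it returns (the return value is kept). */
ZERO_REGS uint32_t checksum_hardened(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum = sum * 31u + buf[i];
    }
    return sum;
}

/* Opted out: no zeroing is emitted for this function, even if the
 * whole file is built with -fzero-call-used-regs=<choice>. */
SKIP_ZERO_REGS uint32_t checksum_fast(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum = sum * 31u + buf[i];
    }
    return sum;
}
```

Both functions compute the same result. The difference should only be visible in the generated assembly, where the hardened variant is expected to end with extra xor instructions before ret.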
The tail-call interpreter emits a separate function per opcode implementation. This means that, when compiled using clang and -fzero-call-used-regs, each opcode function gets some additional xor instructions to zero out certain registers prior to the tail dispatch. The computed goto interpreter does not get these instructions, even though it emits very similar indirect jumps, which could potentially be used for similar JOP gadgets.
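To make the structural difference concrete, here is a toy sketch of the two dispatch styles. It is not CPython's actual interpreter. The opcodes, function names, and the MUSTTAIL macro are all hypothetical, and it assumes a compiler with the GNU "labels as values" extension (GCC or clang). The musttail attribute is only applied under clang here; elsewhere the handlers fall back to ordinary calls.

```c
#include <stddef.h>

#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

/* Hypothetical macro: force a tail call where clang supports it. */
#if defined(__clang__) && __has_attribute(musttail)
#  define MUSTTAIL __attribute__((musttail))
#else
#  define MUSTTAIL
#endif

enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Style 1: computed goto. One big function; every opcode ends in an
 * indirect jump, but there is only one function exit (at do_halt),
 * so register zeroing would be emitted only there. */
long run_computed_goto(const unsigned char *code)
{
    static void *const targets[] = { &&do_inc, &&do_double, &&do_halt };
    long acc = 0;
    goto *targets[*code++];
do_inc:
    acc += 1;
    goto *targets[*code++];
do_double:
    acc *= 2;
    goto *targets[*code++];
do_halt:
    return acc;
}

/* Style 2: one function per opcode, chained by tail calls. Each
 * handler is its own function with its own exit, which is where
 * clang inserts the zeroing under -fzero-call-used-regs. */
typedef long (*handler_fn)(const unsigned char *code, long acc);

static long op_inc(const unsigned char *code, long acc);
static long op_double(const unsigned char *code, long acc);
static long op_halt(const unsigned char *code, long acc);

static const handler_fn handlers[] = { op_inc, op_double, op_halt };

static long op_inc(const unsigned char *code, long acc)
{
    MUSTTAIL return handlers[*code](code + 1, acc + 1);
}

static long op_double(const unsigned char *code, long acc)
{
    MUSTTAIL return handlers[*code](code + 1, acc * 2);
}

static long op_halt(const unsigned char *code, long acc)
{
    (void)code;
    return acc;
}

long run_tail_call(const unsigned char *code)
{
    return handlers[*code](code + 1, 0);
}
```

Both entry points take a byte string of opcodes terminated by OP_HALT and return the accumulator. Comparing the assembly of the op_* handlers with and without -fzero-call-used-regs should show the extra xor instructions described above, while the labels inside run_computed_goto stay unchanged.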
In my experiments, I see a ~2% hit from -fzero-call-used-regs with computed gotos, and ~3% with tail-calls. However, the tail-call hit can be substantially reduced (to ~1%, in my testing) by skipping zeroing only for the bytecode functions, using __attribute__((zero_call_used_regs("skip"))). (Note that I don't trust my benchmark setup to be stable at the ~1% scale, so take these exact numbers with a grain of salt.) I tested using this patch:
```diff
diff --git i/Python/ceval_macros.h w/Python/ceval_macros.h
index 1bef2b845d0..cddad845fea 100644
--- i/Python/ceval_macros.h
+++ w/Python/ceval_macros.h
@@ -77,9 +77,10 @@
 // Note: [[clang::musttail]] works for GCC 15, but not __attribute__((musttail)) at the moment.
 # define Py_MUSTTAIL [[clang::musttail]]
 # define Py_PRESERVE_NONE_CC __attribute__((preserve_none))
+# define Py_SKIP_ZERO_USED_REGS __attribute__((zero_call_used_regs("skip")))
 Py_PRESERVE_NONE_CC typedef PyObject* (*py_tail_call_funcptr)(TAIL_CALL_PARAMS);
-# define TARGET(op) Py_PRESERVE_NONE_CC PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)
+# define TARGET(op) Py_PRESERVE_NONE_CC Py_SKIP_ZERO_USED_REGS PyObject *_TAIL_CALL_##op(TAIL_CALL_PARAMS)
 # define DISPATCH_GOTO() \
 do { \
```
I also note that GCC, at least at present, does not zero registers when a function exits via a tail call. My LLVM issue (llvm/llvm-project#129764) asks for LLVM to mirror that behavior, although there is currently also an issue on the GCC bugtracker asking them to change it.
On reflection, my current opinion is that there's probably nothing to be done in the CPython codebase, and this issue is more for visibility for the CPython devs (and potentially CPython packagers) about a minor confounder when thinking about performance and hardening. But ultimately, of course, that's for you all to decide.