Commit ad24cc3

pablogsal authored and gvanrossum committed
PEP 768: Safe external debugger interface for CPython (python#4158)
1 parent d38c76f commit ad24cc3

1 file changed

+351
-0
lines changed

peps/pep-0768.rst

Lines changed: 351 additions & 0 deletions
@@ -0,0 +1,351 @@
1+
PEP: 768
2+
Title: Safe external debugger interface for CPython
3+
Author: Pablo Galindo Salgado <[email protected]>, Matt Wozniski <[email protected]>, Ivona Stojanovic <[email protected]>
4+
Status: Draft
5+
Type: Standards Track
6+
Created: 25-Nov-2024
7+
Python-Version: 3.14
8+
9+
Abstract
10+
========
11+
12+
This PEP proposes adding a zero-overhead debugging interface to CPython that
13+
allows debuggers and profilers to safely attach to running Python processes. The
14+
interface provides safe execution points for attaching debugger code without
15+
modifying the interpreter's normal execution path or adding runtime overhead.
16+
17+
A key application of this interface will be enabling pdb to attach to live
18+
processes by process ID, similar to ``gdb -p``, allowing developers to inspect and
19+
debug Python applications interactively in real time without stopping or
20+
restarting them.
21+
22+
Motivation
23+
==========
24+
25+
26+
Debugging Python processes in production and live environments presents unique
27+
challenges. Developers often need to analyze application behavior without
28+
stopping or restarting services, which is especially crucial for
29+
high-availability systems. Common scenarios include diagnosing deadlocks,
30+
inspecting memory usage, or investigating unexpected behavior in real time.
31+
32+
Very few Python tools can attach to running processes, primarily because doing
33+
so requires deep expertise in both operating system debugging interfaces and
34+
CPython internals. While C/C++ debuggers like GDB and LLDB can attach to
35+
processes using well-understood techniques, Python tools must implement all of
36+
these low-level mechanisms plus handle additional complexity. For example, when
37+
GDB needs to execute code in a target process, it:
38+
39+
1. Uses ptrace to allocate a small chunk of executable memory (easier said than done)
40+
2. Writes a small sequence of machine code - typically a function prologue, the
41+
desired instructions, and code to restore registers
42+
3. Saves all the target thread's registers
43+
4. Changes the instruction pointer to the injected code
44+
5. Lets the process run until it hits a breakpoint at the end of the injected code
45+
6. Restores the original registers and continues execution
46+
47+
Python tools face this same challenge of code injection, but with an additional
48+
layer of complexity. Not only do they need to implement the above mechanism,
49+
they must also understand and safely interact with CPython's runtime state,
50+
including the interpreter loop, garbage collector, thread state, and reference
51+
counting system. This combination of low-level system manipulation and
52+
deep, domain-specific interpreter knowledge makes implementing Python debugging tools
53+
exceptionally difficult.
54+
55+
The few tools (see for example `DebugPy
56+
<https://github.com/microsoft/debugpy/blob/43f41029eabce338becbd1fa1a09727b3cfb1140/src/debugpy/_vendored/pydevd/pydevd_attach_to_process/linux_and_mac/attach.cpp#L4>`__
57+
and `Memray
58+
<https://github.com/bloomberg/memray/blob/main/src/memray/_memray/inject.cpp>`__)
59+
that do attempt this resort to suboptimal and unsafe methods,
60+
using system debuggers like GDB and LLDB to forcefully inject code. This
61+
approach is fundamentally unsafe because the injected code can execute at any
62+
point during the interpreter's execution cycle - even during critical operations
63+
like memory allocation, garbage collection, or thread state management. When
64+
this happens, the results are catastrophic: attempting to allocate memory while
65+
already inside ``malloc()`` causes crashes, modifying objects during garbage
66+
collection corrupts the interpreter's state, and touching thread state at the
67+
wrong time leads to deadlocks.
68+
69+
Various tools attempt to minimize these risks through complex workarounds, such
70+
as spawning separate threads for injected code, carefully timing their
71+
operations, or trying to select good points at which to stop the process. However,
72+
these mitigations cannot fully solve the underlying problem: without cooperation
73+
from the interpreter, there's no way to know if it's safe to execute code at any
74+
given moment. Even carefully implemented tools can crash the interpreter because
75+
they're fundamentally working against it rather than with it.
76+
77+
78+
Rationale
79+
=========
80+
81+
82+
Rather than forcing tools to work around interpreter limitations with unsafe
83+
code injection, we can extend CPython with a proper debugging interface that
84+
guarantees safe execution. By adding a few thread state fields and integrating
85+
with the interpreter's existing evaluation loop, we can ensure debugging
86+
operations only occur at well-defined safe points. This eliminates the
87+
possibility of crashes and corruption while maintaining zero overhead during
88+
normal execution.
89+
90+
The key insight is that we don't need to inject code at arbitrary points - we
91+
just need to signal to the interpreter that we want code executed at the next
92+
safe opportunity. This approach works with the interpreter's natural execution
93+
flow rather than fighting against it.
94+
95+
After we described this idea to the PyPy development team, the proposal was
96+
`implemented in PyPy <https://github.com/pypy/pypy/pull/5135>`__,
97+
proving both its feasibility and effectiveness. Their implementation
98+
demonstrates that we can provide safe debugging capabilities with zero runtime
99+
overhead during normal execution. The proposed mechanism not only reduces risks
100+
associated with current debugging approaches but also lays the foundation for
101+
future enhancements. For instance, this framework could enable integration with
102+
popular observability tools, providing real-time insights into interpreter
103+
performance or memory usage. One compelling use case for this interface is
104+
enabling pdb to attach to running Python processes, similar to how gdb allows
105+
users to attach to a program by process ID (``gdb -p <pid>``). With this
106+
feature, developers could inspect the state of a running application, evaluate
107+
expressions, and step through code dynamically. This approach would align
108+
Python's debugging capabilities with those of other major programming languages
109+
and debugging tools that support this mode.
110+
111+
Specification
112+
=============
113+
114+
115+
This proposal introduces a safe debugging mechanism that allows external
116+
processes to trigger code execution in a Python interpreter at well-defined safe
117+
points. The key insight is that rather than injecting code directly via system
118+
debuggers, we can leverage the interpreter's existing evaluation loop and thread
119+
state to coordinate debugging operations.
120+
121+
The mechanism works by having debuggers write to specific memory locations in
122+
the target process that the interpreter then checks during its normal execution
123+
cycle. When the interpreter detects that a debugger wants to attach, it executes the
124+
requested operations only when it's safe to do so - that is, when no internal
125+
locks are held and all data structures are in a consistent state.
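The handshake described above can be illustrated with a toy model in pure Python. This is only a sketch under stated assumptions: ``ToyThreadState``, ``request_execution`` and ``run_instructions`` are hypothetical names, and the "debugger" is ordinary code in the same process rather than an external process writing to memory. The point it shows is that a request made in the middle of an instruction is honoured only once that instruction has finished.

```python
from dataclasses import dataclass, field


@dataclass
class ToyThreadState:
    """Hypothetical stand-in for the fields a debugger would write to."""
    eval_breaker: bool = False
    debugger_pending_call: bool = False
    debugger_script: str = ""
    log: list = field(default_factory=list)


def request_execution(tstate, script):
    """What a debugger would do: store the script, then raise the flags."""
    tstate.debugger_script = script
    tstate.debugger_pending_call = True
    tstate.eval_breaker = True


def run_instructions(tstate, instructions):
    """Toy evaluation loop: the script only runs between instructions."""
    for instruction in instructions:
        instruction(tstate)  # never interrupted part-way through
        if tstate.eval_breaker:  # safe point
            tstate.eval_breaker = False
            if tstate.debugger_pending_call:
                tstate.debugger_pending_call = False
                exec(tstate.debugger_script, {"log": tstate.log})


def step_one(tstate):
    tstate.log.append("step one begins")
    # The request arrives while the instruction is still running.
    request_execution(tstate, "log.append('debugger script ran')")
    tstate.log.append("step one ends")


def step_two(tstate):
    tstate.log.append("step two")


state = ToyThreadState()
run_instructions(state, [step_one, step_two])
```

After the run, ``state.log`` records the script's entry after "step one ends" and before "step two", even though the request was made halfway through the first step.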
126+
127+
128+
Runtime State Extensions
129+
------------------------
130+
131+
A new structure is added to PyThreadState to support remote debugging:
132+
133+
.. code-block:: C
134+
135+
    typedef struct _remote_debugger_support {
136+
        int debugger_pending_call;
137+
        char debugger_script[MAX_SCRIPT_SIZE];
138+
    } _PyRemoteDebuggerSupport;
139+
140+
141+
This structure is appended to ``PyThreadState``, adding only a few fields that
142+
are **never accessed during normal execution**. The ``debugger_pending_call`` field
143+
indicates when a debugger has requested execution, while ``debugger_script``
144+
provides Python code to be executed when the interpreter reaches a safe point.
145+
146+
147+
Debug Offsets Table
148+
-------------------
149+
150+
151+
Python 3.12 introduced a debug offsets table placed at the start of the
152+
PyRuntime structure. This section contains the ``_Py_DebugOffsets`` structure that
153+
allows external tools to reliably find critical runtime structures regardless of
154+
`ASLR <https://en.wikipedia.org/wiki/Address_space_layout_randomization>`__ or
155+
how Python was compiled.
156+
157+
This proposal extends the existing debug offsets table with new fields for
158+
debugger support:
159+
160+
.. code-block:: C
161+
162+
    struct _debugger_support {
163+
        uint64_t eval_breaker;             // Location of the eval breaker flag
164+
        uint64_t remote_debugger_support;  // Offset to our support structure
165+
        uint64_t debugger_pending_call;    // Where to write the pending flag
166+
        uint64_t debugger_script;          // Where to write the script
167+
    } debugger_support;
168+
169+
These offsets allow debuggers to locate critical debugging control structures in
170+
the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
171+
offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
172+
and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
173+
structure, allowing the new structure and its fields to be found regardless of
174+
where they are in memory.
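To make the two levels of relative offsets concrete, here is a small sketch of the address arithmetic a tool might perform once it has read the table. The class name, the function name and all numeric values are invented for illustration; real offsets would come from the ``_Py_DebugOffsets`` data of the target process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebuggerSupportOffsets:
    """Mirror of the proposed ``debugger_support`` entry (values are made up)."""
    eval_breaker: int
    remote_debugger_support: int
    debugger_pending_call: int
    debugger_script: int


def resolve_addresses(tstate_addr, offsets):
    """Turn relative offsets into absolute addresses for one thread state."""
    # The support structure is located relative to the thread state...
    support_addr = tstate_addr + offsets.remote_debugger_support
    return {
        "eval_breaker": tstate_addr + offsets.eval_breaker,
        # ...and its fields are located relative to the support structure.
        "debugger_pending_call": support_addr + offsets.debugger_pending_call,
        "debugger_script": support_addr + offsets.debugger_script,
    }


offsets = DebuggerSupportOffsets(
    eval_breaker=0x18,
    remote_debugger_support=0x200,
    debugger_pending_call=0x0,
    debugger_script=0x4,
)
addresses = resolve_addresses(0x7F00_0000_1000, offsets)
```

Because every value is relative, the same arithmetic applies to any thread state address, whatever the process's memory layout.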
175+
176+
Attachment Protocol
177+
-------------------
178+
When a debugger wants to attach to a Python process, it follows these steps:
179+
180+
1. Locate ``PyRuntime`` structure in the process:
181+
182+
- Find the Python binary (executable or libpython) in process memory (the procedure is OS dependent)
183+
- Extract ``.PyRuntime`` section offset from binary's format (ELF/Mach-O/PE)
184+
- Calculate the actual ``PyRuntime`` address in the running process by relocating the offset to the binary's load address
185+
186+
2. Access debug offset information by reading the ``_Py_DebugOffsets`` at the start of the ``PyRuntime`` structure.
187+
188+
3. Use the offsets to locate the desired thread state
189+
190+
4. Use the offsets to locate the debugger interface fields within that thread state
191+
192+
5. Write control information:
193+
194+
- Write Python code to be executed into the ``debugger_script`` field in ``_PyRemoteDebuggerSupport``
195+
- Set ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport``
196+
- Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field
197+
198+
Once the interpreter reaches the next safe point, it will execute the script
199+
provided by the debugger.
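The write sequence in step 5 can be sketched against a ``bytearray`` that stands in for the target's memory. Several things here are assumptions rather than part of the proposal: the value of ``MAX_SCRIPT_SIZE``, the numeric value used for ``_PY_EVAL_PLEASE_STOP_BIT``, the field widths (a 4-byte ``int`` and an 8-byte ``eval_breaker``), and the helper name ``write_attach_request``. A real tool would replace the ``bytearray`` writes with the platform's cross-process memory calls.

```python
import struct

MAX_SCRIPT_SIZE = 512          # assumed value; the text above does not fix it
EVAL_PLEASE_STOP_BIT = 1 << 5  # placeholder for _PY_EVAL_PLEASE_STOP_BIT


def write_attach_request(memory, addrs, code):
    """Perform step 5 against ``memory``, a bytearray standing in for the target."""
    payload = code.encode("utf-8") + b"\0"  # C string, NUL-terminated
    if len(payload) > MAX_SCRIPT_SIZE:
        raise ValueError("script does not fit in debugger_script")
    # 1. Script first, so it is complete before the flags become visible.
    start = addrs["debugger_script"]
    memory[start:start + len(payload)] = payload
    # 2. Mark the call as pending (a C int, assumed to be 4 bytes here).
    struct.pack_into("<i", memory, addrs["debugger_pending_call"], 1)
    # 3. Finally set the stop bit without disturbing other eval_breaker bits.
    (breaker,) = struct.unpack_from("<Q", memory, addrs["eval_breaker"])
    struct.pack_into(
        "<Q", memory, addrs["eval_breaker"], breaker | EVAL_PLEASE_STOP_BIT
    )


# Made-up layout: eval_breaker at 0, pending flag at 16, script buffer at 20.
memory = bytearray(1024)
addrs = {"eval_breaker": 0, "debugger_pending_call": 16, "debugger_script": 20}
write_attach_request(memory, addrs, "print('hello')")
```

Writing the script before the flags follows the order of the bullets in step 5, so the interpreter should not observe a pending request whose script is still incomplete.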
200+
201+
Interpreter Integration
202+
-----------------------
203+
204+
The interpreter's regular evaluation loop already includes a check of the
205+
``eval_breaker`` flag for handling signals, periodic tasks, and other interrupts. We
206+
leverage this existing mechanism by checking for debugger pending calls only
207+
when the ``eval_breaker`` is set, ensuring zero overhead during normal execution.
208+
Profiling with Linux ``perf`` shows that this branch is highly predictable: the
209+
``debugger_pending_call`` branch is never taken during normal execution,
210+
allowing modern CPUs to speculate past it effectively.
211+
212+
213+
When a debugger has set both the ``eval_breaker`` flag and ``debugger_pending_call``,
214+
the interpreter executes the provided debugging code at the next safe point.
215+
This all happens in a completely safe context, since
216+
the interpreter is guaranteed to be in a consistent state whenever the eval breaker
217+
is checked.
218+
219+
.. code-block:: c
220+
221+
    // In ceval.c
222+
    if (tstate->eval_breaker) {
223+
        if (tstate->remote_debugger_support.debugger_pending_call) {
224+
            tstate->remote_debugger_support.debugger_pending_call = 0;  // handle the request once
225+
            if (tstate->remote_debugger_support.debugger_script[0]) {  // skip an empty script
226+
                if (PyRun_SimpleString(tstate->remote_debugger_support.debugger_script) < 0) {
227+
                    PyErr_Clear();  // discard any error left by the script
228+
                }
229+
                // ...
230+
            }
231+
        }
232+
    }
233+
234+
235+
Python API
236+
----------
237+
238+
To support safe execution of Python code in a remote process without having to
239+
re-implement all these steps in every tool, this proposal extends the ``sys`` module
240+
with a new function. This function allows debuggers or external tools to execute
241+
arbitrary Python code within the context of a specified Python process:
242+
243+
.. code-block:: python
244+
245+
    def remote_exec(pid: int, code: str) -> None:
246+
        """
247+
        Executes a block of Python code in a given remote Python process.
248+
249+
        Args:
250+
            pid (int): The process ID of the target Python process.
251+
            code (str): A string containing the Python code to be executed.
252+
        """
253+
254+
An example usage of the API would look like:
255+
256+
.. code-block:: python
257+
258+
    import sys
259+
    # Execute a print statement in a remote Python process with PID 12345
260+
    try:
261+
        sys.remote_exec(12345, "print('Hello from remote execution!')")
262+
    except Exception as e:
263+
        print(f"Failed to execute code: {e}")
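Since ``sys.remote_exec`` is only proposed here, a tool that must also run on interpreters without it may want to look the function up defensively. The wrapper below is a sketch of that idea: ``try_remote_exec`` and its ``remote_exec`` parameter are hypothetical, and the call assumes the ``(pid, code)`` signature given above.

```python
import sys


def try_remote_exec(pid, code, remote_exec=None):
    """Run ``code`` in process ``pid`` if the proposed API is available.

    Returns True if the request was submitted, False otherwise.
    """
    if remote_exec is None:
        # sys.remote_exec is only proposed here, so look it up defensively.
        remote_exec = getattr(sys, "remote_exec", None)
    if remote_exec is None:
        return False
    try:
        remote_exec(pid, code)
    except Exception as exc:
        print(f"Failed to execute code in {pid}: {exc}", file=sys.stderr)
        return False
    return True
```

Passing ``remote_exec`` explicitly makes the wrapper easy to exercise with a stub, without attaching to a real process.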
264+
265+
266+
Backwards Compatibility
267+
=======================
268+
269+
This change has no impact on existing Python code or interpreter performance.
270+
The added fields are only accessed during debugger attachment, and the checking
271+
mechanism piggybacks on existing interpreter safe points.
272+
273+
274+
Security Implications
275+
=====================
276+
277+
This interface does not introduce new security concerns as it relies entirely on
278+
existing operating system security mechanisms for process memory access. Although
279+
the PEP doesn't specify how memory should be written to the target process, in practice
280+
this will be done using standard system calls that are already being used by other
281+
debuggers and tools. Some examples are:
282+
283+
* On Linux, the ``process_vm_readv()`` and ``process_vm_writev()`` system calls
284+
are used to read and write memory from another process. These operations are
285+
controlled by ptrace access mode checks - the same ones that govern debugger
286+
attachment. A process can only read from or write to another process's memory
287+
if it has the appropriate permissions (typically requiring either root or the
288+
``CAP_SYS_PTRACE`` capability, though less security-minded distributions may
289+
allow any process running as the same uid to attach).
290+
291+
* On macOS, the interface would leverage ``mach_vm_read_overwrite()`` and
292+
``mach_vm_write()`` through the Mach task system. These operations require
293+
``task_for_pid()`` access, which is strictly controlled by the operating
294+
system. By default, access is limited to processes running as root or those
295+
with specific entitlements granted by Apple's security framework.
296+
297+
* On Windows, the ``ReadProcessMemory()`` and ``WriteProcessMemory()`` functions
298+
provide similar functionality. Access is controlled through the Windows
299+
security model - a process needs ``PROCESS_VM_READ`` and ``PROCESS_VM_WRITE``
300+
permissions, which typically require the same user context or appropriate
301+
privileges. These are the same permissions required by debuggers, ensuring
302+
consistent security semantics across platforms.
303+
304+
All mechanisms ensure that:
305+
306+
1. Only authorized processes can read/write memory
307+
2. The same security model that governs traditional debugger attachment applies
308+
3. No additional attack surface is exposed beyond what the OS already provides for debugging
309+
310+
The memory operations themselves are well-established and have been used safely
311+
for decades in tools like GDB, LLDB, and various system profilers.
312+
313+
It’s important to note that any attempt to attach to a Python process via this
314+
mechanism would be detectable by system-level monitoring tools. This
315+
transparency provides an additional layer of accountability, allowing
316+
administrators to audit debugging operations in sensitive environments.
317+
318+
Further, the strict reliance on OS-level security controls ensures that existing
319+
system policies remain effective. For enterprise environments, this means
320+
administrators can continue to enforce debugging restrictions using standard
321+
tools and policies without requiring additional configuration. For instance,
322+
leveraging Linux’s ``ptrace_scope`` or macOS’s ``taskgated`` to restrict
323+
debugger access will equally govern the proposed interface.
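As an illustration of how an existing policy carries over, a tool could report the Yama ``ptrace_scope`` setting before attempting to attach on Linux. The sketch below uses hypothetical helper names. The value meanings follow the Linux kernel's Yama documentation, and the file is simply absent on systems without Yama.

```python
from pathlib import Path

# Meanings of the Yama ptrace_scope values, as documented by the Linux kernel.
PTRACE_SCOPE_MEANINGS = {
    0: "classic: any process with the same uid may attach",
    1: "restricted: only ancestors of the target (or CAP_SYS_PTRACE) may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "disabled: no process may attach",
}


def describe_ptrace_scope(value):
    """Return a human-readable description of a ptrace_scope value."""
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown ptrace_scope value {value}")


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the current setting, or None where Yama is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

A tool might call ``read_ptrace_scope()`` and, if the result is not ``None``, show ``describe_ptrace_scope(...)`` in its error message when attachment is refused.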
324+
325+
By maintaining compatibility with existing security frameworks, this design
326+
ensures that adopting the new interface requires no changes to established
327+
security practices, thereby minimizing barriers to adoption.
328+
329+
How to Teach This
330+
=================
331+
332+
For tool authors, this interface becomes the standard way to implement debugger
333+
attachment, replacing unsafe system debugger approaches. A section in the Python
334+
Developer Guide could describe the internal workings of the mechanism, including
335+
the ``debugger_support`` offsets and how to interact with them using system
336+
APIs.
337+
338+
End users need not be aware of the interface, benefiting only from improved
339+
debugging tool stability and reliability.
340+
341+
Reference Implementation
342+
========================
343+
344+
https://github.com/pablogsal/cpython/commits/remote_pdb/
345+
346+
347+
Copyright
348+
=========
349+
350+
This document is placed in the public domain or under the CC0-1.0-Universal
351+
license, whichever is more permissive.
