|
| 1 | +PEP: 768 |
| 2 | +Title: Safe external debugger interface for CPython |
| 3 | +Author: Pablo Galindo Salgado < [email protected]>, Matt Wozniski < [email protected]>, Ivona Stojanovic < [email protected]> |
| 4 | +Status: Draft |
| 5 | +Type: Standards Track |
| 6 | +Created: 25-Nov-2024 |
| 7 | +Python-Version: 3.14 |
| 8 | + |
| 9 | +Abstract |
| 10 | +======== |
| 11 | + |
| 12 | +This PEP proposes adding a zero-overhead debugging interface to CPython that |
| 13 | +allows debuggers and profilers to safely attach to running Python processes. The |
| 14 | +interface provides safe execution points for attaching debugger code without |
| 15 | +modifying the interpreter's normal execution path or adding runtime overhead. |
| 16 | + |
| 17 | +A key application of this interface will be enabling pdb to attach to live |
| 18 | +processes by process ID, similar to ``gdb -p``, allowing developers to inspect and |
| 19 | +debug Python applications interactively in real time without stopping or |
| 20 | +restarting them. |
| 21 | + |
| 22 | +Motivation |
| 23 | +========== |
| 24 | + |
| 25 | + |
| 26 | +Debugging Python processes in production and live environments presents unique |
| 27 | +challenges. Developers often need to analyze application behavior without |
| 28 | +stopping or restarting services, which is especially crucial for |
| 29 | +high-availability systems. Common scenarios include diagnosing deadlocks, |
| 30 | +inspecting memory usage, or investigating unexpected behavior in real time. |
| 31 | + |
| 32 | +Very few Python tools can attach to running processes, primarily because doing |
| 33 | +so requires deep expertise in both operating system debugging interfaces and |
| 34 | +CPython internals. While C/C++ debuggers like GDB and LLDB can attach to |
| 35 | +processes using well-understood techniques, Python tools must implement all of |
| 36 | +these low-level mechanisms plus handle additional complexity. For example, when |
| 37 | +GDB needs to execute code in a target process, it: |
| 38 | + |
| 39 | +1. Uses ptrace to allocate a small chunk of executable memory (easier said than done) |
| 40 | +2. Writes a small sequence of machine code - typically a function prologue, the |
| 41 | + desired instructions, and code to restore registers |
| 42 | +3. Saves all the target thread's registers |
| 43 | +4. Changes the instruction pointer to the injected code |
| 44 | +5. Lets the process run until it hits a breakpoint at the end of the injected code |
| 45 | +6. Restores the original registers and continues execution |
| 46 | + |
| 47 | +Python tools face this same challenge of code injection, but with an additional |
| 48 | +layer of complexity. Not only do they need to implement the above mechanism, |
| 49 | +they must also understand and safely interact with CPython's runtime state, |
| 50 | +including the interpreter loop, garbage collector, thread state, and reference |
| 51 | +counting system. This combination of low-level system manipulation and |
| 52 | +deep, domain-specific interpreter knowledge makes implementing Python debugging |
| 53 | +tools exceptionally difficult. |
| 54 | + |
| 55 | +The few tools (see for example `DebugPy |
| 56 | +<https://github.com/microsoft/debugpy/blob/43f41029eabce338becbd1fa1a09727b3cfb1140/src/debugpy/_vendored/pydevd/pydevd_attach_to_process/linux_and_mac/attach.cpp#L4>`__ |
| 57 | +and `Memray |
| 58 | +<https://github.com/bloomberg/memray/blob/main/src/memray/_memray/inject.cpp>`__) |
| 59 | +that do attempt this resort to suboptimal and unsafe methods, |
| 60 | +using system debuggers like GDB and LLDB to forcefully inject code. This |
| 61 | +approach is fundamentally unsafe because the injected code can execute at any |
| 62 | +point during the interpreter's execution cycle - even during critical operations |
| 63 | +like memory allocation, garbage collection, or thread state management. When |
| 64 | +this happens, the results are catastrophic: attempting to allocate memory while |
| 65 | +already inside ``malloc()`` causes crashes, modifying objects during garbage |
| 66 | +collection corrupts the interpreter's state, and touching thread state at the |
| 67 | +wrong time leads to deadlocks. |
| 68 | + |
| 69 | +Various tools attempt to minimize these risks through complex workarounds, such |
| 70 | +as spawning separate threads for injected code, carefully timing their |
| 71 | +operations, or trying to pick safe points at which to stop the process. However, |
| 72 | +these mitigations cannot fully solve the underlying problem: without cooperation |
| 73 | +from the interpreter, there's no way to know if it's safe to execute code at any |
| 74 | +given moment. Even carefully implemented tools can crash the interpreter because |
| 75 | +they're fundamentally working against it rather than with it. |
| 76 | + |
| 77 | + |
| 78 | +Rationale |
| 79 | +========= |
| 80 | + |
| 81 | + |
| 82 | +Rather than forcing tools to work around interpreter limitations with unsafe |
| 83 | +code injection, we can extend CPython with a proper debugging interface that |
| 84 | +guarantees safe execution. By adding a few thread state fields and integrating |
| 85 | +with the interpreter's existing evaluation loop, we can ensure debugging |
| 86 | +operations only occur at well-defined safe points. This eliminates the |
| 87 | +possibility of crashes and corruption while maintaining zero overhead during |
| 88 | +normal execution. |
| 89 | + |
| 90 | +The key insight is that we don't need to inject code at arbitrary points - we |
| 91 | +just need to signal to the interpreter that we want code executed at the next |
| 92 | +safe opportunity. This approach works with the interpreter's natural execution |
| 93 | +flow rather than fighting against it. |
| 94 | + |
| 95 | +After we described this idea to the PyPy development team, the proposal was |
| 96 | +`implemented in PyPy <https://github.com/pypy/pypy/pull/5135>`__, |
| 97 | +showing both its feasibility and effectiveness. Their implementation |
| 98 | +demonstrates that we can provide safe debugging capabilities with zero runtime |
| 99 | +overhead during normal execution. The proposed mechanism not only reduces risks |
| 100 | +associated with current debugging approaches but also lays the foundation for |
| 101 | +future enhancements. For instance, this framework could enable integration with |
| 102 | +popular observability tools, providing real-time insights into interpreter |
| 103 | +performance or memory usage. One compelling use case for this interface is |
| 104 | +enabling pdb to attach to running Python processes, similar to how gdb allows |
| 105 | +users to attach to a program by process ID (``gdb -p <pid>``). With this |
| 106 | +feature, developers could inspect the state of a running application, evaluate |
| 107 | +expressions, and step through code dynamically. This approach would align |
| 108 | +Python's debugging capabilities with those of other major programming languages |
| 109 | +and debugging tools that support this mode. |
| 110 | + |
| 111 | +Specification |
| 112 | +============= |
| 113 | + |
| 114 | + |
| 115 | +This proposal introduces a safe debugging mechanism that allows external |
| 116 | +processes to trigger code execution in a Python interpreter at well-defined safe |
| 117 | +points. The key insight is that rather than injecting code directly via system |
| 118 | +debuggers, we can leverage the interpreter's existing evaluation loop and thread |
| 119 | +state to coordinate debugging operations. |
| 120 | + |
| 121 | +The mechanism works by having debuggers write to specific memory locations in |
| 122 | +the target process that the interpreter then checks during its normal execution |
| 123 | +cycle. When the interpreter detects that a debugger wants to attach, it executes the |
| 124 | +requested operations only when it's safe to do so - that is, when no internal |
| 125 | +locks are held and all data structures are in a consistent state. |
| 126 | + |
| 127 | + |
| 128 | +Runtime State Extensions |
| 129 | +------------------------ |
| 130 | + |
| 131 | +A new structure is added to PyThreadState to support remote debugging: |
| 132 | + |
| 133 | +.. code-block:: C |
| 134 | +
|
| 135 | +    typedef struct _remote_debugger_support { |
| 136 | +        int debugger_pending_call; |
| 137 | +        char debugger_script[MAX_SCRIPT_SIZE]; |
| 138 | +    } _PyRemoteDebuggerSupport; |
| 139 | +
|
| 140 | +
|
| 141 | +This structure is appended to ``PyThreadState``, adding only a few fields that |
| 142 | +are **never accessed during normal execution**. The ``debugger_pending_call`` field |
| 143 | +indicates when a debugger has requested execution, while ``debugger_script`` |
| 144 | +provides Python code to be executed when the interpreter reaches a safe point. |
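To make the layout concrete, here is a minimal sketch that mirrors the structure with ``ctypes``. The value of ``MAX_SCRIPT_SIZE`` is not stated above, so the 512 used here is purely an assumption, and the Python class name is hypothetical.

```python
import ctypes

# Assumption: the text above does not give MAX_SCRIPT_SIZE; 512 is illustrative.
MAX_SCRIPT_SIZE = 512


class RemoteDebuggerSupport(ctypes.Structure):
    """Hypothetical ctypes mirror of _PyRemoteDebuggerSupport."""

    _fields_ = [
        ("debugger_pending_call", ctypes.c_int),
        ("debugger_script", ctypes.c_char * MAX_SCRIPT_SIZE),
    ]


# Field offsets inside the structure, which an external tool needs to know.
pending_offset = RemoteDebuggerSupport.debugger_pending_call.offset
script_offset = RemoteDebuggerSupport.debugger_script.offset

# Fill in a local instance the way a debugger would fill in the remote one.
support = RemoteDebuggerSupport()
support.debugger_script = b"print('hello')"
support.debugger_pending_call = 1
```

The two offsets computed here play the same role as the per-structure offsets that the interpreter would publish for external tools, so a tool never has to hard-code them.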
| 145 | + |
| 146 | + |
| 147 | +Debug Offsets Table |
| 148 | +------------------- |
| 149 | + |
| 150 | + |
| 151 | +Python 3.12 introduced a debug offsets table placed at the start of the |
| 152 | +PyRuntime structure. This section contains the ``_Py_DebugOffsets`` structure that |
| 153 | +allows external tools to reliably find critical runtime structures regardless of |
| 154 | +`ASLR <https://en.wikipedia.org/wiki/Address_space_layout_randomization>`__ or |
| 155 | +how Python was compiled. |
| 156 | + |
| 157 | +This proposal extends the existing debug offsets table with new fields for |
| 158 | +debugger support: |
| 159 | + |
| 160 | +.. code-block:: C |
| 161 | +
|
| 162 | +    struct _debugger_support { |
| 163 | +        uint64_t eval_breaker;             // Location of the eval breaker flag |
| 164 | +        uint64_t remote_debugger_support;  // Offset to our support structure |
| 165 | +        uint64_t debugger_pending_call;    // Where to write the pending flag |
| 166 | +        uint64_t debugger_script;          // Where to write the script |
| 167 | +    } debugger_support; |
| 168 | +
|
| 169 | +These offsets allow debuggers to locate critical debugging control structures in |
| 170 | +the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support`` |
| 171 | +offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call`` |
| 172 | +and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport`` |
| 173 | +structure, allowing the new structure and its fields to be found regardless of |
| 174 | +where they are in memory. |
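The two-level addressing described above can be sketched as plain arithmetic. The offset values and the thread state address below are made up for illustration; a real tool would read the offsets from the target process rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebuggerSupportOffsets:
    """Mirror of the debugger_support offsets (values supplied by the caller)."""

    eval_breaker: int              # relative to PyThreadState
    remote_debugger_support: int   # relative to PyThreadState
    debugger_pending_call: int     # relative to _PyRemoteDebuggerSupport
    debugger_script: int           # relative to _PyRemoteDebuggerSupport


def resolve_addresses(tstate_addr: int, offsets: DebuggerSupportOffsets) -> dict:
    """Turn a thread state address plus offsets into absolute addresses."""
    support_addr = tstate_addr + offsets.remote_debugger_support
    return {
        "eval_breaker": tstate_addr + offsets.eval_breaker,
        "debugger_pending_call": support_addr + offsets.debugger_pending_call,
        "debugger_script": support_addr + offsets.debugger_script,
    }


# Hypothetical numbers, chosen only to show the arithmetic.
EXAMPLE_TSTATE = 0x7F0000001000
EXAMPLE_OFFSETS = DebuggerSupportOffsets(
    eval_breaker=8,
    remote_debugger_support=512,
    debugger_pending_call=0,
    debugger_script=4,
)
addresses = resolve_addresses(EXAMPLE_TSTATE, EXAMPLE_OFFSETS)
```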
| 175 | + |
| 176 | +Attachment Protocol |
| 177 | +------------------- |
| 178 | +When a debugger wants to attach to a Python process, it follows these steps: |
| 179 | + |
| 180 | +1. Locate the ``PyRuntime`` structure in the process: |
| 181 | + |
| 182 | + - Find the Python binary (executable or libpython) in process memory (the procedure is OS-dependent) |
| 183 | + - Extract the ``.PyRuntime`` section offset from the binary's format (ELF/Mach-O/PE) |
| 184 | + - Calculate the actual ``PyRuntime`` address in the running process by relocating the offset to the binary's load address |
| 185 | + |
| 186 | +2. Access debug offset information by reading the ``_Py_DebugOffsets`` at the start of the ``PyRuntime`` structure. |
| 187 | + |
| 188 | +3. Use the offsets to locate the desired thread state. |
| 189 | + |
| 190 | +4. Use the offsets to locate the debugger interface fields within that thread state. |
| 191 | + |
| 192 | +5. Write control information: |
| 193 | + |
| 194 | + - Write the Python code to be executed into the ``debugger_script`` field in ``_PyRemoteDebuggerSupport`` |
| 195 | + - Set the ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport`` |
| 196 | + - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field |
| 197 | + |
| 198 | +Once the interpreter reaches the next safe point, it will execute the script |
| 199 | +provided by the debugger. |
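The final write step can be sketched against a simulated memory image. This is only a model: a ``bytearray`` stands in for the target's address space, the field addresses are invented, and ``MAX_SCRIPT_SIZE``, the bit value, and the little-endian field widths are assumptions rather than values taken from CPython. A real tool would perform the same three writes through the operating system's process-memory APIs.

```python
import struct

MAX_SCRIPT_SIZE = 512      # assumption: the real limit is not stated above
PLEASE_STOP_BIT = 1 << 5   # placeholder: take the real value from CPython's headers


def request_script(memory: bytearray, addrs: dict, code: str) -> None:
    """Perform the three writes of step 5 on a simulated memory image."""
    payload = code.encode("utf-8") + b"\0"   # the field holds a C string
    if len(payload) > MAX_SCRIPT_SIZE:
        raise ValueError("script does not fit in debugger_script")

    # 1. Write the script text.
    start = addrs["debugger_script"]
    memory[start:start + len(payload)] = payload

    # 2. Set the pending-call flag (modelled as a 4-byte int).
    struct.pack_into("<i", memory, addrs["debugger_pending_call"], 1)

    # 3. Set the stop bit without disturbing other eval breaker bits
    #    (the eval breaker is modelled as an 8-byte unsigned value).
    (breaker,) = struct.unpack_from("<Q", memory, addrs["eval_breaker"])
    struct.pack_into("<Q", memory, addrs["eval_breaker"], breaker | PLEASE_STOP_BIT)


# Hypothetical layout inside a 1 KiB simulated image.
memory = bytearray(1024)
addrs = {"eval_breaker": 0, "debugger_pending_call": 16, "debugger_script": 20}
request_script(memory, addrs, "print('attached')")
```

Writing the script first and the eval breaker bit last follows the order of the list above, so the interpreter is never asked to stop before the script and the flag are in place.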
| 200 | + |
| 201 | +Interpreter Integration |
| 202 | +----------------------- |
| 203 | + |
| 204 | +The interpreter's regular evaluation loop already includes a check of the |
| 205 | +``eval_breaker`` flag for handling signals, periodic tasks, and other interrupts. We |
| 206 | +leverage this existing mechanism by checking for debugger pending calls only |
| 207 | +when the ``eval_breaker`` is set, so the normal execution path gains no new |
| 208 | +checks. Profiling with Linux ``perf`` shows that the added branch is highly |
| 209 | +predictable - the ``debugger_pending_call`` branch is never taken during |
| 210 | +normal execution, allowing modern CPUs to effectively speculate past it. |
| 211 | + |
| 212 | + |
| 213 | +When a debugger has set both the ``eval_breaker`` flag and ``debugger_pending_call``, |
| 214 | +the interpreter executes the provided debugging code at the next safe point. |
| 215 | +This all happens in a completely safe context, since the interpreter is |
| 216 | +guaranteed to be in a consistent state whenever the eval breaker |
| 217 | +is checked. |
| 218 | + |
| 219 | +.. code-block:: c |
| 220 | +
|
| 221 | +    // In ceval.c |
| 222 | +    if (tstate->eval_breaker) { |
| 223 | +        if (tstate->remote_debugger_support.debugger_pending_call) { |
| 224 | +            tstate->remote_debugger_support.debugger_pending_call = 0;  // consume the request |
| 225 | +            if (tstate->remote_debugger_support.debugger_script[0]) {   // skip an empty script |
| 226 | +                if (PyRun_SimpleString(tstate->remote_debugger_support.debugger_script) < 0) { |
| 227 | +                    PyErr_Clear(); |
| 228 | +                } |
| 229 | +                // ... |
| 230 | +            } |
| 231 | +        } |
| 232 | +    } |
| 233 | +
|
| 234 | +
|
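For readers who prefer to experiment, the control flow of the C fragment above can be modelled in pure Python. This is a toy simulation, not interpreter code: the ``ThreadState`` class and ``check_eval_breaker`` function are hypothetical stand-ins, and ``exec`` plays the role of ``PyRun_SimpleString``.

```python
class ThreadState:
    """Toy stand-in for the thread state fields involved in the check."""

    def __init__(self) -> None:
        self.eval_breaker = 0
        self.debugger_pending_call = 0
        self.debugger_script = ""


def check_eval_breaker(tstate: ThreadState, namespace: dict) -> bool:
    """Run a pending debugger script; return True if a request was consumed."""
    if not tstate.eval_breaker:
        return False                      # normal execution: nothing else is read
    if not tstate.debugger_pending_call:
        return False                      # eval breaker set for another reason
    tstate.debugger_pending_call = 0      # consume the request exactly once
    if tstate.debugger_script:
        try:
            exec(tstate.debugger_script, namespace)
        except Exception:
            pass                          # like the C code, errors are discarded
    return True


# Simulate a debugger request followed by the interpreter's next check.
tstate = ThreadState()
namespace = {}
idle = check_eval_breaker(tstate, namespace)      # nothing requested yet

tstate.debugger_script = "attached = True"
tstate.debugger_pending_call = 1
tstate.eval_breaker = 1
handled = check_eval_breaker(tstate, namespace)   # script runs here
repeat = check_eval_breaker(tstate, namespace)    # request already consumed
```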
| 235 | +Python API |
| 236 | +---------- |
| 237 | + |
| 238 | +To support safe execution of Python code in a remote process without having to |
| 239 | +re-implement all these steps in every tool, this proposal extends the ``sys`` module |
| 240 | +with a new function. This function allows debuggers or external tools to execute |
| 241 | +arbitrary Python code within the context of a specified Python process: |
| 242 | + |
| 243 | +.. code-block:: python |
| 244 | +
|
| 245 | +    def remote_exec(pid: int, code: str) -> None: |
| 246 | +        """ |
| 247 | +        Executes a block of Python code in a given remote Python process. |
| 248 | +
|
| 249 | +        Args: |
| 250 | +            pid (int): The process ID of the target Python process. |
| 251 | +            code (str): A string containing the Python code to be executed. |
| 252 | +        """ |
| 253 | +
|
| 254 | +An example usage of the API would look like: |
| 255 | + |
| 256 | +.. code-block:: python |
| 257 | +
|
| 258 | +    import sys |
| 259 | +    # Execute a print statement in a remote Python process with PID 12345 |
| 260 | +    try: |
| 261 | +        sys.remote_exec(12345, "print('Hello from remote execution!')") |
| 262 | +    except Exception as e: |
| 263 | +        print(f"Failed to execute code: {e}") |
| 264 | +
|
| 265 | +
|
| 266 | +Backwards Compatibility |
| 267 | +======================= |
| 268 | + |
| 269 | +This change has no impact on existing Python code or interpreter performance. |
| 270 | +The added fields are only accessed during debugger attachment, and the checking |
| 271 | +mechanism piggybacks on existing interpreter safe points. |
| 272 | + |
| 273 | + |
| 274 | +Security Implications |
| 275 | +===================== |
| 276 | + |
| 277 | +This interface does not introduce new security concerns as it relies entirely on |
| 278 | +existing operating system security mechanisms for process memory access. Although |
| 279 | +the PEP doesn't specify how memory should be written to the target process, in practice |
| 280 | +this will be done using standard system calls that are already being used by other |
| 281 | +debuggers and tools. Some examples are: |
| 282 | + |
| 283 | +* On Linux, the ``process_vm_readv()`` and ``process_vm_writev()`` system calls |
| 284 | + are used to read and write memory from another process. These operations are |
| 285 | + controlled by ptrace access mode checks - the same ones that govern debugger |
| 286 | + attachment. A process can only read from or write to another process's memory |
| 287 | + if it has the appropriate permissions (typically requiring either root or the |
| 288 | + ``CAP_SYS_PTRACE`` capability, though less security-minded distributions may |
| 289 | + allow any process running as the same uid to attach). |
| 290 | + |
| 291 | +* On macOS, the interface would leverage ``mach_vm_read_overwrite()`` and |
| 292 | + ``mach_vm_write()`` through the Mach task system. These operations require |
| 293 | + ``task_for_pid()`` access, which is strictly controlled by the operating |
| 294 | + system. By default, access is limited to processes running as root or those |
| 295 | + with specific entitlements granted by Apple's security framework. |
| 296 | + |
| 297 | +* On Windows, the ``ReadProcessMemory()`` and ``WriteProcessMemory()`` functions |
| 298 | + provide similar functionality. Access is controlled through the Windows |
| 299 | + security model - a process needs ``PROCESS_VM_READ`` and ``PROCESS_VM_WRITE`` |
| 300 | + permissions, which typically require the same user context or appropriate |
| 301 | + privileges. These are the same permissions required by debuggers, ensuring |
| 302 | + consistent security semantics across platforms. |
| 303 | + |
| 304 | +All mechanisms ensure that: |
| 305 | + |
| 306 | +1. Only authorized processes can read/write memory |
| 307 | +2. The same security model that governs traditional debugger attachment applies |
| 308 | +3. No additional attack surface is exposed beyond what the OS already provides for debugging |
| 309 | + |
| 310 | +The memory operations themselves are well-established and have been used safely |
| 311 | +for decades in tools like GDB, LLDB, and various system profilers. |
| 312 | + |
| 313 | +It’s important to note that any attempt to attach to a Python process via this |
| 314 | +mechanism would be detectable by system-level monitoring tools. This |
| 315 | +transparency provides an additional layer of accountability, allowing |
| 316 | +administrators to audit debugging operations in sensitive environments. |
| 317 | + |
| 318 | +Further, the strict reliance on OS-level security controls ensures that existing |
| 319 | +system policies remain effective. For enterprise environments, this means |
| 320 | +administrators can continue to enforce debugging restrictions using standard |
| 321 | +tools and policies without requiring additional configuration. For instance, |
| 322 | +leveraging Linux’s ``ptrace_scope`` or macOS’s ``taskgated`` to restrict |
| 323 | +debugger access will equally govern the proposed interface. |
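As an illustration of how a tool might report the Linux policy mentioned above, the following sketch reads the Yama ``ptrace_scope`` setting. The helper name is hypothetical, the setting only exists on Linux kernels with the Yama security module enabled, and the descriptions paraphrase the kernel documentation.

```python
from pathlib import Path

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings of the Yama ptrace_scope levels, paraphrased from the kernel docs.
PTRACE_SCOPE_LEVELS = {
    0: "classic: processes with the same uid may attach",
    1: "restricted: only descendants or explicitly allowed processes",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: attaching is disabled",
}


def read_ptrace_scope(path: Path = PTRACE_SCOPE_PATH):
    """Return the current ptrace_scope level, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None   # not Linux, Yama not enabled, or unreadable


level = read_ptrace_scope()
if level is None:
    print("ptrace_scope is not available on this system")
else:
    print(f"ptrace_scope = {level}: {PTRACE_SCOPE_LEVELS.get(level, 'unknown level')}")
```

A tool could run a check like this before attempting to attach, so that it reports a policy restriction clearly instead of failing with a bare permission error.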
| 324 | + |
| 325 | +By maintaining compatibility with existing security frameworks, this design |
| 326 | +ensures that adopting the new interface requires no changes to established |
| 327 | +security practices, thereby minimizing barriers to adoption. |
| 328 | + |
| 329 | +How to Teach This |
| 330 | +================= |
| 331 | + |
| 332 | +For tool authors, this interface becomes the standard way to implement debugger |
| 333 | +attachment, replacing unsafe system debugger approaches. A section in the Python |
| 334 | +Developer Guide could describe the internal workings of the mechanism, including |
| 335 | +the ``debugger_support`` offsets and how to interact with them using system |
| 336 | +APIs. |
| 337 | + |
| 338 | +End users need not be aware of the interface, benefiting only from improved |
| 339 | +debugging tool stability and reliability. |
| 340 | + |
| 341 | +Reference Implementation |
| 342 | +======================== |
| 343 | + |
| 344 | +https://github.com/pablogsal/cpython/commits/remote_pdb/ |
| 345 | + |
| 346 | + |
| 347 | +Copyright |
| 348 | +========= |
| 349 | + |
| 350 | +This document is placed in the public domain or under the CC0-1.0-Universal |
| 351 | +license, whichever is more permissive. |