.. _remote-debugging:

Remote Debugging Attachment Protocol
====================================

This section explains the low-level protocol that allows external code to inject and execute
a Python script inside a running CPython process.

This is the mechanism implemented by the :func:`sys.remote_exec` function, which
instructs a remote Python process to execute a ``.py`` file. This section is not about using that
function; instead, it explains how the underlying protocol works so that it can be
reimplemented in any language.

The protocol assumes you already know the process you want to target and the code you want it to run.
It therefore takes two pieces of information:

- The process ID (``pid``) of the Python process you want to interact with.
- A path to a Python script file (``.py``) that contains the code to be executed.

Once the request has been written, the script is executed by the target process’s interpreter
the next time it reaches a safe evaluation point. This allows tools to trigger
code execution remotely without modifying the Python program itself.

In the sections that follow, we’ll walk through each step of this protocol in detail: how to locate
the interpreter in memory, how to access internal structures safely, and how to trigger the execution
of your script. Where necessary, we’ll highlight differences across platforms (Linux, macOS, Windows),
and include example code to help clarify each part of the process.

Locating the PyRuntime Structure
================================

The ``PyRuntime`` structure holds CPython's global interpreter state and serves as
the entry point to other internal data, including the list of interpreters,
thread states, and debugger support fields.

To interact with a remote Python process, a debugger must first compute the memory
address of the ``PyRuntime`` structure inside the target process. This cannot be
hardcoded or inferred symbolically, since its location depends on how the binary was
mapped into memory by the operating system.

The process for locating ``PyRuntime`` is platform-specific, but follows the same
high-level approach:

1. Identify where the Python executable or shared library was loaded in the target process.
2. Parse the corresponding binary file on disk to find the offset of the
   ``.PyRuntime`` section.
3. Compute the in-memory address of ``PyRuntime`` by relocating the section offset
   to the base address found in step 1.

Each subsection below explains what must be done and provides a short example of how this
can be implemented.

.. rubric:: Linux (ELF)

To locate the ``PyRuntime`` structure on Linux:

1. Inspect the memory mappings of the target process (e.g. from ``/proc/<pid>/maps``)
   to find the memory region where the Python executable or shared ``libpython``
   library is loaded. Record its base address.
2. Load the binary file from disk and parse its ELF section headers.
   Locate the ``.PyRuntime`` section and determine its file offset.
3. Add the section offset to the base address to compute the address of the
   ``PyRuntime`` structure in memory.
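
As a rough illustration of step 1, the sketch below scans text in the
``/proc/<pid>/maps`` format and returns the path and start address of the first
mapped file whose name contains a given substring. The helper name
``find_mapped_binary_in_maps`` and the sample mapping lines are hypothetical. A real
tool would pass in the contents of ``/proc/<pid>/maps`` and would probably need
stricter name matching than a substring test.

.. code-block:: python

    import os

    def find_mapped_binary_in_maps(maps_text, name_contains):
        """Return (path, base_address) of the first matching file, or (None, None)."""
        for line in maps_text.splitlines():
            # Line format: start-end perms offset dev inode [path]
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue  # anonymous mapping with no backing file
            path = fields[5].strip()
            if name_contains not in os.path.basename(path):
                continue
            # Entries are listed in address order, so the first match is taken
            # as the base of the mapped file (an assumption of this sketch).
            start = int(fields[0].split("-")[0], 16)
            return path, start
        return None, None

    # Hypothetical sample input in the /proc/<pid>/maps format.
    SAMPLE_MAPS = "\n".join([
        "55d0c8a00000-55d0c8a6b000 r--p 00000000 08:01 131 /usr/bin/python3.14",
        "55d0c8a6b000-55d0c8d10000 r-xp 0006b000 08:01 131 /usr/bin/python3.14",
        "7f2a10000000-7f2a10021000 rw-p 00000000 00:00 0",
    ])

    path, base = find_mapped_binary_in_maps(SAMPLE_MAPS, "python")

Taking the text as a parameter, rather than opening the file inside the helper, keeps
the parsing logic easy to exercise without a live target process.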

An example implementation might look like:

.. code-block:: python

    def find_py_runtime_linux(pid):
        # Step 1: Try to find the Python executable in memory
        binary_path, base_address = find_mapped_binary(pid, name_contains="python")
        # Step 2: Fallback to shared library if executable is not found
        if binary_path is None:
            binary_path, base_address = find_mapped_binary(pid, name_contains="libpython")
        # Step 3: Parse ELF headers of the binary to get .PyRuntime section offset
        section_offset = parse_elf_section_offset(binary_path, ".PyRuntime")
        # Step 4: Compute PyRuntime address in memory
        return base_address + section_offset
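
The ELF parsing step can be sketched with the standard ``struct`` module. The
function below is only an illustration: its name, ``elf_section_offset``, is
hypothetical, it takes the file contents as ``bytes`` instead of a path, and it
handles only 64-bit little-endian ELF files. It returns the section's file offset
(``sh_offset``), following the description above.

.. code-block:: python

    import struct

    # ELF64 little-endian file header and section header layouts.
    EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
    SHDR = struct.Struct("<IIQQQQIIQQ")

    def elf_section_offset(data, section_name):
        """Return the file offset of the named section, or None if it is absent."""
        if data[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        if data[4] != 2 or data[5] != 1:
            raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
        (_ident, _type, _machine, _version, _entry, _phoff, shoff, _flags,
         _ehsize, _phentsize, _phnum, shentsize, shnum, shstrndx) = EHDR.unpack_from(data, 0)

        def section_header(index):
            return SHDR.unpack_from(data, shoff + index * shentsize)

        # Field 4 of a section header is sh_offset; the section selected by
        # e_shstrndx holds the NUL-terminated section names.
        strtab_offset = section_header(shstrndx)[4]
        wanted = section_name.encode()
        for index in range(shnum):
            fields = section_header(index)
            name_start = strtab_offset + fields[0]  # sh_name is an index into the table
            name_end = data.index(b"\0", name_start)
            if data[name_start:name_end] == wanted:
                return fields[4]
        return None

A caller would read the binary found in step 1 with ``open(binary_path, "rb").read()``
and pass the result together with ``".PyRuntime"``.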

.. rubric:: macOS (Mach-O)

To locate the ``PyRuntime`` structure on macOS:

1. Obtain a handle to the target process that allows memory inspection.
2. Walk the memory regions of the process to identify the one that contains the
   Python binary or shared library. Record its base address and associated file path.
3. Load that binary file from disk and parse the Mach-O headers to find the
   ``__DATA,__PyRuntime`` section.
4. Add the section's offset to the base address of the loaded binary to compute
   the address of the ``PyRuntime`` structure.

An example implementation might look like:

.. code-block:: python

    def find_py_runtime_macos(pid):
        # Step 1: Get access to the process's memory
        handle = get_memory_access_handle(pid)
        # Step 2: Try to find the Python executable in memory
        binary_path, base_address = find_mapped_binary(handle, name_contains="python")
        # Step 3: Fallback to libpython if executable is not found
        if binary_path is None:
            binary_path, base_address = find_mapped_binary(handle, name_contains="libpython")
        # Step 4: Parse Mach-O headers to get __DATA,__PyRuntime section offset
        section_offset = parse_macho_section_offset(binary_path, "__DATA", "__PyRuntime")
        # Step 5: Compute PyRuntime address in memory
        return base_address + section_offset

.. rubric:: Windows (PE)

To locate the ``PyRuntime`` structure on Windows:

1. Enumerate all modules loaded in the target process.
   Identify the module corresponding to ``python.exe`` or ``pythonXY.dll``, where ``X`` and ``Y``
   are the major and minor version numbers of the target Python, and record its base address.
2. Load the binary from disk and parse the PE section headers.
   Locate the ``.PyRuntime`` section and determine its relative virtual address (RVA).
3. Add the RVA to the module’s base address to compute the full in-memory address
   of the ``PyRuntime`` structure.

An example implementation might look like:

.. code-block:: python

    def find_py_runtime_windows(pid):
        # Step 1: Try to find the Python executable in memory
        binary_path, base_address = find_loaded_module(pid, name_contains="python")
        # Step 2: Fallback to shared pythonXY.dll if executable is not found
        if binary_path is None:
            binary_path, base_address = find_loaded_module(pid, name_contains="python3")
        # Step 3: Parse PE section headers to get .PyRuntime RVA
        section_rva = parse_pe_section_offset(binary_path, ".PyRuntime")
        # Step 4: Compute PyRuntime address in memory
        return base_address + section_rva

Reading _Py_DebugOffsets
========================

Once the address of the ``PyRuntime`` structure has been computed in the target
process, the next step is to read the ``_Py_DebugOffsets`` structure located at
its beginning.

This structure contains version-specific field offsets needed to navigate
interpreter and thread state memory safely.

To read and validate the debug offsets:

1. Read the memory at the address of ``PyRuntime``, up to the size of
   ``_Py_DebugOffsets``. This structure is located at the very start of the
   ``PyRuntime`` block.

2. Verify that the contents of the structure are valid. In particular:

   - The ``cookie`` field must match the expected debug marker.
   - The ``version`` field must match the version of the Python interpreter
     used by the calling process (i.e., the debugger or controlling runtime).
   - If either the caller or the target process is running a pre-release version
     (such as an alpha, beta, or release candidate), then the versions must match
     exactly.
   - The ``free_threaded`` flag must match between the caller and the target process.

3. If the structure passes validation, the debugger may now safely use the
   provided offsets to locate fields in interpreter and thread state structures.

If any validation step fails, the debugger should abort rather than attempting to
access incompatible memory layouts.

An example of how a debugger might read and validate ``_Py_DebugOffsets``:

.. code-block:: python

    def read_debug_offsets(pid, py_runtime_addr):
        # Step 1: Read memory from the target process at the PyRuntime address
        data = read_process_memory(pid, address=py_runtime_addr, size=DEBUG_OFFSETS_SIZE)
        # Step 2: Deserialize the raw bytes into a _Py_DebugOffsets structure
        debug_offsets = parse_debug_offsets(data)
        # Step 3: Validate compatibility
        if debug_offsets.cookie != EXPECTED_COOKIE:
            raise RuntimeError("Invalid or missing debug cookie")
        if debug_offsets.version != LOCAL_PYTHON_VERSION:
            raise RuntimeError("Mismatch between caller and target Python versions")
        if debug_offsets.free_threaded != LOCAL_FREE_THREADED:
            raise RuntimeError("Mismatch in free-threaded configuration")
        return debug_offsets
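
The version rules can be made more concrete with a small sketch. It rests on several
assumptions that are not stated in this section and should be checked against the
target's headers: that the structure begins with an 8-byte cookie followed by two
64-bit little-endian integers (``version``, then ``free_threaded``), that the cookie
is ``b"xdebugpy"``, that ``version`` uses the same encoding as ``sys.hexversion``,
and that final releases only need to agree on major and minor version. All function
names are hypothetical.

.. code-block:: python

    import struct
    import sys

    HEADER = struct.Struct("<8sQQ")   # assumed layout: cookie[8], version, free_threaded
    EXPECTED_COOKIE = b"xdebugpy"     # assumed cookie value

    def is_prerelease(hexversion):
        # In the sys.hexversion encoding, a release level of 0xF means "final".
        return (hexversion >> 4) & 0xF != 0xF

    def versions_compatible(local, remote):
        # Pre-releases must match exactly; otherwise compare major and minor only.
        if is_prerelease(local) or is_prerelease(remote):
            return local == remote
        return (local >> 16) == (remote >> 16)

    def check_header(data, local_version=sys.hexversion, local_free_threaded=False):
        """Validate the first bytes of the structure and return the remote version."""
        cookie, version, free_threaded = HEADER.unpack_from(data, 0)
        if cookie != EXPECTED_COOKIE:
            raise RuntimeError("Invalid or missing debug cookie")
        if not versions_compatible(local_version, version):
            raise RuntimeError("Mismatch between caller and target Python versions")
        if bool(free_threaded) != local_free_threaded:
            raise RuntimeError("Mismatch in free-threaded configuration")
        return version

For instance, under these assumptions a 3.14.0 final caller would accept a 3.14.2
final target, while a beta caller would accept only a target of the identical beta.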

Locating the Interpreter and Thread State
=========================================

After validating the ``_Py_DebugOffsets`` structure, the next step is to locate the
interpreter and thread state objects within the target process. These structures
hold essential runtime context and are required for writing debugger control
information.

- The ``PyInterpreterState`` structure represents a Python interpreter instance.
  Each interpreter holds its own module imports, built-in state, and thread list.
  Most applications use only one interpreter, but CPython supports creating multiple
  interpreters in the same process.

- The ``PyThreadState`` structure represents a thread running within an interpreter.
  This is where evaluation state and the control fields used by the debugger live.

To inject and run code remotely, the debugger must locate a valid ``PyThreadState``
to target. Typically, this is the main thread, but in some cases, the debugger may
want to attach to a specific thread by its native thread ID.

To locate a thread:

1. Use the offset ``runtime_state.interpreters_head`` to find the address of the
   first interpreter in the ``PyRuntime`` structure. This is the entry point to
   the list of active interpreters.

2. Use the offset ``interpreter_state.threads_main`` to locate the main thread
   of that interpreter. This is the simplest and most reliable thread to target.

3. Optionally, use ``interpreter_state.threads_head`` to walk the linked list of
   all threads. For each ``PyThreadState``, compare the ``native_thread_id``
   field (using ``thread_state.native_thread_id``) to find a specific thread.

   This is useful when the debugger allows the user to select which thread to inject into,
   or when targeting a thread that's actively running.

4. Once a valid ``PyThreadState`` is found, record its address. This will be used
   in the next step to write debugger control fields and schedule execution.

An example of locating the main thread:

.. code-block:: python

    def find_main_thread_state(pid, py_runtime_addr, debug_offsets):
        # Step 1: Read interpreters_head from PyRuntime
        interp_head_ptr = py_runtime_addr + debug_offsets.runtime_state.interpreters_head
        interp_addr = read_pointer(pid, interp_head_ptr)
        if interp_addr == 0:
            raise RuntimeError("No interpreter found in the target process")
        # Step 2: Read the threads_main pointer from the interpreter
        threads_main_ptr = interp_addr + debug_offsets.interpreter_state.threads_main
        thread_state_addr = read_pointer(pid, threads_main_ptr)
        if thread_state_addr == 0:
            raise RuntimeError("Main thread state is not available")
        return thread_state_addr

To locate a specific thread by native thread ID:

.. code-block:: python

    def find_thread_by_id(pid, interp_addr, debug_offsets, target_tid):
        # Start at threads_head and walk the linked list
        thread_ptr = read_pointer(
            pid, interp_addr + debug_offsets.interpreter_state.threads_head
        )
        while thread_ptr:
            native_tid_ptr = thread_ptr + debug_offsets.thread_state.native_thread_id
            native_tid = read_int(pid, native_tid_ptr)
            if native_tid == target_tid:
                return thread_ptr
            thread_ptr = read_pointer(pid, thread_ptr + debug_offsets.thread_state.next)
        raise RuntimeError("Thread with the given ID was not found")
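
The examples above call helpers such as ``read_pointer`` and ``read_int`` without
defining them. One possible shape for them is sketched below. Unlike the examples,
which pass a ``pid``, these versions take a seekable binary file object ``mem``; on
Linux this could be ``open(f"/proc/{pid}/mem", "rb")`` when the caller has permission
to inspect the target. The integer widths and byte order are assumptions (a 64-bit
little-endian target and a 4-byte ``int``) and must be adapted to the field being read.

.. code-block:: python

    import io
    import struct

    def read_process_memory(mem, address, size):
        """Read exactly ``size`` bytes at ``address`` from a seekable binary file."""
        mem.seek(address)
        data = mem.read(size)
        if len(data) != size:
            raise OSError(f"short read at {address:#x}")
        return data

    def read_pointer(mem, address):
        # Assumes 8-byte little-endian pointers.
        return struct.unpack("<Q", read_process_memory(mem, address, 8))[0]

    def read_int(mem, address):
        # Assumes a 4-byte signed little-endian integer.
        return struct.unpack("<i", read_process_memory(mem, address, 4))[0]

    # An in-memory stand-in for a target's address space, for illustration only.
    fake_memory = io.BytesIO(
        bytes(16) + struct.pack("<Q", 0x7F0012345678) + struct.pack("<i", 42)
    )
    pointer_value = read_pointer(fake_memory, 16)
    int_value = read_int(fake_memory, 24)

Raising on a short read, instead of returning partial data, prevents a truncated
pointer from being followed by mistake.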

Once a valid thread state has been identified, the debugger can use it to modify
control fields and request execution in the next stage of the protocol.

Writing Control Information
===========================

Once a valid thread state has been located, the debugger can write control fields
that instruct the target process to execute a script at the next safe opportunity.

Each thread state contains a ``_PyRemoteDebuggerSupport`` structure, which is used
to coordinate communication between the debugger and the interpreter. The debugger
uses offsets from ``_Py_DebugOffsets`` to locate three key fields:

- ``debugger_script_path``: A buffer where the debugger writes the full path to
  a Python source file (``.py``). The file must exist and be readable by the
  target process.

- ``debugger_pending_call``: An integer flag. When set to ``1``, it signals
  that a script is ready to be executed.

- ``eval_breaker``: A field checked periodically by the evaluation loop. To
  notify the interpreter of pending debugger activity, the debugger sets the
  ``_PY_EVAL_PLEASE_STOP_BIT`` in this field. This causes the interpreter to pause
  and check for debugger-related actions before continuing with normal execution.

To safely modify these fields, most debuggers should suspend the process before
writing to memory. This avoids race conditions that may occur if the interpreter
is actively running.

To perform the injection:

1. Write the script path into the ``debugger_script_path`` buffer.
2. Set the ``debugger_pending_call`` flag to ``1``.
3. Read the value of ``eval_breaker``, set the stop bit, and write the updated
   value back.

An example implementation might look like:

.. code-block:: python

    def inject_script(pid, thread_state_addr, debug_offsets, script_path):
        # Base offset to the _PyRemoteDebuggerSupport struct
        support_base = (
            thread_state_addr +
            debug_offsets.debugger_support.remote_debugger_support
        )
        # 1. Write script path
        script_path_ptr = support_base + debug_offsets.debugger_support.debugger_script_path
        write_string(pid, script_path_ptr, script_path)
        # 2. Set debugger_pending_call = 1
        pending_ptr = support_base + debug_offsets.debugger_support.debugger_pending_call
        write_int(pid, pending_ptr, 1)
        # 3. Set _PY_EVAL_PLEASE_STOP_BIT in eval_breaker
        eval_breaker_ptr = thread_state_addr + debug_offsets.debugger_support.eval_breaker
        breaker = read_int(pid, eval_breaker_ptr)
        # _PY_EVAL_PLEASE_STOP_BIT is bit 5 (1 << 5), not the least significant
        # bit; OR it in so that the other eval-breaker bits are preserved.
        breaker |= 1 << 5
        write_int(pid, eval_breaker_ptr, breaker)

After these writes are complete, the debugger may resume the process (if it was paused).
The interpreter will check ``eval_breaker`` at the next evaluation checkpoint,
detect the pending call, and load and execute the specified Python file. The debugger is responsible
for ensuring that the file remains on disk and readable by the target interpreter
when it is accessed.
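
The two write primitives used in this step can be sketched as follows. The helper
names are hypothetical, ``mem`` is any seekable, writable binary file object standing
in for the target's memory, and the 8-byte little-endian width of the flag word is an
assumption. The ``buffer_size`` limit and the bit mask are passed in by the caller,
because their real values must come from the target interpreter's own definitions.

.. code-block:: python

    import io
    import struct

    def write_c_string(mem, address, text, buffer_size):
        """Write ``text`` as a NUL-terminated string, refusing to overflow the buffer."""
        encoded = text.encode("utf-8") + b"\0"
        if len(encoded) > buffer_size:
            raise ValueError("script path does not fit in the target buffer")
        mem.seek(address)
        mem.write(encoded)

    def set_bits(mem, address, mask):
        """Read-modify-write: OR ``mask`` into the 8-byte word at ``address``."""
        mem.seek(address)
        (value,) = struct.unpack("<Q", mem.read(8))
        mem.seek(address)
        mem.write(struct.pack("<Q", value | mask))

    # Illustration against an in-memory buffer instead of a real process.
    memory = io.BytesIO(bytearray(64))
    write_c_string(memory, 8, "/tmp/x.py", buffer_size=32)
    memory.seek(40)
    memory.write(struct.pack("<Q", 0b100))   # pretend another flag is already set
    set_bits(memory, 40, 1 << 5)
    flags_after = struct.unpack_from("<Q", memory.getvalue(), 40)[0]

Using OR instead of overwriting the word keeps any flags that the interpreter has
already set, which is why the protocol reads the current value first.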

Summary
=======

To inject and execute a script in a remote Python process:

1. Locate the ``PyRuntime`` structure in the target process's memory.
2. Read and validate the ``_Py_DebugOffsets`` structure at the start of ``PyRuntime``.
3. Use the offsets to locate a valid ``PyThreadState``.
4. Write the path to a Python script into ``debugger_script_path``.
5. Set ``debugger_pending_call = 1``.
6. Set ``_PY_EVAL_PLEASE_STOP_BIT`` in ``eval_breaker``.
7. Resume the process (if paused). The script will be executed at the next safe eval point.