| 
 | 1 | +# Crash Tracker Unix Socket Communication Protocol  | 
 | 2 | + | 
 | 3 | +**Date**: September 23, 2025  | 
 | 4 | + | 
 | 5 | +## Overview  | 
 | 6 | + | 
 | 7 | +This document describes the Unix domain socket protocol that the crash tracker's collector and receiver processes use to communicate. The crash tracker uses a two-process architecture: the collector (a fork of the crashing process) sends crash data to the receiver (a fork+execve process) over an anonymous Unix domain socket pair.  | 
 | 8 | + | 
 | 9 | +## Socket Creation and Setup  | 
 | 10 | + | 
 | 11 | +The communication channel is established using `socketpair()` to create an anonymous Unix domain socket pair:  | 
 | 12 | + | 
 | 13 | +```rust  | 
 | 14 | +let (uds_parent, uds_child) = socket::socketpair(  | 
 | 15 | +    socket::AddressFamily::Unix,  | 
 | 16 | +    socket::SockType::Stream,  | 
 | 17 | +    None,  | 
 | 18 | +    socket::SockFlag::empty(),  | 
 | 19 | +)?;  | 
 | 20 | +```  | 
 | 21 | + | 
 | 22 | +**Location**: `datadog-crashtracker/src/collector/receiver_manager.rs:78-85`  | 
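
As a rough illustration of what an anonymous stream socket pair provides, the sketch below uses the standard library's `UnixStream::pair()` rather than the `socketpair()` wrapper shown above. The function name `roundtrip_over_socketpair` is hypothetical, and both ends live in one process here only to keep the example self-contained.

```rust
use std::io::{Read, Write};
use std::net::Shutdown;
use std::os::unix::net::UnixStream;

/// Sends `payload` through an anonymous Unix socket pair and returns what the
/// other end read. Intended for small payloads that fit in the socket buffer.
fn roundtrip_over_socketpair(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    // std's counterpart to socketpair(AF_UNIX, SOCK_STREAM, 0).
    let (mut writer_end, mut reader_end) = UnixStream::pair()?;

    writer_end.write_all(payload)?;
    // Closing the write direction lets the reader observe end-of-stream.
    writer_end.shutdown(Shutdown::Write)?;

    let mut received = Vec::new();
    reader_end.read_to_end(&mut received)?;
    Ok(received)
}
```

In the crash tracker the two ends belong to different processes; the data path is the same.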
 | 23 | + | 
 | 24 | +### File Descriptor Management  | 
 | 25 | + | 
 | 26 | +1. **Parent Process**: Retains `uds_parent` for tracking  | 
 | 27 | +2. **Collector Process**: Inherits `uds_parent` as the write end  | 
 | 28 | +3. **Receiver Process**: Gets `uds_child` redirected to stdin via `dup2(uds_child, 0)`  | 
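
The stdin redirection in step 3 can be approximated with the standard library alone. This is a sketch under stated assumptions, not the crash tracker's code: `feed_child_stdin` is a hypothetical helper, and `std::process::Command` performs the `dup2`-style redirection internally when it is given a file descriptor for stdin.

```rust
use std::io::Write;
use std::net::Shutdown;
use std::os::fd::OwnedFd;
use std::os::unix::net::UnixStream;
use std::process::{Command, Stdio};

/// Spawns `program` with one end of a socket pair as its stdin, writes
/// `payload` into the other end, and returns whatever the child printed.
fn feed_child_stdin(program: &str, payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let (mut parent_end, child_end) = UnixStream::pair()?;

    // Handing the descriptor to `Stdio` makes std install it as fd 0 in the
    // child, which plays the role of `dup2(uds_child, 0)` before `execve`.
    let child = Command::new(program)
        .stdin(Stdio::from(OwnedFd::from(child_end)))
        .stdout(Stdio::piped())
        .spawn()?;

    parent_end.write_all(payload)?;
    // Shutting down the write side makes the child see EOF on its stdin.
    parent_end.shutdown(Shutdown::Write)?;

    Ok(child.wait_with_output()?.stdout)
}
```

For example, `feed_child_stdin("cat", b"hello\n")` should return the same bytes, since `cat` copies its stdin (the socket) to stdout.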
 | 29 | + | 
 | 30 | +## Communication Protocol  | 
 | 31 | + | 
 | 32 | +### Data Format  | 
 | 33 | + | 
 | 34 | +The crash data is transmitted as a structured text stream with distinct sections delimited by markers defined in `datadog-crashtracker/src/shared/constants.rs`.  | 
 | 35 | + | 
 | 36 | +### Message Structure  | 
 | 37 | + | 
 | 38 | +Each crash report follows this sequence:  | 
 | 39 | + | 
 | 40 | +1. **Metadata Section**  | 
 | 41 | +2. **Configuration Section**  | 
 | 42 | +3. **Signal Information Section**  | 
 | 43 | +4. **Process Context Section**  | 
 | 44 | +5. **Process Information Section**  | 
 | 45 | +6. **Counters Section**  | 
 | 46 | +7. **Spans Section**  | 
 | 47 | +8. **Additional Tags Section**  | 
 | 48 | +9. **Traces Section**  | 
 | 49 | +10. **Memory Maps Section** (Linux only)  | 
 | 50 | +11. **Stack Trace Section**  | 
 | 51 | +12. **Completion Marker**  | 
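
The sequence above can be sketched as a writer that emits delimited sections followed by the completion marker. The helper names `emit_section` and `emit_report` are hypothetical, and building the marker from a section name is a simplification: the real markers are constants in `constants.rs`.

```rust
use std::io::{self, Write};

/// Writes one section between its BEGIN/END markers, then flushes.
fn emit_section<W: Write>(w: &mut W, name: &str, body: &str) -> io::Result<()> {
    writeln!(w, "DD_CRASHTRACK_BEGIN_{name}")?;
    writeln!(w, "{body}")?;
    writeln!(w, "DD_CRASHTRACK_END_{name}")?;
    w.flush()
}

/// Writes the given sections in order, followed by the completion marker.
fn emit_report<W: Write>(w: &mut W, sections: &[(&str, &str)]) -> io::Result<()> {
    for (name, body) in sections {
        emit_section(w, name, body)?;
    }
    writeln!(w, "DD_CRASHTRACK_DONE")?;
    w.flush()
}
```

Flushing after every section means a reader still receives each completed section even if the writer dies partway through the report.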
 | 52 | + | 
 | 53 | +### Section Details  | 
 | 54 | + | 
 | 55 | +#### 1. Metadata Section  | 
 | 56 | +```  | 
 | 57 | +DD_CRASHTRACK_BEGIN_METADATA  | 
 | 58 | +{JSON metadata object}  | 
 | 59 | +DD_CRASHTRACK_END_METADATA  | 
 | 60 | +```  | 
 | 61 | + | 
 | 62 | +Contains the serialized `Metadata` object, including application context, tags, and environment information.  | 
 | 63 | + | 
 | 64 | +#### 2. Configuration Section  | 
 | 65 | +```  | 
 | 66 | +DD_CRASHTRACK_BEGIN_CONFIG  | 
 | 67 | +{JSON configuration object}  | 
 | 68 | +DD_CRASHTRACK_END_CONFIG  | 
 | 69 | +```  | 
 | 70 | + | 
 | 71 | +Contains the serialized `CrashtrackerConfiguration`, including crash tracking settings, endpoint information, and processing options.  | 
 | 72 | + | 
 | 73 | +#### 3. Signal Information Section  | 
 | 74 | +```  | 
 | 75 | +DD_CRASHTRACK_BEGIN_SIGINFO  | 
 | 76 | +{  | 
 | 77 | +  "si_code": <signal_code>,  | 
 | 78 | +  "si_code_human_readable": "<description>",  | 
 | 79 | +  "si_signo": <signal_number>,  | 
 | 80 | +  "si_signo_human_readable": "<signal_name>",  | 
 | 81 | +  "si_addr": "<fault_address>" // Optional, for memory faults  | 
 | 82 | +}  | 
 | 83 | +DD_CRASHTRACK_END_SIGINFO  | 
 | 84 | +```  | 
 | 85 | + | 
 | 86 | +Contains signal details extracted from the `siginfo_t` structure.  | 
 | 87 | + | 
 | 88 | +**Implementation**: `datadog-crashtracker/src/collector/emitters.rs:223-263`  | 
 | 89 | + | 
 | 90 | +#### 4. Process Context Section (ucontext)  | 
 | 91 | +```  | 
 | 92 | +DD_CRASHTRACK_BEGIN_UCONTEXT  | 
 | 93 | +<platform-specific context dump>  | 
 | 94 | +DD_CRASHTRACK_END_UCONTEXT  | 
 | 95 | +```  | 
 | 96 | + | 
 | 97 | +Contains the processor state at crash time, taken from `ucontext_t`. The format varies by platform:  | 
 | 98 | +- **Linux**: Direct debug print of `ucontext_t`  | 
 | 99 | +- **macOS**: Includes both `ucontext_t` and machine context (`mcontext`)  | 
 | 100 | + | 
 | 101 | +**Implementation**: `datadog-crashtracker/src/collector/emitters.rs:190-221`  | 
 | 102 | + | 
 | 103 | +#### 5. Process Information Section  | 
 | 104 | +```  | 
 | 105 | +DD_CRASHTRACK_BEGIN_PROCINFO  | 
 | 106 | +{"pid": <process_id>}  | 
 | 107 | +DD_CRASHTRACK_END_PROCINFO  | 
 | 108 | +```  | 
 | 109 | + | 
 | 110 | +Contains the process ID of the crashing process.  | 
 | 111 | + | 
 | 112 | +#### 6. Counters Section  | 
 | 113 | +```  | 
 | 114 | +DD_CRASHTRACK_BEGIN_COUNTERS  | 
 | 115 | +<counter data>  | 
 | 116 | +DD_CRASHTRACK_END_COUNTERS  | 
 | 117 | +```  | 
 | 118 | + | 
 | 119 | +Contains internal crash tracker counters and metrics.  | 
 | 120 | + | 
 | 121 | +#### 7. Spans Section  | 
 | 122 | +```  | 
 | 123 | +DD_CRASHTRACK_BEGIN_SPANS  | 
 | 124 | +<span data>  | 
 | 125 | +DD_CRASHTRACK_END_SPANS  | 
 | 126 | +```  | 
 | 127 | + | 
 | 128 | +Contains active distributed tracing spans at crash time.  | 
 | 129 | + | 
 | 130 | +#### 8. Additional Tags Section  | 
 | 131 | +```  | 
 | 132 | +DD_CRASHTRACK_BEGIN_TAGS  | 
 | 133 | +<tag data>  | 
 | 134 | +DD_CRASHTRACK_END_TAGS  | 
 | 135 | +```  | 
 | 136 | + | 
 | 137 | +Contains additional tags collected at crash time.  | 
 | 138 | + | 
 | 139 | +#### 9. Traces Section  | 
 | 140 | +```  | 
 | 141 | +DD_CRASHTRACK_BEGIN_TRACES  | 
 | 142 | +<trace data>  | 
 | 143 | +DD_CRASHTRACK_END_TRACES  | 
 | 144 | +```  | 
 | 145 | + | 
 | 146 | +Contains active trace information.  | 
 | 147 | + | 
 | 148 | +#### 10. Memory Maps Section (Linux Only)  | 
 | 149 | +```  | 
 | 150 | +DD_CRASHTRACK_BEGIN_FILE /proc/self/maps  | 
 | 151 | +<contents of /proc/self/maps>  | 
 | 152 | +DD_CRASHTRACK_END_FILE "/proc/self/maps"  | 
 | 153 | +```  | 
 | 154 | + | 
 | 155 | +Contains memory mapping information from `/proc/self/maps` for symbol resolution.  | 
 | 156 | + | 
 | 157 | +**Implementation**: `datadog-crashtracker/src/collector/emitters.rs:184-187`  | 
 | 158 | + | 
 | 159 | +#### 11. Stack Trace Section  | 
 | 160 | +```  | 
 | 161 | +DD_CRASHTRACK_BEGIN_STACKTRACE  | 
 | 162 | +{"ip": "<instruction_pointer>", "module_base_address": "<base>", "sp": "<stack_pointer>", "symbol_address": "<addr>"}  | 
 | 163 | +{"ip": "<instruction_pointer>", "module_base_address": "<base>", "sp": "<stack_pointer>", "symbol_address": "<addr>", "function": "<name>", "file": "<path>", "line": <number>}  | 
 | 164 | +...  | 
 | 165 | +DD_CRASHTRACK_END_STACKTRACE  | 
 | 166 | +```  | 
 | 167 | + | 
 | 168 | +Each line represents one stack frame. The frame format depends on the symbol resolution setting:  | 
 | 169 | + | 
 | 170 | +- **Disabled/Receiver-only**: Only addresses (`ip`, `sp`, `symbol_address`, optional `module_base_address`)  | 
 | 171 | +- **In-process symbols**: Includes debug information (`function`, `file`, `line`, `column`)  | 
 | 172 | + | 
 | 173 | +Stack frames whose stack pointer is lower than the fault stack pointer are filtered out, which excludes the crash tracker's own frames.  | 
 | 174 | + | 
 | 175 | +**Implementation**: `datadog-crashtracker/src/collector/emitters.rs:45-117`  | 
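
The frame filter can be illustrated with a minimal sketch. The `Frame` struct, both function names, and the hex-string rendering of addresses are assumptions made for the example; the actual emitter works on unwinder frames and writes more fields.

```rust
/// Hypothetical minimal frame: only the two addresses this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Frame {
    ip: usize,
    sp: usize,
}

/// Keeps frames whose stack pointer is at or above the faulting one.
/// On downward-growing stacks, frames pushed after the fault (the signal
/// handler and crash tracker code) have lower stack pointers.
fn drop_handler_frames(frames: &[Frame], fault_sp: usize) -> Vec<Frame> {
    frames.iter().filter(|f| f.sp >= fault_sp).cloned().collect()
}

/// Renders one frame as a single JSON line (address format is an assumption).
fn frame_to_json_line(frame: &Frame) -> String {
    format!("{{\"ip\": \"{:#x}\", \"sp\": \"{:#x}\"}}", frame.ip, frame.sp)
}
```

One frame per line keeps the section easy to parse incrementally: the receiver can deserialize each line on its own.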
 | 176 | + | 
 | 177 | +#### 12. Completion Marker  | 
 | 178 | +```  | 
 | 179 | +DD_CRASHTRACK_DONE  | 
 | 180 | +```  | 
 | 181 | + | 
 | 182 | +Indicates the end of the crash report transmission.  | 
 | 183 | + | 
 | 184 | +## Communication Flow  | 
 | 185 | + | 
 | 186 | +### 1. Collector Side (Write End)  | 
 | 187 | + | 
 | 188 | +**File**: `datadog-crashtracker/src/collector/collector_manager.rs:92-102`  | 
 | 189 | + | 
 | 190 | +```rust  | 
 | 191 | +let mut unix_stream = unsafe { UnixStream::from_raw_fd(uds_fd) };  | 
 | 192 | + | 
 | 193 | +let report = emit_crashreport(  | 
 | 194 | +    &mut unix_stream,  | 
 | 195 | +    config,  | 
 | 196 | +    config_str,  | 
 | 197 | +    metadata_str,  | 
 | 198 | +    sig_info,  | 
 | 199 | +    ucontext,  | 
 | 200 | +    ppid,  | 
 | 201 | +);  | 
 | 202 | +```  | 
 | 203 | + | 
 | 204 | +The collector:  | 
 | 205 | +1. Creates `UnixStream` from inherited file descriptor  | 
 | 206 | +2. Calls `emit_crashreport()` to serialize and write all crash data  | 
 | 207 | +3. Flushes the stream after each section for reliability  | 
 | 208 | +4. Exits with `libc::_exit(0)` on completion  | 
 | 209 | + | 
 | 210 | +### 2. Receiver Side (Read End)  | 
 | 211 | + | 
 | 212 | +**File**: `datadog-crashtracker/src/receiver/entry_points.rs:97-119`  | 
 | 213 | + | 
 | 214 | +```rust  | 
 | 215 | +pub(crate) async fn receiver_entry_point(  | 
 | 216 | +    timeout: Duration,  | 
 | 217 | +    stream: impl AsyncBufReadExt + std::marker::Unpin,  | 
 | 218 | +) -> anyhow::Result<()> {  | 
 | 219 | +    if let Some((config, mut crash_info)) = receive_report_from_stream(timeout, stream).await? {  | 
 | 220 | +        // Process crash data  | 
 | 221 | +        if let Err(e) = resolve_frames(&config, &mut crash_info) {  | 
 | 222 | +            crash_info.log_messages.push(format!("Error resolving frames: {e}"));  | 
 | 223 | +        }  | 
 | 224 | +        if config.demangle_names() {  | 
 | 225 | +            if let Err(e) = crash_info.demangle_names() {  | 
 | 226 | +                crash_info.log_messages.push(format!("Error demangling names: {e}"));  | 
 | 227 | +            }  | 
 | 228 | +        }  | 
 | 229 | +        crash_info.async_upload_to_endpoint(config.endpoint()).await?;  | 
 | 230 | +    }  | 
 | 231 | +    Ok(())  | 
 | 232 | +}  | 
 | 233 | +```  | 
 | 234 | + | 
 | 235 | +The receiver:  | 
 | 236 | +1. Reads from stdin (Unix socket via `dup2`)  | 
 | 237 | +2. Parses the structured stream into `CrashInfo` and `CrashtrackerConfiguration`  | 
 | 238 | +3. Performs symbol resolution if configured  | 
 | 239 | +4. Uploads formatted crash report to backend  | 
 | 240 | + | 
 | 241 | +### 3. Stream Parsing  | 
 | 242 | + | 
 | 243 | +**File**: `datadog-crashtracker/src/receiver/receive_report.rs`  | 
 | 244 | + | 
 | 245 | +The receiver parses the stream by:  | 
 | 246 | +1. Reading line-by-line with timeout protection  | 
 | 247 | +2. Matching delimiter patterns to identify sections  | 
 | 248 | +3. Accumulating section data between delimiters  | 
 | 249 | +4. Deserializing JSON sections into appropriate data structures  | 
 | 250 | +5. Handling the `DD_CRASHTRACK_DONE` completion marker  | 
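
The parsing steps above can be sketched as a small line-oriented state machine. This is a synchronous simplification with hypothetical names (`ParsedReport`, `parse_report`): the real receiver is async, applies a timeout, and deserializes section bodies into typed structures rather than keeping raw lines.

```rust
use std::collections::HashMap;
use std::io::BufRead;

const BEGIN: &str = "DD_CRASHTRACK_BEGIN_";
const END: &str = "DD_CRASHTRACK_END_";
const DONE: &str = "DD_CRASHTRACK_DONE";

/// Raw section bodies keyed by section name, plus whether DONE was seen.
#[derive(Debug, Default)]
struct ParsedReport {
    sections: HashMap<String, Vec<String>>,
    complete: bool,
}

fn parse_report<R: BufRead>(reader: R) -> std::io::Result<ParsedReport> {
    let mut report = ParsedReport::default();
    let mut current: Option<String> = None; // name of the open section, if any

    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end();
        if line == DONE {
            report.complete = true;
            break;
        }
        match current.take() {
            // Any END marker closes the open section.
            Some(_) if line.starts_with(END) => {}
            // Otherwise the line belongs to the open section.
            Some(name) => {
                report.sections.entry(name.clone()).or_default().push(line.to_string());
                current = Some(name);
            }
            // Outside a section, only a BEGIN marker is meaningful.
            None => {
                if let Some(name) = line.strip_prefix(BEGIN) {
                    report.sections.entry(name.to_string()).or_default();
                    current = Some(name.to_string());
                }
            }
        }
    }
    Ok(report)
}
```

A stream that ends without `DD_CRASHTRACK_DONE` leaves `complete` set to `false`, which is how a truncated transmission shows up in this sketch.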
 | 251 | + | 
 | 252 | +## Error Handling and Reliability  | 
 | 253 | + | 
 | 254 | +### Signal Safety  | 
 | 255 | +- All collector operations use only async-signal-safe functions  | 
 | 256 | +- No memory allocation in signal handler context  | 
 | 257 | +- Pre-prepared data structures (`PreparedExecve`) to avoid allocations  | 
 | 258 | + | 
 | 259 | +### Timeout Protection  | 
 | 260 | +- Receiver has configurable timeout (default: 4000ms)  | 
 | 261 | +- Environment variable: `DD_CRASHTRACKER_RECEIVER_TIMEOUT_MS`  | 
 | 262 | +- Prevents hanging on incomplete/corrupted streams  | 
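
One way the timeout setting could be resolved is sketched below. The function names are hypothetical, and falling back to the default for missing, unparsable, or zero values is an assumption rather than documented behavior.

```rust
use std::time::Duration;

const TIMEOUT_ENV_VAR: &str = "DD_CRASHTRACKER_RECEIVER_TIMEOUT_MS";
const DEFAULT_TIMEOUT_MS: u64 = 4000;

/// Interprets the raw variable value, using the default when the value is
/// absent, not a number, or zero (fallback policy is an assumption).
fn timeout_from_raw(raw: Option<&str>) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&ms| ms > 0)
        .unwrap_or(DEFAULT_TIMEOUT_MS);
    Duration::from_millis(ms)
}

/// Reads the timeout from the process environment.
fn receiver_timeout() -> Duration {
    timeout_from_raw(std::env::var(TIMEOUT_ENV_VAR).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules easy to test without mutating process-wide state.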
 | 263 | + | 
 | 264 | +### Process Cleanup  | 
 | 265 | +- Parent process uses `wait_for_pollhup()` to detect socket closure  | 
 | 266 | +- Kills child processes with `SIGKILL` if needed  | 
 | 267 | +- Reaps zombie processes to prevent resource leaks  | 
 | 268 | + | 
 | 269 | +**File**: `datadog-crashtracker/src/collector/process_handle.rs:19-40`  | 
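
The kill-and-reap part of this cleanup can be approximated with the standard library. This sketch does not reproduce `wait_for_pollhup()`: it polls the child's exit status instead of the socket, and `reap_or_kill` is a hypothetical name.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Waits up to `grace` for `child` to exit. If it is still running after
/// that, kills it (SIGKILL on Unix) and reaps it so no zombie remains.
fn reap_or_kill(child: &mut Child, grace: Duration) -> io::Result<ExitStatus> {
    let deadline = Instant::now() + grace;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(status); // exited on its own and has been reaped
        }
        if Instant::now() >= deadline {
            break;
        }
        sleep(Duration::from_millis(10));
    }
    child.kill()?;
    child.wait() // collect the exit status of the killed child
}
```

Calling `wait()` after `kill()` matters: without it, the terminated child stays in the process table as a zombie.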
 | 270 | + | 
 | 271 | +### Data Integrity  | 
 | 272 | +- Each section is flushed immediately after writing  | 
 | 273 | +- Structured delimiters allow detection of incomplete transmissions  | 
 | 274 | +- Error messages are accumulated rather than failing fast  | 
 | 275 | + | 
 | 276 | +## Alternative Communication Modes  | 
 | 277 | + | 
 | 278 | +### Named Socket Mode  | 
 | 279 | +When `unix_socket_path` is configured, the collector connects to an existing Unix socket instead of using the fork+execve receiver:  | 
 | 280 | + | 
 | 281 | +```rust  | 
 | 282 | +let receiver = if unix_socket_path.is_empty() {  | 
 | 283 | +    Receiver::spawn_from_stored_config()?  // Fork+execve mode  | 
 | 284 | +} else {  | 
 | 285 | +    Receiver::from_socket(unix_socket_path)?  // Named socket mode  | 
 | 286 | +};  | 
 | 287 | +```  | 
 | 288 | + | 
 | 289 | +This allows integration with long-lived receiver processes.  | 
 | 290 | + | 
 | 291 | +**Linux Abstract Sockets**: On Linux, socket paths not starting with `.` or `/` are treated as abstract socket names.  | 
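
The path rule above can be written as a small classifier. The enum and function names are hypothetical, and the sketch only decides how a configured string would be interpreted; it does not open a socket.

```rust
/// How a configured socket path would be interpreted.
#[derive(Debug, PartialEq)]
enum SocketTarget<'a> {
    /// A filesystem path such as `/var/run/receiver.sock` or `./receiver.sock`.
    Filesystem(&'a str),
    /// A Linux abstract-namespace name, which has no filesystem entry.
    Abstract(&'a str),
}

fn classify_socket_path(path: &str) -> SocketTarget<'_> {
    let looks_like_path = path.starts_with('.') || path.starts_with('/');
    // Abstract names exist only on Linux; elsewhere everything is a path.
    if cfg!(target_os = "linux") && !looks_like_path {
        SocketTarget::Abstract(path)
    } else {
        SocketTarget::Filesystem(path)
    }
}
```

For example, `classify_socket_path("/tmp/receiver.sock")` yields a filesystem target on every platform, while a bare name such as `crashtracker-receiver` is treated as abstract only on Linux.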
 | 292 | + | 
 | 293 | +## Security Considerations  | 
 | 294 | + | 
 | 295 | +### File Descriptor Isolation  | 
 | 296 | +- Collector closes stdio file descriptors (0, 1, 2)  | 
 | 297 | +- Receiver redirects socket to stdin, stdout/stderr to configured files  | 
 | 298 | +- Minimizes attack surface during crash processing  | 
 | 299 | + | 
 | 300 | +### Process Isolation  | 
 | 301 | +- Fork+execve provides strong process boundary  | 
 | 302 | +- Crash in collector doesn't affect receiver  | 
 | 303 | +- Signal handlers are reset in receiver child  | 
 | 304 | + | 
 | 305 | +### Resource Limits  | 
 | 306 | +- Timeout prevents resource exhaustion  | 
 | 307 | +- Fixed buffer sizes for file operations  | 
 | 308 | +- Immediate flushing prevents large memory usage  | 
 | 309 | + | 
 | 310 | +## Debugging and Monitoring  | 
 | 311 | + | 
 | 312 | +### Log Output  | 
 | 313 | +- Receiver can be configured with `stdout_filename` and `stderr_filename`  | 
 | 314 | +- Error messages are accumulated in crash report  | 
 | 315 | +- Debug assertions validate critical operations  | 
 | 316 | + | 
 | 317 | +### Environment Variables  | 
 | 318 | +- `DD_CRASHTRACKER_RECEIVER_TIMEOUT_MS`: Receiver timeout  | 
 | 319 | +- Standard Unix environment passed through execve  | 
 | 320 | + | 
 | 321 | +This protocol lets crash data be collected and transmitted even when the main process is in an unstable state.  | 