📝 cTP (coroTracer Protocol) Memory Layout and Concurrency Synchronization Specification

Version: 1.0 | Status: Production-Ready | Core Features: Cross-Language, Zero-Copy, Lock-Free, Cache-Line Friendly

1. Overview

cTP (coroTracer Protocol) is not a traditional TCP/UDP-based network protocol. It is a shared-memory (mmap) contract defined by an exact, byte-aligned binary layout.

Modern M:N coroutine schedulers have extreme performance demands, and traditional RPC- or socket-based log collection adds serialization and context-switch overhead that they cannot tolerate. cTP instead fixes the binary layout and the memory-barrier rules of a shared memory file (/tmp/corotracer.shm). The traced target program (C++, Rust, Zig, etc.) records events with plain stores into that mapping, at a cost close to an L1 cache write, while the Go engine harvests them without blocking from a completely separate process.

2. Macro Topology

The entire shared memory file is strictly divided into fixed-size memory blocks. The first 1KB is dedicated to global state negotiation, followed by N consecutive 1KB coroutine observation stations (Station).

[ Shared Memory File: corotracer.shm ]
=======================================================================
| Offset (Hex) | Size (Bytes) | Block Name                            |
=======================================================================
| 0x00000000   | 1024 (1KB)   | GlobalHeader                          |
| 0x00000400   | 1024 (1KB)   | StationData #0                        |
| 0x00000800   | 1024 (1KB)   | StationData #1                        |
| ...          | ...          | ...                                   |
| N * 0x400    | 1024 (1KB)   | StationData #(N-1)                    |
=======================================================================

Mandatory Constraint: In every language implementation, each block structure (GlobalHeader and StationData) must be exactly 1024 bytes. All padding must be declared explicitly rather than left to the compiler, so that the ABI is identical across languages.
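The offsets in the table follow directly from the fixed 1KB block size. A minimal sketch of that arithmetic, assuming the helper names `station_offset` and `shm_file_size`, which are illustrative and not part of the protocol:

```cpp
#include <cstddef>
#include <cstdint>

// Block size shared by GlobalHeader and every StationData (section 2).
constexpr std::size_t kBlockSize = 1024;

// Byte offset of StationData #index: one header block, then `index` stations.
// Hypothetical helper, not defined by the protocol.
constexpr std::size_t station_offset(std::uint32_t index) {
    return kBlockSize + static_cast<std::size_t>(index) * kBlockSize;
}

// Total file size for one header plus `max_stations` stations.
// Hypothetical helper, not defined by the protocol.
constexpr std::size_t shm_file_size(std::uint32_t max_stations) {
    return kBlockSize * (static_cast<std::size_t>(max_stations) + 1);
}

static_assert(station_offset(0) == 0x400, "StationData #0 follows the header");
static_assert(station_offset(1) == 0x800, "StationData #1 is one block later");
```

With these definitions, a file holding 4 stations would need `shm_file_size(4)`, that is 5120 bytes.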

3. Micro Layout

3.1 GlobalHeader (Global Negotiation Header)

Alignment Requirement: 1024 Bytes (alignas(1024))

Responsibility: Stores cross-process handshake information and the global cursor for the lock-free allocator.

Offset	Field	Type	Bytes	Description
`0x00`	`magic_number`	`uint64`	8	Magic number, fixed at `0x434F524F54524352` (ASCII: COROTRCR)
`0x08`	`version`	`uint32`	4	Protocol version number, currently `1`
`0x0C`	`max_stations`	`uint32`	4	Maximum total number of Stations pre-allocated in the SHM file
`0x10`	`allocated_count`	`atomic<uint32>`	4	[Lock-Free Allocator Cursor] The target program obtains an available Station via atomic increment
`0x14`	`tracer_sleeping`	`atomic<uint32>`	4	Engine sleep flag: `0` = Active, `1` = Sleeping awaiting wakeup
`0x18`	`_reserved`	`char[1000]`	1000	Hard Padding Zone: Pad to a full 1024 bytes
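One way to check the table above at compile time is to mirror it in a C++ struct and assert every offset. This is a sketch rather than the SDK's actual definition; `kMagic`, `kVersion`, and `header_matches` are hypothetical names introduced here for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Mirrors the GlobalHeader table in section 3.1; field names follow the spec.
struct alignas(1024) GlobalHeader {
    std::uint64_t magic_number;                  // 0x00
    std::uint32_t version;                       // 0x08
    std::uint32_t max_stations;                  // 0x0C
    std::atomic<std::uint32_t> allocated_count;  // 0x10
    std::atomic<std::uint32_t> tracer_sleeping;  // 0x14
    char _reserved[1000];                        // 0x18
};

static_assert(sizeof(GlobalHeader) == 1024, "header must be exactly 1KB");
static_assert(offsetof(GlobalHeader, version) == 0x08, "version offset");
static_assert(offsetof(GlobalHeader, max_stations) == 0x0C, "max_stations offset");
static_assert(offsetof(GlobalHeader, allocated_count) == 0x10, "cursor offset");
static_assert(offsetof(GlobalHeader, tracer_sleeping) == 0x14, "sleep flag offset");
static_assert(offsetof(GlobalHeader, _reserved) == 0x18, "padding offset");
// Atomics shared between processes must be lock-free.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "32-bit atomics must be lock-free");

constexpr std::uint64_t kMagic = 0x434F524F54524352ULL;  // "COROTRCR"
constexpr std::uint32_t kVersion = 1;

// Hypothetical handshake check a probe might run after mapping the file.
inline bool header_matches(const GlobalHeader& h) {
    return h.magic_number == kMagic && h.version == kVersion;
}
```

Note that on a little-endian machine the magic number is stored with its least significant byte first, so a hex dump of the file begins with `52 43 52 54`, not with the text `COROTRCR`.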

3.2 Epoch (Core Event Slot)

Alignment Requirement: 64 Bytes (alignas(64))

Responsibility: Records a snapshot of a single coroutine state transition.

Design Philosophy: 64 bytes matches the cache line size of most current x86-64 and ARM64 CPUs. Because each Epoch is 64-byte aligned and exactly 64 bytes long, threads that write to different Epochs concurrently touch different cache lines on such CPUs, which avoids the performance loss caused by false sharing.

Offset	Field	Type	Bytes	Description
`0x00`	`timestamp`	`uint64`	8	Nanosecond-level timestamp (e.g., `clock_gettime(CLOCK_MONOTONIC)`)
`0x08`	`tid`	`uint64`	8	Real OS thread ID (not high-level language level ID)
`0x10`	`addr`	`uint64`	8	Instruction address or coroutine heap frame pointer upon suspension/resumption
`0x18`	`seq`	`atomic<uint64>`	8	[Core Concurrency Barrier] Monotonically increasing sequence number. Used for read/write barriers
`0x20`	`reserved`	`char[31]`	31	Reserved space (can be used to store a small amount of business Payload)
`0x3F`	`is_active`	`bool (uint8)`	1	State machine flag: `1` = Active (Running), `0` = Suspended
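The Epoch layout can be verified the same way. The sketch below mirrors the table and adds a small helper, `cache_line_index` (a hypothetical name, and it assumes 64-byte cache lines), to show that adjacent slots never share a line.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Mirrors the Epoch table in section 3.2.
struct alignas(64) Epoch {
    std::uint64_t timestamp;         // 0x00
    std::uint64_t tid;               // 0x08
    std::uint64_t addr;              // 0x10
    std::atomic<std::uint64_t> seq;  // 0x18
    char reserved[31];               // 0x20
    std::uint8_t is_active;          // 0x3F
};

static_assert(sizeof(Epoch) == 64, "Epoch must fill exactly one 64-byte line");
static_assert(alignof(Epoch) == 64, "Epoch must start on a 64-byte boundary");
static_assert(offsetof(Epoch, seq) == 0x18, "seq offset");
static_assert(offsetof(Epoch, reserved) == 0x20, "reserved offset");
static_assert(offsetof(Epoch, is_active) == 0x3F, "is_active offset");

// Index of the 64-byte line that an address falls into.
// Hypothetical helper; assumes a 64-byte cache line.
inline std::uintptr_t cache_line_index(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / 64;
}
```

For any array of `Epoch`, `cache_line_index(&slots[i])` differs for every `i`, which is the property the design relies on.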

3.3 StationData (Coroutine Station)

Alignment Requirement: 1024 Bytes (alignas(1024))

Responsibility: Each coroutine instance exclusively occupies one Station throughout its entire lifecycle.

Offset	Zone	Bytes	Description
`0x000`	`Header.probe_id`	8	Probe globally unique ID (recommended to use the memory address at coroutine creation)
`0x008`	`Header.birth_ts`	8	Nanosecond timestamp of coroutine birth
`0x010`	`Header.is_dead`	1	Whether the coroutine has finished destruction (`1` = Dead)
`0x011`	`Header._pad`	47	Pad to 64-byte alignment
`0x040`	`Slots[8]`	512	Event Polling Buffer (RingBuffer): 8 Epochs, totaling 512 Bytes
`0x240`	`Flexible`	448	Hard Padding Zone: Pad to a full 1024 bytes
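A corresponding sketch for StationData. To keep it compilable on its own, the slots use an opaque 64-byte stand-in named `EpochSlot` (a name assumed here) instead of the full Epoch definition from section 3.2.

```cpp
#include <cstddef>
#include <cstdint>

// Opaque 64-byte stand-in for the Epoch of section 3.2 (illustrative only).
struct alignas(64) EpochSlot {
    unsigned char bytes[64];
};

// Mirrors the StationData table in section 3.3.
struct alignas(1024) StationData {
    std::uint64_t probe_id;       // 0x000
    std::uint64_t birth_ts;       // 0x008
    std::uint8_t  is_dead;        // 0x010
    char          _pad[47];       // 0x011, pads the header to 64 bytes
    EpochSlot     slots[8];       // 0x040, ring buffer of 8 events
    char          flexible[448];  // 0x240, pads the station to 1024 bytes
};

static_assert(sizeof(StationData) == 1024, "station must be exactly 1KB");
static_assert(offsetof(StationData, is_dead) == 0x010, "is_dead offset");
static_assert(offsetof(StationData, slots) == 0x040, "slots offset");
static_assert(offsetof(StationData, flexible) == 0x240, "flexible offset");
static_assert(sizeof(StationData::slots) == 512, "8 slots of 64 bytes");
```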

4. Concurrency Synchronization and Read/Write Contract

cTP uses no mutexes or spinlocks and relies solely on atomic operations with hardware-level memory barriers. Every implementation must comply with the following read/write contract:

4.1 Probe Write Side (Target App / SDK)

O(1) Lock-Free Allocation: When a new coroutine is born, execute index = fetch_add(&GlobalHeader.allocated_count, 1, std::memory_order_relaxed). If index < max_stations, exclusively occupy StationData[index].
Circular Write (Ring Buffer): Upon context switch, obtain the auto-incremented sequence number seq. Locate the slot: slot = Station.Slots[seq % 8].
Memory Barrier [Critical Constraint]: The probe must first write the ordinary fields (timestamp, tid, addr, is_active). As the final step, it must publish seq using Release semantics:
```
slot.seq.store(current_seq, std::memory_order_release);
```
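This guarantees that once the Go engine observes the updated seq through an acquire load, every field written before the store is visible to it. Release ordering governs visibility between cores; it does not flush data to physical memory, and it does not by itself protect a reader that is still copying a slot when the writer wraps around and reuses it.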
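The three write-side steps can be sketched as follows. This is an illustration, not the SDK's code: `allocate_station`, `record_event`, and the `kNoStation` sentinel are hypothetical names, and the spec does not say what a probe should do when no station is free. The sketch also assumes that each coroutine keeps its own seq counter and starts it at 1, so that a zero-initialized slot reads as "never written".

```cpp
#include <atomic>
#include <cstdint>

// Same layout as the Epoch table in section 3.2.
struct alignas(64) Epoch {
    std::uint64_t timestamp, tid, addr;
    std::atomic<std::uint64_t> seq;
    char reserved[31];
    std::uint8_t is_active;
};

// Assumed sentinel for "no station available"; not defined by the spec.
constexpr std::uint32_t kNoStation = UINT32_MAX;

// Step 1: claim a station index. Relaxed ordering is enough here because the
// counter only hands out unique indices and publishes no other data.
inline std::uint32_t allocate_station(std::atomic<std::uint32_t>& allocated_count,
                                      std::uint32_t max_stations) {
    std::uint32_t index = allocated_count.fetch_add(1, std::memory_order_relaxed);
    return index < max_stations ? index : kNoStation;
}

// Steps 2 and 3: pick the slot, write the plain fields, then publish seq last.
inline void record_event(Epoch (&slots)[8], std::uint64_t seq,
                         std::uint64_t timestamp, std::uint64_t tid,
                         std::uint64_t addr, bool active) {
    Epoch& slot = slots[seq % 8];
    slot.timestamp = timestamp;
    slot.tid = tid;
    slot.addr = addr;
    slot.is_active = active ? 1 : 0;
    // Release store: all writes above become visible before the new seq does.
    slot.seq.store(seq, std::memory_order_release);
}
```

Because the slot index is `seq % 8`, events 1 and 9 land in the same slot, so the ninth event overwrites the first if the engine has not harvested it yet.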

4.2 Engine Harvest Side (Go Tracer Engine)

Local Snapshot: The Go engine maintains a last_seen_seqs[MAX_STATIONS][8] array locally.

Safe Read (Acquire): When polling seq, an atomic load must be used:

currentSeq := atomic.LoadUint64(&slot.Seq) // sync/atomic loads are sequentially consistent, which includes acquire ordering

Data Extraction: If currentSeq > last_seen_seqs[station][slot], extract the data of that slot and, once the copy is complete, set last_seen_seqs[station][slot] to currentSeq.
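The harvest logic for one station can be sketched as below. The real engine is written in Go; C++ is used here only to stay consistent with the probe-side notation, with an acquire load standing in for `atomic.LoadUint64`. `Event` and `harvest_station` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Same layout as the Epoch table in section 3.2.
struct alignas(64) Epoch {
    std::uint64_t timestamp, tid, addr;
    std::atomic<std::uint64_t> seq;
    char reserved[31];
    std::uint8_t is_active;
};

// Local copy of one harvested event (illustrative type, not part of the spec).
struct Event {
    std::uint64_t seq, timestamp, tid, addr;
    bool active;
};

// Scans the 8 slots of one station and returns every slot whose seq advanced
// since the last scan. `last_seen` is this station's row of last_seen_seqs.
inline std::vector<Event> harvest_station(const Epoch (&slots)[8],
                                          std::uint64_t (&last_seen)[8]) {
    std::vector<Event> out;
    for (int i = 0; i < 8; ++i) {
        const Epoch& slot = slots[i];
        // Acquire load: pairs with the probe's release store on seq.
        std::uint64_t current = slot.seq.load(std::memory_order_acquire);
        if (current > last_seen[i]) {
            out.push_back(Event{current, slot.timestamp, slot.tid, slot.addr,
                                slot.is_active != 0});
            last_seen[i] = current;  // updated only after the copy
        }
    }
    return out;
}
```

Calling `harvest_station` twice without an intervening write returns an empty vector the second time, since every `last_seen[i]` already equals the slot's seq.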

4.3 Smart Wakeup Contract (UDS Wakeup)

To keep the Go engine from busy-waiting on the CPU while the target program is idle, a UDS wakeup mechanism is introduced:

After N consecutive harvests with no data, the Go engine sets GlobalHeader.tracer_sleeping to 1, and subsequently blocks reading the UDS (Unix Domain Socket).
After writing data, if the probe detects tracer_sleeping == 1, it sends a single byte '1' to the UDS. The write must be non-blocking (O_NONBLOCK) and failures are ignored, so the target program is never blocked.
Upon receiving the byte, the Go engine is woken by the kernel, resets tracer_sleeping to 0, and resumes harvesting.

5. Cross-Language Implementation Reference (FFI Guide)

Note: The repository now ships a framework-free Rust poll-model SDK under SDK/rust, aiming to keep the integration as close as possible to the small change surface of the C++ SDK. Other languages are still pending (e.g., Zig is currently unstable).

Rust Language Implementation Mapping Reference (Pseudocode)

In Rust, both #[repr(C)] and #[repr(align(N))] are required, so that the compiler neither reorders the fields nor chooses a different alignment.

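use std::mem::size_of;
use std::sync::atomic::{AtomicU32, AtomicU64};

// Section 3.1: global negotiation header.
#[repr(C, align(1024))]
pub struct GlobalHeader {
    pub magic_number: u64,
    pub version: u32,
    pub max_stations: u32,
    pub allocated_count: AtomicU32,
    pub tracer_sleeping: AtomicU32,
    pub _reserved: [u8; 1000],
}

// Section 3.2: one event slot, exactly 64 bytes.
#[repr(C, align(64))]
pub struct Epoch {
    pub timestamp: u64,
    pub tid: u64,
    pub addr: u64,
    pub seq: AtomicU64,
    pub reserved: [u8; 31],
    pub is_active: bool, // one byte; writers must store only 0 or 1
}

// Section 3.3: one station per coroutine.
#[repr(C, align(1024))]
pub struct StationData {
    pub probe_id: u64,
    pub birth_ts: u64,
    pub is_dead: bool, // one byte; writers must store only 0 or 1
    pub _pad: [u8; 47],
    pub slots: [Epoch; 8],
    pub flexible: [u8; 448],
}

// Compile-time checks that the sizes match the tables in section 3.
const _: () = assert!(size_of::<GlobalHeader>() == 1024);
const _: () = assert!(size_of::<Epoch>() == 64);
const _: () = assert!(size_of::<StationData>() == 1024);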
