Async and base infrastructure for cross-process shared memory. #23688
Merged
Add iree_shm_* API for cross-platform shared memory create/open/close with handle passing support. This is the foundation for the SHM carrier stack (SPSC ring, cross-process notification, SHM carrier).
Platform implementations:
- Linux: memfd_create + mmap, with F_ADD_SEALS to prevent peer truncation
- macOS: shm_open + immediate shm_unlink + mmap, with EEXIST retry loop
- Windows: CreateFileMappingW + MapViewOfFile, Local\ namespace for named
Includes fstat size validation on POSIX open paths to fail early on size mismatches (matching Windows MapViewOfFile behavior, avoiding SIGBUS).
Co-Authored-By: Claude <noreply@anthropic.com>
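The fstat size validation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the iree_shm_* implementation: `demo_map_checked` and `demo_backing_fd` are hypothetical names, and an unlinked temporary file stands in for the memfd_create/shm_open descriptor so the sketch stays portable across POSIX systems. The check assumes "mismatch" means the backing object is smaller than the requested mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the iree_shm_* API): maps |expected_size| bytes of
// |fd| as shared read/write memory, but only after fstat confirms the backing
// object is at least that large. Touching pages past the end of a too-small
// object would otherwise raise SIGBUS at first access instead of failing here.
// Returns NULL on any failure.
static void* demo_map_checked(int fd, size_t expected_size) {
  assert(fd >= 0);
  struct stat st;
  if (expected_size == 0 || fstat(fd, &st) != 0) return NULL;
  if (st.st_size < 0 || (size_t)st.st_size < expected_size) return NULL;
  void* base =
      mmap(NULL, expected_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  return base == MAP_FAILED ? NULL : base;
}

// Hypothetical stand-in for a memfd/shm_open descriptor: an unlinked temp
// file sized to |size| bytes. Returns the fd or -1 on failure.
static int demo_backing_fd(size_t size) {
  char path[] = "/tmp/demo_shm_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);  // Same idea as the immediate shm_unlink on macOS.
  if (ftruncate(fd, (off_t)size) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Two mappings of the same descriptor observe each other's writes, which is the property a peer process relies on after receiving the handle.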
Adds a lock-free single-producer single-consumer queue operating on caller-provided memory, designed for cross-process shared memory regions. The queue uses monotonically increasing 64-bit positions with acquire-release ordering and cache-line isolation between producer and consumer fields to eliminate false sharing. Entry format: 4-byte length prefix + payload + alignment padding. Skip markers (UINT32_MAX) at the data region tail signal the consumer to wrap to offset 0, avoiding split reads across the wrap boundary. The header contains a magic number and ABI version for strict validation on open — no forward compatibility. API includes one-shot write/read, zero-copy begin_write/commit_write with deferred data writes (all mutations happen in commit_write behind a single release store), and peek/consume for zero-copy reads. Includes 26 gtest cases (initialization, validation, wrapping, skip markers, two-phase writes, and 4 multi-threaded stress tests) and Google Benchmark for throughput/latency measurement. Co-Authored-By: Claude <noreply@anthropic.com>
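A minimal sketch of the entry format and skip-marker wrap described above, assuming a fixed 64-byte data region and 8-byte entry alignment. All `demo_` names and constants are hypothetical and the real iree_spsc_queue_* API, header layout, and sizes may differ; only the one-shot write/read path is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CAPACITY 64u     // Data region bytes (assumed; multiple of 8).
#define DEMO_ALIGN 8u         // Entry alignment (assumed).
#define DEMO_SKIP UINT32_MAX  // Skip marker stored in place of a length.

typedef struct {
  _Alignas(64) _Atomic uint64_t write_pos;  // Written by the producer only.
  _Alignas(64) _Atomic uint64_t read_pos;   // Written by the consumer only.
  _Alignas(64) uint8_t data[DEMO_CAPACITY];
} demo_spsc_t;

static void demo_spsc_init(demo_spsc_t* q) {
  atomic_init(&q->write_pos, 0);
  atomic_init(&q->read_pos, 0);
  memset(q->data, 0, sizeof(q->data));
}

// Length prefix + payload, rounded up to the entry alignment.
static uint32_t demo_entry_size(uint32_t length) {
  return (4u + length + DEMO_ALIGN - 1u) & ~(DEMO_ALIGN - 1u);
}

static bool demo_spsc_write(demo_spsc_t* q, const void* payload,
                            uint32_t length) {
  if (length > DEMO_CAPACITY - 4u) return false;  // Can never fit.
  uint32_t entry = demo_entry_size(length);
  uint64_t w = atomic_load_explicit(&q->write_pos, memory_order_relaxed);
  uint64_t r = atomic_load_explicit(&q->read_pos, memory_order_acquire);
  uint32_t offset = (uint32_t)(w % DEMO_CAPACITY);
  uint32_t tail = DEMO_CAPACITY - offset;   // Contiguous bytes before wrap.
  uint32_t skip = entry > tail ? tail : 0;  // Tail bytes abandoned, if any.
  if ((w - r) + skip + entry > DEMO_CAPACITY) return false;  // Queue full.
  if (skip) {
    uint32_t marker = DEMO_SKIP;
    memcpy(&q->data[offset], &marker, 4);  // Tells the consumer to wrap.
    offset = 0;
  }
  memcpy(&q->data[offset], &length, 4);
  memcpy(&q->data[offset + 4], payload, length);
  // A single release store publishes the marker and the entry together.
  atomic_store_explicit(&q->write_pos, w + skip + entry, memory_order_release);
  return true;
}

static bool demo_spsc_read(demo_spsc_t* q, void* out, uint32_t out_capacity,
                           uint32_t* out_length) {
  uint64_t r = atomic_load_explicit(&q->read_pos, memory_order_relaxed);
  uint64_t w = atomic_load_explicit(&q->write_pos, memory_order_acquire);
  if (r == w) return false;  // Empty.
  uint32_t offset = (uint32_t)(r % DEMO_CAPACITY);
  uint64_t skipped = 0;
  uint32_t length;
  memcpy(&length, &q->data[offset], 4);
  if (length == DEMO_SKIP) {  // Entry lives at offset 0 instead.
    skipped = DEMO_CAPACITY - offset;
    offset = 0;
    memcpy(&length, &q->data[0], 4);
  }
  if (length > out_capacity) return false;
  memcpy(out, &q->data[offset + 4], length);
  *out_length = length;
  atomic_store_explicit(&q->read_pos, r + skipped + demo_entry_size(length),
                        memory_order_release);
  return true;
}
```

Because positions only ever increase, `write_pos - read_pos` is the number of bytes in flight (including any skipped tail), and the payload of an entry is always contiguous.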
Extend the notification system to support shared-memory epochs and cross-process wake mechanisms across all three backends (io_uring, POSIX, IOCP). This is a prerequisite for the SHM carrier, which will map a shared memory region containing an epoch counter and use the notification signal/wait API for cross-process wakeup.
Core change: add epoch_ptr indirection so notifications can point at either the inline epoch (local, zero behavioral change) or a caller-provided epoch in shared memory. All ~30 call sites mechanically change from &notification->epoch to notification->epoch_ptr.
Add IREE_ASYNC_NOTIFICATION_FLAG_SHARED flag that controls three behavioral branches: destroy skips closing caller-owned primitives, Linux futex calls omit FUTEX_PRIVATE_FLAG for physical-page hashing, and macOS condvar initialization is skipped (process-local, useless cross-process).
Per-platform details:
- io_uring FUTEX mode: futex on shared address suffices, no extra fd.
- io_uring EVENT mode: uses caller-provided eventfd for POLL_ADD+READ.
- POSIX Linux: shared futex + caller eventfd/pipe for poll loop wake.
- POSIX macOS: poll() on wake fd for sync wait (no futex, no condvar).
- IOCP: RegisterWaitForSingleObject bridges caller Event to IOCP; WaitOnAddress/WakeByAddress work cross-process natively on Windows.
New API: iree_async_notification_create_shared() with options struct specifying the epoch address, wake primitive, and signal primitive. Includes shared futex variants (iree_futex_wait_shared/wake_shared) that omit FUTEX_PRIVATE_FLAG on Linux for cross-process operation.
CTS: 8 new test cases exercising shared epoch signal/query, sync/async wait, destroy-doesn't-close-primitives, multiple cycles, two notifications on one epoch, cross-notification sync wait, and timeout. Tests run on all backends including a new io_uring_no_futex variant that exercises the EVENT mode shared path.
Co-Authored-By: Claude <noreply@anthropic.com>
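The epoch_ptr indirection can be sketched roughly as below. The struct and function names are hypothetical, the 32-bit epoch width is an assumption (chosen because futex words are 32 bits), and the wake step (futex, eventfd, or Event) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical notification with an inline epoch and an indirection pointer.
typedef struct {
  _Atomic uint32_t epoch;       // Inline storage used by local notifications.
  _Atomic uint32_t* epoch_ptr;  // &epoch, or a caller-provided shared epoch.
  bool is_shared;               // Stand-in for the SHARED flag.
} demo_notification_t;

// Pass NULL for a local notification, or the address of an epoch that lives
// in memory the caller owns (in the real use case, a shared memory mapping).
static void demo_notification_init(demo_notification_t* n,
                                   _Atomic uint32_t* shared_epoch) {
  atomic_init(&n->epoch, 0);
  n->is_shared = shared_epoch != NULL;
  n->epoch_ptr = shared_epoch ? shared_epoch : &n->epoch;
}

// Advances the epoch through the pointer and returns the new value. A real
// implementation would follow this with a platform wake.
static uint32_t demo_notification_signal(demo_notification_t* n) {
  return atomic_fetch_add_explicit(n->epoch_ptr, 1, memory_order_release) + 1;
}

static uint32_t demo_notification_query(const demo_notification_t* n) {
  return atomic_load_explicit(n->epoch_ptr, memory_order_acquire);
}
```

Every access goes through `epoch_ptr`, so the local case behaves exactly as before while two notifications that share one epoch observe each other's signals.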
Use epoch_ptr indirection instead of direct epoch field access for shared notifications whose epoch lives in shared memory. Fix comment about WaitOnAddress being per-process (virtual address keyed). Co-Authored-By: Claude <noreply@anthropic.com>
Most timer registrations use monotonically-increasing deadlines (connection timeouts, heartbeats, RPC deadlines), so comparing against the tail before walking turns the common case from O(n) to O(1). Also removes the now-dead walk-to-end branch since the fast-path handles all deadline >= tail cases. Co-Authored-By: Claude <noreply@anthropic.com>
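The tail fast path can be illustrated with a sorted singly linked list. This is a sketch with hypothetical names, not the proactor's timer structure; it assumes the list keeps both head and tail pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct demo_timer_t {
  uint64_t deadline_ns;
  struct demo_timer_t* next;
} demo_timer_t;

typedef struct {
  demo_timer_t* head;  // Earliest deadline.
  demo_timer_t* tail;  // Latest deadline.
} demo_timer_list_t;

// Inserts |timer| keeping the list sorted by ascending deadline.
static void demo_timer_insert(demo_timer_list_t* list, demo_timer_t* timer) {
  timer->next = NULL;
  if (!list->tail) {  // Empty list.
    list->head = list->tail = timer;
    return;
  }
  if (timer->deadline_ns >= list->tail->deadline_ns) {
    // Fast path: monotonically increasing deadlines append in O(1).
    list->tail->next = timer;
    list->tail = timer;
    return;
  }
  // Slow path: the deadline is strictly before the tail's, so the walk always
  // stops at or before the tail and never needs a walk-to-end branch.
  demo_timer_t** link = &list->head;
  while ((*link)->deadline_ns <= timer->deadline_ns) link = &(*link)->next;
  timer->next = *link;
  *link = timer;
}
```

Timers with equal deadlines keep their registration order in both paths.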
Add a prev pointer to iree_async_iocp_carrier_t so that event wait carrier removal from the active_carriers list is O(1) instead of O(n). The completion dispatch path previously walked the entire list to find the predecessor; at scale (128+ registered event sources) this adds up. Co-Authored-By: Claude <noreply@anthropic.com>
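A small sketch of why the prev pointer makes removal O(1). The struct below is a hypothetical stand-in for iree_async_iocp_carrier_t and carries only the link fields.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical carrier with intrusive doubly linked list fields.
typedef struct demo_carrier_t {
  struct demo_carrier_t* prev;
  struct demo_carrier_t* next;
  int id;
} demo_carrier_t;

// Pushes |carrier| at the front of the active list.
static void demo_active_push(demo_carrier_t** head, demo_carrier_t* carrier) {
  carrier->prev = NULL;
  carrier->next = *head;
  if (*head) (*head)->prev = carrier;
  *head = carrier;
}

// Unlinks |carrier| without walking the list: the predecessor is known.
static void demo_active_remove(demo_carrier_t** head,
                               demo_carrier_t* carrier) {
  if (carrier->prev) {
    carrier->prev->next = carrier->next;
  } else {
    assert(*head == carrier);  // No predecessor means it must be the head.
    *head = carrier->next;
  }
  if (carrier->next) carrier->next->prev = carrier->prev;
  carrier->prev = carrier->next = NULL;
}
```

With a singly linked list the same removal has to scan from the head to find the predecessor, which is the O(n) walk the commit eliminates.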
Adds support for registering inline progress callbacks on proactors, enabling efficient polling-based progress notification without requiring full event loop wakeups. Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Replace polling-based event wait with NT native wait completion packet API for zero-overhead kernel-level event-to-IOCP association. Co-Authored-By: Claude <noreply@anthropic.com>
Add iree_shm_seal() and iree_shm_query_seals() to support making shared memory regions immutable after population (e.g. sealing model weights).
Platform implementations:
- Linux: kernel-level sealing via memfd F_SEAL_* (anonymous regions only). SEAL_WRITE uses munmap/seal/remap to avoid ASAN interference with mprotect-based approaches. On failure, mapping is either rolled back or fully torn down to prevent half-valid state.
- Windows: VirtualProtect(PAGE_READONLY) defense-in-depth for WRITE seal, VirtualQuery for querying. SHRINK/GROW/SEAL are inherent no-ops.
- macOS: returns IREE_STATUS_UNAVAILABLE (no kernel sealing support).
Also fixes memfd_create to pass MFD_ALLOW_SEALING (was missing, causing all fcntl seal operations to silently fail) and makes iree_shm_map_fd seal-aware so that opening a write-sealed region maps it read-only.
Co-Authored-By: Claude <noreply@anthropic.com>
Eliminates per-I/O heap allocation in steady state by recycling carrier structs through an atomic slist freelist. Carriers are pushed to the freelist on completion dispatch instead of being freed, and popped on the next submit instead of malloc. The pool grows on demand (no preseeding) and is drained at proactor destroy. Co-Authored-By: Claude <noreply@anthropic.com>
Extend iree_async_buffer_pool_t with create_shared/open_shared for cross-process zero-copy via shared memory. The atomic freelist (64-bit CAS, position-independent indices) is placed directly in caller-provided shared memory so both processes can independently acquire and release buffers from a single pool. Shared memory layout uses IREE_STRUCT_LAYOUT with cache-line-isolated sections: immutable header (magic/version/geometry), freelist packed state, and slot array. Header magic is written last as a commit step so openers never see a valid header with uninitialized freelist state. CTS test suite covers storage sizing, create/open lifecycle, header validation (magic, version, buffer_size, buffer_count, alignment, memory size), cross-handle freelist visibility, buffer data coherence across mappings, no-duplicate-index guarantees, and concurrent multi-thread stress. Co-Authored-By: Claude <noreply@anthropic.com>
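An index-based freelist driven by a single 64-bit CAS can be sketched as follows. The packing shown here (a 32-bit generation tag next to a 32-bit head index, to guard against ABA) is an assumption about what "packed state" means, and every `demo_` name is hypothetical. Indices rather than pointers are what make the structure position-independent: each process may map the region at a different address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_SLOT_COUNT 4u
#define DEMO_NONE UINT32_MAX  // Sentinel for "no slot".

typedef struct {
  // Packed as [generation:32 | head index:32] and updated with one CAS.
  _Atomic uint64_t packed_head;
  // Per-slot index of the next free slot; no pointers are stored.
  _Atomic uint32_t next[DEMO_SLOT_COUNT];
} demo_freelist_t;

static uint64_t demo_pack(uint32_t generation, uint32_t index) {
  return ((uint64_t)generation << 32) | index;
}

// Links all slots 0 -> 1 -> ... -> NONE and publishes the head last.
static void demo_freelist_init(demo_freelist_t* f) {
  for (uint32_t i = 0; i < DEMO_SLOT_COUNT; ++i) {
    atomic_init(&f->next[i], i + 1 < DEMO_SLOT_COUNT ? i + 1 : DEMO_NONE);
  }
  atomic_init(&f->packed_head, demo_pack(0, 0));
}

// Pops a free slot index, or returns DEMO_NONE when the pool is exhausted.
static uint32_t demo_freelist_acquire(demo_freelist_t* f) {
  uint64_t old = atomic_load_explicit(&f->packed_head, memory_order_acquire);
  for (;;) {
    uint32_t index = (uint32_t)old;
    if (index == DEMO_NONE) return DEMO_NONE;
    uint32_t next = atomic_load_explicit(&f->next[index], memory_order_relaxed);
    // Bumping the generation makes a stale CAS fail even if the same index
    // was popped and pushed back in between.
    uint64_t desired = demo_pack((uint32_t)(old >> 32) + 1, next);
    if (atomic_compare_exchange_weak_explicit(&f->packed_head, &old, desired,
                                              memory_order_acquire,
                                              memory_order_acquire)) {
      return index;
    }
  }
}

// Pushes |index| back onto the freelist.
static void demo_freelist_release(demo_freelist_t* f, uint32_t index) {
  assert(index < DEMO_SLOT_COUNT);
  uint64_t old = atomic_load_explicit(&f->packed_head, memory_order_relaxed);
  for (;;) {
    atomic_store_explicit(&f->next[index], (uint32_t)old, memory_order_relaxed);
    uint64_t desired = demo_pack((uint32_t)(old >> 32) + 1, index);
    if (atomic_compare_exchange_weak_explicit(&f->packed_head, &old, desired,
                                              memory_order_release,
                                              memory_order_relaxed)) {
      return;
    }
  }
}
```

A buffer address is then derived locally as `base + index * buffer_size`, with `base` being wherever the current process mapped the region.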
Add iree_async_primitive_dup() and iree_async_primitive_close() for cross-process handle transfer. These are the async primitive layer's equivalents of dup()/close() (POSIX), DuplicateHandle/CloseHandle (Windows), and mach_port_mod_refs/deallocate (macOS). Needed by the SHM handshake to duplicate shared_wake signal primitives for IPC exchange, and by the factory to dup accepted socket primitives before passing them to the handshake (which takes ownership). Also adds primitive_test covering construction helpers (none, is_none, make, from_fd, from_win32_handle, from_mach_port), dup semantics (NONE fails, produces independent handle, multiple dups independent), and close semantics (NONE/NULL noop, sets to NONE). Co-Authored-By: Claude <noreply@anthropic.com>
Multi-process test infrastructure for IREE. A single test binary re-executes itself in different roles, with the launcher orchestrating spawn order, readiness synchronization, and result collection.
The harness provides:
- Role-based dispatch via --iree_test_role=<name> flags
- Ready file protocol for ordered startup (server before client)
- Shared temp directory for inter-process data exchange
- Overall timeout with forcible kill on expiry
- Exit code collection and summary reporting
- IREE_TRACE_ZONE instrumentation on all substantial operations
Platform support:
- Linux: /proc/self/exe + posix_spawn
- macOS: _NSGetExecutablePath + posix_spawn
- Windows: GetModuleFileNameA + CreateProcessA
Two usage patterns:
- Gtest integration: link coordinated_test_main instead of gtest_main, register config with IREE_COORDINATED_TEST_REGISTER, call iree_coordinated_test_run() from TEST bodies
- Standalone: call iree_coordinated_test_main() from main()
First consumer will be SHM carrier cross-process tests.
Co-Authored-By: Claude <noreply@anthropic.com>
Ensures data referenced by send SQEs submitted from recv callbacks is read before the submitting function's stack frame unwinds. Fixes io_uring proactor to copy send data inline when immediate submission is possible, avoiding use-after-return for stack-allocated buffers. Adds data_lifetime_test to the socket CTS to verify correct behavior. Co-Authored-By: Claude <noreply@anthropic.com>
Ensures that when an axis operation fails, the failure status is correctly propagated to all semaphores waiting on that axis value, rather than silently dropping the error. Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
writev() can return a short write (fewer bytes than requested) when the kernel send buffer is nearly full. The POSIX proactor was treating any positive return as a complete send, dropping unsent bytes. This is particularly visible on macOS with small SO_SNDBUF values where writev() routinely returns partial byte counts. Fix execute_send to detect short writes, advance the iovec past bytes already consumed, and return WOULD_BLOCK so the operation stays in the chain for POLLOUT-driven retry. bytes_sent accumulates across retries. Co-Authored-By: Claude <noreply@anthropic.com>
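The short-write handling can be sketched like this. The names, the result enum, and the operation struct are hypothetical; only writev() and struct iovec are real POSIX interfaces, and the actual execute_send in the POSIX proactor may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

typedef enum {
  DEMO_SEND_COMPLETE,
  DEMO_SEND_WOULD_BLOCK,  // Keep the operation queued; retry on POLLOUT.
  DEMO_SEND_ERROR,
} demo_send_result_t;

typedef struct {
  struct iovec* iov;  // Remaining buffers (mutated as bytes are consumed).
  int count;          // Remaining iovec entries.
  size_t bytes_sent;  // Accumulates across retries.
} demo_send_op_t;

// Advances the iovec array past |consumed| bytes that were already written.
static void demo_iovec_advance(struct iovec** iov, int* count,
                               size_t consumed) {
  while (*count > 0 && consumed >= (*iov)->iov_len) {
    consumed -= (*iov)->iov_len;  // This buffer was fully written.
    ++*iov;
    --*count;
  }
  if (*count > 0) {  // Partially written buffer: trim its front.
    (*iov)->iov_base = (char*)(*iov)->iov_base + consumed;
    (*iov)->iov_len -= consumed;
  } else {
    assert(consumed == 0);  // Cannot consume more than was provided.
  }
}

// One send attempt. A positive return from writev is not treated as a
// complete send unless every byte was accepted.
static demo_send_result_t demo_execute_send(int fd, demo_send_op_t* op) {
  ssize_t n = writev(fd, op->iov, op->count);
  if (n < 0) {
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? DEMO_SEND_WOULD_BLOCK
                                                     : DEMO_SEND_ERROR;
  }
  op->bytes_sent += (size_t)n;
  demo_iovec_advance(&op->iov, &op->count, (size_t)n);
  return op->count == 0 ? DEMO_SEND_COMPLETE : DEMO_SEND_WOULD_BLOCK;
}
```

Returning the would-block result after a short write leaves the trimmed iovec in place, so the next POLLOUT-driven attempt resumes exactly where the kernel stopped.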
- Android: return UNAVAILABLE from iree_shm_create_named and iree_shm_open_named since bionic lacks shm_open/shm_unlink. Anonymous shared memory (memfd_create) is unaffected.
- Coordinated tests: detect binfmt_misc interpreters (QEMU user-mode, Wine, FEX-Emu) by comparing /proc/self/exe against argv[0]. When they differ, use argv[0] for child re-execution since the kernel transparently invokes the interpreter via binfmt_misc.
- SharedBufferPoolTest: replace memset with C++ value initialization to fix GCC -Werror=class-memaccess on types with default initializers.
Co-Authored-By: Claude <noreply@anthropic.com>
- shm_test: exclude named SHM tests on Android (bionic lacks shm_open/shm_unlink). Anonymous SHM tests still run.
- coordinated_test_test: add noriscv label to exclude under QEMU user-mode where the test cannot re-exec children without binfmt_misc.
Co-Authored-By: Claude <noreply@anthropic.com>
This is the first half of the remote HAL transport stack, split out to land independently. Everything here is in iree/async/, iree/base/, and iree/testing/; no net or HAL changes yet.

Shared memory primitives
Adds iree_shm_* for cross-platform shared memory create/open/close/map with handle passing support (memfd on Linux, shm_open on macOS, CreateFileMappingW on Windows). Includes memory sealing via iree_shm_seal() for making regions immutable after population: kernel-level F_SEAL_* on Linux, VirtualProtect defense-in-depth on Windows, unavailable on macOS.

On top of that sits a lock-free SPSC queue (iree_spsc_queue_*) designed to operate on caller-provided memory so it can live in a shared memory region. It uses monotonically increasing 64-bit positions with acquire-release ordering, cache-line isolation between producer and consumer, and skip markers so that no read is split across the wrap boundary. This is the data plane for the SHM carrier.

Cross-process notification
Extends the notification system to support shared-memory epochs and cross-process wake across all three backends. The core change is an epoch_ptr indirection so notifications can point at either the inline epoch (local, zero behavioral change) or a caller-provided epoch in shared memory.

Per-platform wake mechanisms:

Shared buffer pool
Extends iree_async_buffer_pool_t with create_shared/open_shared for cross-process zero-copy. The atomic freelist (64-bit CAS, position-independent indices) lives directly in shared memory so both processes can independently acquire and release buffers. Header magic is written last as a commit step so openers never see a valid header with uninitialized freelist state.

Proactor performance improvements
NtAssociateWaitCompletionPacket for zero-overhead kernel-level event-to-IOCP association.

Bug fixes
data_lifetime_test in the socket CTS.

Testing infrastructure
Adds a coordinated multi-process test harness (iree/testing/). A single test binary re-executes itself in different roles, with the launcher orchestrating spawn order, readiness synchronization, and result collection. Supports Linux, macOS, and Windows. First consumer is the SHM carrier cross-process tests on the remote-hal branch.

Also adds iree_async_primitive_dup()/close() for cross-process handle transfer: the async primitive layer's equivalents of dup()/close(), DuplicateHandle/CloseHandle, mach_port_mod_refs.