[SHM] Add huge page and NUMA placement support #23697
Merged
Unify iree_shm_options_t and iree_numa_alloc_options_t into a single placement type used by both SHM and NUMA allocation paths. SHM create functions now accept iree_numa_alloc_options_t* (NULL for defaults); open functions drop options entirely since openers map existing pages whose backing store was determined at creation time.

Platform implementations:
- Linux: MFD_HUGETLB memfd with probe-mmap validation (MAP_POPULATE), MADV_HUGEPAGE for THP, mbind for NUMA. Graceful fallback cascade: explicit huge pages -> THP -> normal pages. Retry memfd_create without MFD_ALLOW_SEALING for kernel 4.14-4.15 compatibility.
- Windows: SEC_LARGE_PAGES with silent fallback, CreateFileMappingNumaW for NUMA placement. Open paths try FILE_MAP_LARGE_PAGES first to support cross-process sharing of large-page sections.
- macOS: No-op (no huge page or NUMA support on Apple Silicon).

Also: convert iree_numa_alloc_options_t bool fields to a flag bitfield, embed placement options in iree_async_slab_options_t, reduce IREE_SHM_MAX_NAME_LENGTH to 30 for macOS PSHMNAMLEN portability, and improve iree_shm_seal documentation with thread-safety requirements and Windows process-local semantics.

Co-Authored-By: Claude <noreply@anthropic.com>
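The Linux fallback cascade described in this commit (explicit huge pages -> THP -> normal pages) can be sketched roughly as follows. This is a minimal illustration, not the IREE implementation: every `sketch_*` name is hypothetical, the 2 MiB huge page size is an assumption, and the mbind NUMA step and the MFD_ALLOW_SEALING retry are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U  // Older libc headers may not define this flag.
#endif

// Assumption for this sketch: 2 MiB huge pages (the common x86-64 default).
#define SKETCH_HUGE_PAGE_SIZE ((size_t)2 << 20)

typedef enum {
  SKETCH_BACKING_EXPLICIT_HUGE = 1,  // MFD_HUGETLB memfd
  SKETCH_BACKING_THP = 2,            // normal memfd, MADV_HUGEPAGE accepted
  SKETCH_BACKING_NORMAL = 3,         // normal memfd, no huge page hint
} sketch_backing_t;

typedef struct {
  int fd;
  void* base;
  size_t size;
  sketch_backing_t backing;
} sketch_region_t;

// Creates a memfd of |size| bytes and maps it shared. Returns 0 on success.
static int sketch_try_map(unsigned int memfd_flags, int extra_map_flags,
                          size_t size, sketch_region_t* out) {
  int fd = memfd_create("sketch_shm", MFD_CLOEXEC | memfd_flags);
  if (fd < 0) return -1;
  if (ftruncate(fd, (off_t)size) != 0) {
    close(fd);
    return -1;
  }
  void* base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | extra_map_flags, fd, 0);
  if (base == MAP_FAILED) {
    close(fd);
    return -1;
  }
  out->fd = fd;
  out->base = base;
  out->size = size;
  return 0;
}

// Tries explicit huge pages first, then THP, then normal pages.
static int sketch_region_create(size_t size, sketch_region_t* out) {
  assert(size > 0 && out != NULL);
  // Step 1: explicit huge pages. The probe mapping uses MAP_POPULATE so that
  // an empty huge page pool is detected here rather than at first touch.
  size_t huge_size =
      (size + SKETCH_HUGE_PAGE_SIZE - 1) & ~(SKETCH_HUGE_PAGE_SIZE - 1);
  if (sketch_try_map(MFD_HUGETLB, MAP_POPULATE, huge_size, out) == 0) {
    out->backing = SKETCH_BACKING_EXPLICIT_HUGE;
    return 0;
  }
  // Step 2: normal memfd; ask for transparent huge pages as a hint.
  if (sketch_try_map(0, 0, size, out) != 0) return -1;
  // Step 3: if the hint is rejected, stay on normal pages.
  out->backing = madvise(out->base, out->size, MADV_HUGEPAGE) == 0
                     ? SKETCH_BACKING_THP
                     : SKETCH_BACKING_NORMAL;
  return 0;
}

static void sketch_region_destroy(sketch_region_t* region) {
  munmap(region->base, region->size);
  close(region->fd);
}
```

Note that `SKETCH_BACKING_THP` only records that the kernel accepted the hint; whether huge pages are actually used depends on the system's THP settings.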
All shm_open() calls now include O_CLOEXEC, and all dup() calls are replaced with fcntl(F_DUPFD_CLOEXEC, 0) which is atomic (no race window where a concurrent fork+exec could leak the fd).

Previously, the memfd_create path correctly used MFD_CLOEXEC but the shm_open path (macOS anonymous, named create, named open) and both fd duplication sites (open_handle, handle_dup) left fds inheritable across execve.

Co-Authored-By: Claude <noreply@anthropic.com>
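To illustrate the difference between the two duplication calls, here is a minimal sketch. The helper names `sketch_dup_cloexec` and `sketch_fd_is_cloexec` are hypothetical and are not taken from the IREE sources.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

// Duplicates |fd| with FD_CLOEXEC set atomically on the new descriptor.
// Returns the new descriptor, or -1 with errno set.
static int sketch_dup_cloexec(int fd) {
  assert(fd >= 0);
  return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}

// Returns true if |fd| is valid and has FD_CLOEXEC set.
static bool sketch_fd_is_cloexec(int fd) {
  int flags = fcntl(fd, F_GETFD);
  return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

A plain dup() followed by fcntl(F_SETFD, FD_CLOEXEC) would end in the same state, but another thread could fork+exec between the two calls and inherit the descriptor.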
Three fixes for Windows SHM paths found during cross-validated review:

OpenFileMappingW for named regions created with SEC_LARGE_PAGES now tries FILE_MAP_ALL_ACCESS | FILE_MAP_LARGE_PAGES first, falling back to plain FILE_MAP_ALL_ACCESS. Without this, named large-page regions could not be opened by other processes.

NUMA placement is now best-effort: when CreateFileMappingNumaW fails (invalid node, container restrictions, NUMA disabled), we fall back to CreateFileMappingW without NUMA preference. This matches the Linux behavior where mbind failures are silently ignored.

VirtualProtect(PAGE_READONLY) on SEC_LARGE_PAGES sections now returns IREE_STATUS_UNAVAILABLE instead of a generic error. This uses the same contract as macOS sealing (which is entirely unsupported), allowing callers implementing defense-in-depth to check and proceed.

Co-Authored-By: Claude <noreply@anthropic.com>
- Windows: fall back from MapViewOfFileExNuma to MapViewOfFile when NUMA placement fails, completing the best-effort NUMA pattern at the view mapping level (creation-level fallback was already in place).
- Header: document that Windows large-page sections (SEC_LARGE_PAGES) do not support VirtualProtect protection changes, so sealing returns IREE_STATUS_UNAVAILABLE.
- Tests: add boundary tests for IREE_SHM_MAX_NAME_LENGTH (too-long name rejected, max-length name accepted).

Co-Authored-By: Claude <noreply@anthropic.com>
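The boundary condition those name-length tests exercise can be sketched as below. This is only an illustration: `sketch_shm_name_is_valid` is a hypothetical helper, and the sketch assumes the limit of 30 counts every character of the name as passed in, which the PR text does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Mirrors the value the PR gives for IREE_SHM_MAX_NAME_LENGTH.
#define SKETCH_SHM_MAX_NAME_LENGTH 30

// Returns true if |name| is non-empty and no longer than the limit.
static bool sketch_shm_name_is_valid(const char* name) {
  assert(name != NULL);
  size_t length = strlen(name);
  return length > 0 && length <= SKETCH_SHM_MAX_NAME_LENGTH;
}
```

A name of exactly 30 characters passes and a name of 31 characters is rejected, which matches the two cases the commit lists.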
Fix zero-initialized slab_options pinning to NUMA node 0 instead of "no preference." The refactoring to use options.placement directly caused node_id=0 (the zero-init value) to be interpreted as "pin to node 0" rather than the previous behavior of IREE_NUMA_NODE_ANY. On ARM64 CI containers with restricted RLIMIT_MEMLOCK, this changed the kernel's page-pinning behavior and caused io_uring fixed buffer registration to fail with ENOMEM.

Also fix MultipleSendSlabRegistrations to handle the ENOMEM graceful fallback: when RLIMIT_MEMLOCK is too low, both registrations succeed (the proactor falls back to copy-based I/O) but neither gets real buffer indices. The test now GTEST_SKIPs instead of asserting that two -1 indices are distinct.

Co-Authored-By: Claude <noreply@anthropic.com>
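The zero-initialization pitfall can be shown with a small sketch. The commit message does not say how the fix was implemented, so the default initializer below is just one possible approach; all `sketch_*` names and the sentinel value are hypothetical stand-ins, not IREE definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical sentinel standing in for IREE_NUMA_NODE_ANY.
#define SKETCH_NUMA_NODE_ANY UINT32_MAX

typedef struct {
  uint32_t flags;
  uint32_t node_id;
} sketch_placement_t;

typedef struct {
  size_t slab_size;
  sketch_placement_t placement;
} sketch_slab_options_t;

// Returns options whose placement expresses "no NUMA preference".
static sketch_slab_options_t sketch_slab_options_default(void) {
  sketch_slab_options_t options;
  memset(&options, 0, sizeof(options));
  options.placement.node_id = SKETCH_NUMA_NODE_ANY;
  return options;
}

// Returns the node to bind to, or -1 when there is no preference.
static int64_t sketch_resolve_node(const sketch_placement_t* placement) {
  assert(placement != NULL);
  if (placement->node_id == SKETCH_NUMA_NODE_ANY) return -1;
  return (int64_t)placement->node_id;
}
```

With this layout a memset-zeroed struct resolves to node 0 (the bug described above), while the explicit default resolves to "no preference".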