prov/shm: new shm architecture (updated) by aingerson · Pull Request #11877 · ofiwg/libfabric

aingerson · 2026-02-10T19:51:47Z

Reopening the new shm architecture PR, rebased and with fixes for the CI failures.

j-xiong · 2026-02-11T17:41:53Z

bot:aws:retest

aingerson · 2026-02-11T19:11:50Z

Intel CI failures are lnx bugs fixed in #11882

j-xiong · 2026-02-12T23:15:49Z

bot:aws:retest

shijin-aws · 2026-02-12T23:31:28Z

I think the AWS CI failure is real: a single-node fabtest failed.

--------------------------------- Captured Log ---------------------------------

--------------------------------- Captured Out ---------------------------------
No CUDA memory mark, skipping validation

server_command: ssh -n -o StrictHostKeyChecking=no -o ConnectTimeout=30 -o BatchMode=yes 172.31.35.68 'timeout 360 /bin/bash --login -c '"'"'FI_LOG_LEVEL=warn /home/ec2-user/PortaFiducia/build/libraries/libfabric/pr11877-undebug/install/fabtests/bin/fi_efa_implicit_av_test -L -c 5 -S 1024 -X -I 5 -f efa -v -S all -p efa -E'"'"''

client_command: ssh -n -o StrictHostKeyChecking=no -o ConnectTimeout=30 -o BatchMode=yes 172.31.35.68 'timeout 360 /bin/bash --login -c '"'"'FI_LOG_LEVEL=warn /home/ec2-user/PortaFiducia/build/libraries/libfabric/pr11877-undebug/install/fabtests/bin/fi_efa_implicit_av_test -L -c 5 -S 1024 -X -I 5 -f efa -v -S all -p efa -E 172.31.35.68'"'"''
client_stdout:
[info] fabtests:prov/efa/src/efa_implicit_av_test.c:484: Running test for message size: 1024

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:253: Client: Step 1 - Post receive buffers

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:269: Client: Initial sync

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:273: Implicit AV. Only server inserts client's address

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:292: Client: Sync after send complete

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:301: Client: Waiting for messages from 5 server endpoints


client returncode: 124
server_stdout:
[info] fabtests:prov/efa/src/efa_implicit_av_test.c:484: Running test for message size: 1024

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:330: Server: Creating 5 endpoints

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:347: Server: Initial sync

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:352: Implicit AV. Only sender inserts receiver's address

[info] fabtests:prov/efa/src/efa_implicit_av_test.c:368: Server: Step 1 - Send messages from all endpoints


server returncode: 124

@sunkuamzn may have more insight into what this test checks and how it may relate to shm.

The ZE IPC protocol was updated to remove its dependency on the Unix socket code, so the socket code can be removed.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>

Remove dates from Intel copyright (no longer recommended).
Remove unneeded headers from .c and .h files.
Fix ifdef name for headers.
Put headers in "" or <> depending on location.
Organize headers in the following order:
- corresponding .h file
- other shm headers
- ofi headers
- system headers
- within each group, organize alphabetically

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>

Add helper functions to the freestack implementation:
- smr_freestack_avail: return the number of available elements
- smr_freestack_get_index: return the index of the given element

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
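To make the two helpers concrete, here is a minimal, self-contained sketch of what they compute, assuming a simplified freestack: a fixed array of equally sized objects plus an index-linked free list. The struct layout and the fs_* names are hypothetical stand-ins, not the shm smr_freestack layout; fs_avail and fs_get_index only play the roles of smr_freestack_avail and smr_freestack_get_index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified freestack (not the real smr layout). */
#define FS_SIZE 8
#define FS_OBJ_SIZE 32
#define FS_NONE (-1)

struct freestack {
	int top;                         /* index of first free entry, or FS_NONE */
	int free_cnt;                    /* number of entries currently free */
	int next[FS_SIZE];               /* next[i]: next free index after i */
	char objs[FS_SIZE][FS_OBJ_SIZE]; /* the pooled objects */
};

static void fs_init(struct freestack *fs)
{
	for (int i = 0; i < FS_SIZE; i++)
		fs->next[i] = (i + 1 < FS_SIZE) ? i + 1 : FS_NONE;
	fs->top = 0;
	fs->free_cnt = FS_SIZE;
}

static void *fs_pop(struct freestack *fs)
{
	if (fs->top == FS_NONE)
		return NULL;
	int i = fs->top;
	fs->top = fs->next[i];
	fs->free_cnt--;
	return fs->objs[i];
}

/* Role of smr_freestack_get_index: map an element pointer to its index. */
static int fs_get_index(const struct freestack *fs, const void *obj)
{
	ptrdiff_t off = (const char *) obj - (const char *) fs->objs;

	assert(off >= 0 && off % FS_OBJ_SIZE == 0);
	return (int) (off / FS_OBJ_SIZE);
}

static void fs_push(struct freestack *fs, void *obj)
{
	int i = fs_get_index(fs, obj);

	fs->next[i] = fs->top;
	fs->top = i;
	fs->free_cnt++;
}

/* Role of smr_freestack_avail: number of elements still free. */
static int fs_avail(const struct freestack *fs)
{
	return fs->free_cnt;
}
```

Keeping a counter makes the "available" query O(1) instead of walking the free list; whether the real helper does this is not stated in the commit.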

Allow use of the MR copy function (ofi_copy_mr_iov) that takes a copy direction.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>

Add function to return the minimum of 3 values.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
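As an illustration only, a three-way minimum is usually built from a two-way one. The macro below is a hypothetical sketch; the actual definition added to include/ofi.h may differ (for example, it could be an inline function). copy_step_len is an invented caller showing a typical use: bounding one copy step by three limits at once.

```c
#include <assert.h>
#include <stddef.h>

/* Two-way minimum, defined here only if not already provided. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical three-way minimum. As with any function-like macro,
 * arguments may be evaluated more than once. */
#define MIN3(a, b, c) MIN(MIN((a), (b)), (c))

/* Hypothetical caller: bytes copied in one step are bounded by the
 * source remaining, the destination remaining and a per-step limit. */
static size_t copy_step_len(size_t src_left, size_t dst_left, size_t max_step)
{
	assert(max_step > 0);
	return MIN3(src_left, dst_left, max_step);
}
```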

The xpmem capability can only have two settings, on or off. Turn it into a bool for simplicity.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>

The create function needs to align the allocation with the cache line size.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
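One common way to meet this requirement is to round the allocation size up to a multiple of the cache line and request cache-line alignment, as in the sketch below. The 64-byte line size, the demo_queue struct and the helper names are assumptions for illustration, not the ofi_atomic_queue code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; real code would use the platform's value. */
#define CACHE_LINE 64

/* Hypothetical queue header followed by a flexible array of entries. */
struct demo_queue {
	size_t size;
	uint64_t entries[];
};

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}

static struct demo_queue *demo_queue_create(size_t size)
{
	struct demo_queue *q;
	/* C11 specifies that the size passed to aligned_alloc be an
	 * integral multiple of the alignment, so round it up first. */
	size_t bytes = round_up(sizeof(*q) + size * sizeof(q->entries[0]),
				CACHE_LINE);

	q = aligned_alloc(CACHE_LINE, bytes);
	if (!q)
		return NULL;
	assert((uintptr_t) q % CACHE_LINE == 0);
	q->size = size;
	return q;
}
```

The returned memory is released with free(). Aligning to the cache line keeps the start of the queue from sharing a line with unrelated data; that motivation is our assumption, since the commit does not state it.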

Add a function definition to be able to initialize fields in the queue entries. Initialization is eager: an entry is already initialized when it is assigned to the caller, and it is pre-emptively re-initialized on release back into the queue. This can help with caching if initialization is more effectively done by the owner instead of by peers.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
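The eager-initialization idea can be sketched as follows. This is a hypothetical, single-threaded stand-in: it uses a simple LIFO free list instead of the lock-free atomic queue, and the eager_pool and entry_init_fn names are invented. It only shows where the init callback runs: once up front, then again on every release, so acquire does no initialization work.

```c
#include <assert.h>
#include <stddef.h>

#define Q_SIZE 4

struct entry {
	int in_use;
	int data;
};

/* Hypothetical init callback, run by the owner of the pool. */
typedef void (*entry_init_fn)(struct entry *e);

struct eager_pool {
	entry_init_fn init;
	int free_idx[Q_SIZE];	/* indices of free entries (LIFO) */
	int nfree;
	struct entry entries[Q_SIZE];
};

static void pool_init(struct eager_pool *p, entry_init_fn init)
{
	p->init = init;
	p->nfree = Q_SIZE;
	for (int i = 0; i < Q_SIZE; i++) {
		p->init(&p->entries[i]);	/* eager: initialize up front */
		p->free_idx[i] = i;
	}
}

static struct entry *pool_acquire(struct eager_pool *p)
{
	if (!p->nfree)
		return NULL;
	/* Entry is already initialized: no init work on the caller's path. */
	return &p->entries[p->free_idx[--p->nfree]];
}

static void pool_release(struct eager_pool *p, struct entry *e)
{
	/* Re-initialize before the entry becomes available again. */
	p->init(e);
	assert(p->nfree < Q_SIZE);
	p->free_idx[p->nfree++] = (int) (e - p->entries);
}

static void reset_entry(struct entry *e)
{
	e->in_use = 0;
	e->data = -1;
}
```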

Replacement of shm protocols with new architecture. Significant changes:
- Turn the response queue into a return queue for local commands. Inline commands are still on the receive side. All commands have an inline option, but remote commands use a common pointer to the command being used. These commands have to be returned to the sender, but the receive side can hold onto them for as long as needed for the lifetime of the message.
- shm has self and peer caps for each p2p interface (currently just CMA and xpmem). The support for each of these interfaces was saved in separate fields, which wasted a lot of memory and was confusing. This merges them into two fields (one for self and one for peer) that hold the information for all p2p interfaces and are accessed by the P2P type enums. CMA also needs a flag to know whether CMA support has been queried yet.
- Move some shm fields around for alignment.
- Simplify access to the map to remove the need for a container.
- There is a 1:1 relationship between the av and the map, so reuse the util av lock for access to the map as well. This requires some reorganizing of the locking semantics.
- smr_fabric contains nothing. Remove it and use util_fabric directly.
- As on the send side, make the progress functions an array of function pointers indexed by the command proto. This cleans up the parameters of the progress calls and streamlines the calls.
- Merge tx and pend entries for simpler management of pending operations.
- Redefine cmd and header for simplicity and easier reading. This also removes and adds fields for the new architecture.
- Refactor the async ipc list into a generic async list that tracks asynchronous copies and can be used for any accelerator (GPU or DSA) that copies locally and asynchronously.
- Clean up naming and organization for readability. Shorten some names to help with line length and organization.
- Remove the unused and non-performant mmap protocol (and the sar_threshold environment variable, which was only used for that protocol).
- Fix the odd header dependency smr_util.c->smr.h->smr_util.h so that smr_util.c depends only on smr_util.h and is isolated to shm region and protocol definitions. This decouples the shm utilities from the provider, leaving the door open for reusing them if needed.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
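Two of the items above, progress functions in an array indexed by the command proto and p2p capabilities held in one field per side accessed by a P2P type enum, can be illustrated with the pattern below. Every name here (demo_proto, demo_cmd, demo_progress, demo_p2p, p2p_set, p2p_has) is hypothetical, and encoding one bit per interface is only one plausible reading of "accessed by the P2P type enums"; the real shm structures are not shown in this PR text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protocol IDs carried in each command header. */
enum demo_proto {
	DEMO_PROTO_INLINE,
	DEMO_PROTO_INJECT,
	DEMO_PROTO_IOV,
	DEMO_PROTO_MAX,
};

struct demo_cmd {
	enum demo_proto proto;	/* selects the progress handler */
};

typedef int (*demo_progress_fn)(struct demo_cmd *cmd);

static int handled[DEMO_PROTO_MAX];	/* per-protocol call counts, demo only */

static int progress_inline(struct demo_cmd *cmd) { (void) cmd; handled[DEMO_PROTO_INLINE]++; return 0; }
static int progress_inject(struct demo_cmd *cmd) { (void) cmd; handled[DEMO_PROTO_INJECT]++; return 0; }
static int progress_iov(struct demo_cmd *cmd)    { (void) cmd; handled[DEMO_PROTO_IOV]++; return 0; }

/* One table lookup replaces a switch over protocols. */
static demo_progress_fn demo_progress[DEMO_PROTO_MAX] = {
	[DEMO_PROTO_INLINE] = progress_inline,
	[DEMO_PROTO_INJECT] = progress_inject,
	[DEMO_PROTO_IOV]    = progress_iov,
};

static int demo_progress_cmd(struct demo_cmd *cmd)
{
	assert(cmd->proto < DEMO_PROTO_MAX);
	return demo_progress[cmd->proto](cmd);
}

/* Hypothetical P2P interfaces: one bit per interface in a single field,
 * so a self field and a peer field cover every interface. */
enum demo_p2p { DEMO_P2P_CMA, DEMO_P2P_XPMEM, DEMO_P2P_MAX };

static void p2p_set(uint8_t *caps, enum demo_p2p t, int on)
{
	if (on)
		*caps |= (uint8_t) (1u << t);
	else
		*caps &= (uint8_t) ~(1u << t);
}

static int p2p_has(uint8_t caps, enum demo_p2p t)
{
	return (caps >> t) & 1u;
}
```

With a table, adding a protocol means adding an enum value and one initializer rather than touching every switch statement.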

In order to support unlimited unexpected messaging, add a flag, SMR_BUFFER_RECV, that the sender uses to let the receiver know that resources are limited and the whole message should be buffered on the target. This allows the command to be returned to the sender immediately, so the sender is never blocked by unexpected messages at the target. Buffering unexpected messages hurts performance, so by default the sender waits until only a single command is left before requesting buffering. An environment variable is also added to toggle this, for debugging purposes or as a workaround.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
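The default policy described above can be simulated in a few lines. This is a sketch under stated assumptions: DEMO_BUFFER_RECV stands in for SMR_BUFFER_RECV (whose real value is not given), demo_link is an invented model of one sender/receiver pair, and force_buffer stands in for the environment-variable override, whose name and exact semantics are not given here (we assume it forces buffering on every send).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SMR_BUFFER_RECV. */
#define DEMO_BUFFER_RECV (1u << 0)

struct demo_link {
	int cmds_avail;		/* sender commands not held by the peer */
	int held_by_peer;	/* commands kept for unexpected messages */
	int buffered;		/* unexpected messages copied at the target */
	bool force_buffer;	/* stand-in for the env-var override */
};

/* Default: request buffering only when this is the last command. */
static uint32_t demo_send_flags(const struct demo_link *l)
{
	assert(l->cmds_avail > 0);
	if (l->force_buffer || l->cmds_avail == 1)
		return DEMO_BUFFER_RECV;
	return 0;
}

/* Simulate sending a message the receiver has no posted buffer for.
 * Returns false if the sender has no command left (it would block). */
static bool demo_send_unexpected(struct demo_link *l)
{
	if (l->cmds_avail == 0)
		return false;
	uint32_t flags = demo_send_flags(l);

	l->cmds_avail--;
	if (flags & DEMO_BUFFER_RECV) {
		/* Target copies the whole message and returns the command. */
		l->buffered++;
		l->cmds_avail++;
	} else {
		/* Target holds the command until a matching receive arrives. */
		l->held_by_peer++;
	}
	return true;
}
```

With four commands and ten unexpected sends, the first three commands are held by the target and the remaining seven messages are buffered, so no send ever finds the command pool empty.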

The shm provider will not be able to complete an unexpected send in the inject protocol. Do not test message sizes of 1024 and above.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>

aingerson force-pushed the shm branch from 8fb2eaa to c209c63 February 12, 2026 00:29

aingerson added 11 commits February 19, 2026 15:10

prov/shm: remove socket code, no longer needed

2686f43

include/ofi_hmem: make ofi_copy_mr_iov non-static

feb58e5

include/ofi.h: add MIN3 function

77807fb

include/ofi_xpmem: change cap into bool

85a7ccc

include/ofi_atomic_queue: fix create function

0cb5737

fabtests/efa_implicit_av_test: skip shm test for inject sizes

03ab439

aingerson force-pushed the shm branch from c209c63 to 03ab439 February 20, 2026 19:36

aingerson wants to merge 11 commits into ofiwg:main from aingerson:shm
