Commit 56cfcf8
UCT/IB/RDMACM: Hold async block over rdma_get_cm_event

The rdmacm CM event handler called rdma_get_cm_event() outside the CM's
async block, then took the block only around uct_rdmacm_cm_process_event().
The ep destructor (uct_rdmacm_cm_ep_t cleanup) and other destroy sites
hold the same block when calling rdma_destroy_id(), so the synchronization
intent was to serialize them with the handler.

The pre-block window let a concurrent rdma_destroy_id() free the cm_id's
userspace tracking while the async thread was mid-lookup inside
rdma_get_cm_event(), producing a NULL deref at the internal
pthread_mutex_lock(&id_priv->mut) call. Observed as a SIGSEGV inside
librdmacm during sockaddr error/wireup-failure gtests under multi-threaded
workers where event delivery and ep teardown interleave more often.

Acquire the async block before rdma_get_cm_event() and release it on the
error/EAGAIN exit path, so the entire fetch + dispatch is serialized with
rdma_destroy_id() callers that hold the same block.

Signed-off-by: NirWolfer <nwolfer@nvidia.com>

1 parent 2e11735 commit 56cfcf8
1 file changed
Lines changed: 6 additions & 1 deletion
[Diff hunks: additions at new lines 816-820 and 824; deletion at original line 828; line contents not available]
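
The locking order described in the commit message can be modeled with a small, self-contained sketch. This is not the UCX source. Every name in it (`demo_cm_t`, `demo_get_event()`, `demo_process_event()`, `demo_destroy_id()`, `demo_event_handler()`) is hypothetical, and a plain pthread mutex stands in for the CM's async block. The only point it illustrates is that the lock is taken before the event fetch and released on the error/EAGAIN exit path, so fetch and dispatch cannot interleave with a destroy that holds the same lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of a connection manager; not a UCX or librdmacm type. */
typedef struct {
    pthread_mutex_t async_lock;     /* stands in for the CM's async block */
    int             pending_events; /* events queued on the modeled channel */
    int             id_alive;       /* 1 while the modeled cm_id exists */
    int             dispatched;     /* number of events dispatched so far */
} demo_cm_t;

static void demo_cm_init(demo_cm_t *cm, int pending_events)
{
    pthread_mutex_init(&cm->async_lock, NULL);
    cm->pending_events = pending_events;
    cm->id_alive       = 1;
    cm->dispatched     = 0;
}

/* Models the event fetch: it inspects per-id state, so it must not run
 * concurrently with demo_destroy_id(). Caller holds async_lock. */
static int demo_get_event(demo_cm_t *cm)
{
    if (cm->pending_events == 0) {
        return -EAGAIN; /* nothing to read right now */
    }
    if (!cm->id_alive) {
        return -ENODEV; /* id already destroyed; the model reports an error */
    }
    cm->pending_events--;
    return 0;
}

/* Models the event dispatch. Caller holds async_lock. */
static void demo_process_event(demo_cm_t *cm)
{
    assert(cm->id_alive); /* holds because fetch and dispatch share one lock */
    cm->dispatched++;
}

/* Models a destroy site: it tears the id down under the same lock. */
static void demo_destroy_id(demo_cm_t *cm)
{
    pthread_mutex_lock(&cm->async_lock);
    cm->id_alive = 0;
    pthread_mutex_unlock(&cm->async_lock);
}

/* Models the fixed handler: lock first, then fetch, then dispatch.
 * Returns the number of events handled in this invocation. */
static int demo_event_handler(demo_cm_t *cm)
{
    int handled = 0;

    for (;;) {
        pthread_mutex_lock(&cm->async_lock);
        if (demo_get_event(cm) != 0) {
            /* error/EAGAIN exit path: drop the lock before leaving */
            pthread_mutex_unlock(&cm->async_lock);
            break;
        }
        demo_process_event(cm);
        pthread_mutex_unlock(&cm->async_lock);
        handled++;
    }
    return handled;
}
```

In this model, calling `demo_event_handler()` on a manager initialized with three pending events drains all three. Calling it again after `demo_destroy_id()` dispatches nothing, even if events are still queued, because the fetch sees the destroyed id under the lock rather than racing with the teardown.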