
CUDA context crash in libfabric backend progress thread when using GPU Direct RDMA #1157

@dmvevents

Description

When using the libfabric backend with GPU Direct RDMA (FI_EFA_USE_DEVICE_RDMA=1), the progress thread crashes with a CUDA context error: pthrCudaCtx_ is still NULL when the thread starts, because the context is only captured later during memory registration.

Environment

  • NIXL 0.8.0
  • AWS P5.48xlarge (H100 GPUs + EFA)
  • TRT-LLM disaggregated inference
  • FI_EFA_USE_DEVICE_RDMA=1

Symptoms

CUDA error: invalid device context
Segmentation fault during fi_read with GPU memory

Root Cause

In src/plugins/libfabric/libfabric_backend.cpp, the progress thread starts in the constructor BEFORE registerMem() is called. When the thread tries to access GPU memory via fi_read(), the CUDA context (pthrCudaCtx_) is NULL.

The UCX backend handles this correctly by restarting the thread when the context changes, but the libfabric backend does not.
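
To make the ordering concrete, here is a minimal sketch of the sequence described above (pthrCudaCtx_ and registerMem() come from the issue; the class layout and other names are illustrative, not the actual NIXL source):

#include <cuda.h>   // CUDA driver API (cuCtxGetCurrent)
#include <thread>

class LibfabricBackend {
    CUcontext pthrCudaCtx_ = nullptr;  // still NULL when the thread starts
    std::thread progress_thread_;

public:
    LibfabricBackend() {
        // The progress thread is launched here, before any memory is registered...
        progress_thread_ = std::thread(&LibfabricBackend::progressThread, this);
    }

    void registerMem() {
        // ...but the CUDA context is only captured here, after construction.
        cuCtxGetCurrent(&pthrCudaCtx_);
    }

    void progressThread() {
        // fi_read() on GPU memory then runs on a thread with no current CUDA
        // context -> "invalid device context" and the segfault above.
    }
};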

Proposed Fix

Apply the CUDA context INSIDE the progress loop on every iteration, not just once at thread start:

void LibfabricBackend::progressThread() {
    while (!progress_thread_stop_.load()) {
#ifdef HAVE_CUDA
        if (cuda_addr_wa_) {
            vramApplyCtx();  // Apply context on EVERY iteration
        }
#endif
        // ... rest of progress loop
    }
}
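
For reference, a minimal sketch of what vramApplyCtx() could do, assuming pthrCudaCtx_ holds the CUcontext captured during registerMem() (the actual NIXL implementation may differ):

void LibfabricBackend::vramApplyCtx() {
    // Make the context captured at registration time current on this thread.
    // cuCtxSetCurrent() is a cheap CUDA driver call and is effectively a
    // no-op when the context is already current, so calling it on every
    // loop iteration is acceptable.
    if (pthrCudaCtx_ != nullptr) {
        cuCtxSetCurrent(pthrCudaCtx_);
    }
}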

Workaround

None - requires source patch.

Impact

This blocks GPU Direct RDMA for all libfabric users with CUDA memory.
