Skip to content

Commit 58fec6f

Browse files
authored
[rdma] drain CQ after moving into RESET state (#3261)
Otherwise there might segfault due to the race below: ```txt Socket::OnInputEvent() | `-- ProcessEvent (bthread) | | [ bthread queueed ] | QP error -> SetFailed -> HC -> WaitAndReset() | Reset() -> _sbuf.clear() | CheckHealth() -> Revive() | | Socket is now Addressable! RdmaEndpoint:PollCq() | Socket::Address() OK! | RdmaEndpoint:HandleCompletion() _sbuf[_sq_sent++].clear() <= BOOM! CQ is not drained but _sbuf is cleared. ``` Another possible fix is to add a _generation_ field in RdmaEndpoint, such that: - each RdmaEndpoint::Reset() will advance the _generation_ by 1; - the RdmaEndpoint::PollCq(m, orig_gen) will need to compare the _generation_. But it will contaminate existing interface, and we need to drain CQ anyway. Signed-off-by: David Lee <live4thee@gmail.com>
1 parent 08d95ac commit 58fec6f

1 file changed

Lines changed: 34 additions & 0 deletions

File tree

src/brpc/rdma/rdma_endpoint.cpp

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1334,6 +1334,21 @@ static void DeallocateCq(ibv_cq* cq) {
13341334
LOG_IF(WARNING, 0 != err) << "Fail to destroy CQ: " << berror(err);
13351335
}
13361336

1337+
static int DrainCq(ibv_cq* cq) {
1338+
if (NULL == cq) {
1339+
return 0;
1340+
}
1341+
1342+
ibv_wc wc;
1343+
int ret;
1344+
do {
1345+
ret = ibv_poll_cq(cq, 1, &wc);
1346+
} while (ret > 0);
1347+
1348+
LOG_IF(ERROR, ret < 0) << "drain CQ failed: " << ret;
1349+
return ret;
1350+
}
1351+
13371352
void RdmaEndpoint::DeallocateResources() {
13381353
if (!_resource) {
13391354
return;
@@ -1360,6 +1375,7 @@ void RdmaEndpoint::DeallocateResources() {
13601375
}
13611376

13621377
bool remove_consumer = true;
1378+
_reclaim:
13631379
if (!move_to_rdma_resource_list) {
13641380
if (NULL != _resource->qp) {
13651381
int err = IbvDestroyQp(_resource->qp);
@@ -1403,6 +1419,24 @@ void RdmaEndpoint::DeallocateResources() {
14031419
}
14041420

14051421
if (move_to_rdma_resource_list) {
1422+
// When a QP is moved to the RESET state, all associated send and
1423+
// receive queues are flushed, meaning any outstanding WRs are effectively
1424+
// abandoned by the hardware.
1425+
//
1426+
// However, the CQ associated with that QP is *not* cleared automatically,
1427+
// meaning that it will still contain entries for WRs that completed before
1428+
// the reset.
1429+
//
1430+
// The application should finish polling the CQ to remove these obsolete
1431+
// entries before reusing the QP.
1432+
int ret = DrainCq(_resource->polling_cq);
1433+
ret += DrainCq(_resource->send_cq);
1434+
ret += DrainCq(_resource->recv_cq);
1435+
if (ret < 0) {
1436+
move_to_rdma_resource_list = false;
1437+
goto _reclaim;
1438+
}
1439+
14061440
BAIDU_SCOPED_LOCK(*g_rdma_resource_mutex);
14071441
_resource->next = g_rdma_resource_list;
14081442
g_rdma_resource_list = _resource;

0 commit comments

Comments
 (0)