Commit 58fec6f
[rdma] drain CQ after moving into RESET state (#3261)
Otherwise a segfault may occur due to the race below:
```txt
Socket::OnInputEvent()            |
`-- ProcessEvent (bthread)        |
                                  |
    [ bthread queued ]            | QP error -> SetFailed -> HC -> WaitAndReset()
                                  |   Reset() -> _sbuf.clear()
                                  |   CheckHealth() -> Revive()
                                  |
                                  | Socket is now Addressable!
RdmaEndpoint::PollCq()            |
  Socket::Address() OK!           |
RdmaEndpoint::HandleCompletion()
  _sbuf[_sq_sent++].clear() <= BOOM! CQ is not drained but _sbuf is cleared.
```
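The race above can be reproduced in a toy, single-threaded model. The sketch below is only an illustration: `ToyEndpoint`, `FakeCompletion`, `PostSend`, `DrainCq` and the `drain` flag are hypothetical names, not brpc or libibverbs APIs, and the QP state transition itself is not modeled. It shows that a completion left in the CQ across a reset refers to a send slot that no longer exists, and that discarding pending completions during the reset avoids this.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a work completion (real code would use ibv_wc).
struct FakeCompletion {
    int wr_id;
};

// Hypothetical, simplified endpoint: models only the state involved in the race.
struct ToyEndpoint {
    std::deque<FakeCompletion> cq;  // completions generated but not yet polled
    std::vector<int> sbuf;          // one slot per in-flight send
    std::size_t sq_sent = 0;        // index of the next send slot to release

    // Record an in-flight send and queue its completion immediately.
    void PostSend(int id) {
        sbuf.push_back(id);
        cq.push_back({id});
    }

    // Discard every pending completion; returns how many were dropped.
    std::size_t DrainCq() {
        std::size_t n = cq.size();
        cq.clear();
        return n;
    }

    // Clear the send-side state, optionally draining the CQ first.
    void Reset(bool drain) {
        if (drain) {
            DrainCq();
        }
        sbuf.clear();
        sq_sent = 0;
    }

    // Consume one completion, if any. Returns false when the completion is
    // stale, i.e. it refers to a slot that was cleared by Reset(). The
    // unguarded equivalent of this access is the crash in the diagram.
    bool HandleCompletion() {
        if (cq.empty()) {
            return true;  // nothing to handle
        }
        cq.pop_front();
        if (sq_sent >= sbuf.size()) {
            return false;  // stale completion detected
        }
        sbuf[sq_sent++] = 0;  // release the slot
        return true;
    }
};
```

With `Reset(false)` the next `HandleCompletion()` reports a stale completion; with `Reset(true)` the CQ is empty after the reset and nothing stale is seen.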
Another possible fix is to add a _generation_ field to RdmaEndpoint, such that:
- each RdmaEndpoint::Reset() advances _generation_ by 1;
- RdmaEndpoint::PollCq(m, orig_gen) compares orig_gen against the current _generation_.
But that would contaminate the existing interface, and we need to drain the CQ anyway.
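For comparison, the generation-based alternative might look roughly like the sketch below. This is an assumption-laden illustration, not the brpc interface: `GenEndpoint`, `generation()` and `handled()` are hypothetical, the `m` parameter of `PollCq` is omitted, and the counter is a plain integer, so the sketch is single-threaded only (real code would likely need an atomic).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the generation idea; not the actual RdmaEndpoint.
class GenEndpoint {
public:
    // Generation a caller captures before its poll task is queued.
    std::uint64_t generation() const { return _generation; }

    // Each reset advances the generation by 1.
    void Reset() { ++_generation; }

    // Returns true if the caller's generation is current and the poll
    // proceeded; returns false if a Reset() happened in between.
    bool PollCq(std::uint64_t orig_gen) {
        if (orig_gen != _generation) {
            return false;  // stale: endpoint was reset since orig_gen
        }
        ++_handled;
        return true;
    }

    // Number of polls that were accepted.
    int handled() const { return _handled; }

private:
    std::uint64_t _generation = 0;
    int _handled = 0;
};
```

A caller would capture `generation()` when the event is queued and pass it to `PollCq` later; the extra parameter is the interface change the commit message objects to.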
Signed-off-by: David Lee <live4thee@gmail.com>
1 parent 08d95ac, commit 58fec6f
1 file changed
Lines changed: 34 additions & 0 deletions
| Hunk | Added lines (new file numbering) | Lines added |
|---|---|---|
| 1 | 1337-1351 | 15 |
| 2 | 1378 | 1 |
| 3 | 1422-1439 | 18 |