-
Notifications
You must be signed in to change notification settings - Fork 417
Enable TCP keepalive during connreq processing, otherwise #11058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change looks good to me, aside from declaring the enable call so that it's accessible to the cm
Please squash the commits. |
sure |
prov/tcp/src/xnet_cm.c
Outdated
@@ -203,9 +203,12 @@ void xnet_req_done(struct xnet_ep *ep) | |||
ret = xnet_req_done_internal(ep); | |||
if (ret) | |||
goto disable; | |||
|
|||
xnet_disable_keepalive(ep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this can move to near the top of the function, so it doesn't need a separate call in the disable path below (which probably isn't needed anyway).
prov/tcp/src/xnet_ep.c
Outdated
if (ret) { | ||
FI_WARN(&xnet_prov, FI_LOG_EP_CTRL, "%p set tcp keepalive failure:%d\n", | ||
ep, ret); | ||
return ofi_sockerr() ? -ofi_sockerr() : -FI_EINVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xnet_enable_keepalive() already converts to the correct errno. Just pass through.
For the commit message, please add a short subject line, such as: "prov/tcp: Enable keepalive during CM exchange". This should be the top line of the commit. |
Enable TCP keepalive during connreq processing, otherwise if the remote peer is restarted between getting connreq and replying, it may hang there to wait for the connreq reply. Since the ep state(req_done) will not allow sending new reqs, and socket state will not be reset, i.e. progress will not be able to detect the disconnection. Signed-off-by: Di Wang <[email protected]>
int optval = 1; | ||
int idle_time = 5; | ||
int keep_intvl = 2; | ||
int keep_cnt = 2; | ||
int ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to make these parameters configurable via environment variables? I implemented a similar feature for the socket provider about 8 years ago in this commit.
Also worth noting: the TCP provider now supports the FI_GET_FD
flag with fi_control()
, which enables applications to fine-tune the TCP socket for a given endpoint: #11003
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to make these parameters configurable via environment variables?
I thought about this, though this will only be used during CM exchange, maybe using fixed value is good enough for now. Though I may miss some use cases.
@ooststep Could you update your review? |
if the remote peer is restarted between getting connreq and replying, it may hang there to wait for the connreq reply. Since the ep state(req_done) will not allow sending new reqs, and socket state will not be reset, i.e. progress will not be able to detect the disconnection.