Cancel tag RX entry once progress RX sees any failure. #10997
base: main
Conversation
Signed-off-by: Di Wang <[email protected]>
prov/tcp/src/xnet_progress.c
Outdated
xnet_ep_disable(ep, 0, NULL, 0);
xnet_ep_tag_cancel(ep, -ret);
All cleanup should be done in xnet_ep_disable; look at flush_all_queues.
I checked flush_all_queues(), which does not seem to clean up the tag_queue. Actually, I have a question here:
When sending an unexpected msg, if the send succeeds but the following progress_rx fails, should we leave the posted tag recv entry in the tag queue, i.e. let someone else cancel it?
Or should it actively cancel the tag entry as in this patch? @soumagne
Tag matching only applies to RDM endpoints. For tcp, the tag matching is part of the shared receive context implementation. An error on a single connection should not impact the posted tag queue. Only the active tagged message in use should be reported in error, assuming that it's already been matched.
Tag queues are cleaned up when destroying the srx.
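For reference, a minimal sketch of what "clean up the tag queues when destroying the srx" could look like. Only the slist helpers and struct xnet_xfer_entry come from the diff above; the function name and report_canceled() are assumptions for illustration, not the provider's actual code.

/* Hypothetical sketch: flush every posted tagged receive when the shared
 * receive context (srx) is torn down.  report_canceled() is a placeholder
 * for however the provider writes the FI_ECANCELED completion. */
static void flush_srx_tag_queue(struct slist *tag_queue)
{
        struct slist_entry *entry;
        struct xnet_xfer_entry *xfer_entry;

        while (!slist_empty(tag_queue)) {
                entry = slist_remove_head(tag_queue);
                xfer_entry = container_of(entry, struct xnet_xfer_entry, entry);
                report_canceled(xfer_entry, FI_ECANCELED);  /* placeholder */
        }
}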
Thanks for the clarification @shefty. How about the updated patch? The canceling is now done only for tags whose peer address matches that of the ep being disabled.
NB: if the tag recv is not canceled/reported in time, the send op will not complete in time, even though the ep has already been disabled due to the failure.
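As a sketch of the filtering described above (the src_addr field and the cancel_entry() helper are illustrative names; the actual patch may differ), only entries whose source address matches the ep being disabled are pulled off the queue:

/* Hypothetical sketch: cancel only the posted tagged receives that match the
 * peer address of the ep being disabled; keep everything else queued. */
static void cancel_tags_for_peer(struct slist *tag_queue, fi_addr_t peer_addr,
                                 int err)
{
        struct slist tmp;
        struct slist_entry *entry;
        struct xnet_xfer_entry *xfer_entry;

        slist_init(&tmp);
        while (!slist_empty(tag_queue)) {
                entry = slist_remove_head(tag_queue);
                xfer_entry = container_of(entry, struct xnet_xfer_entry, entry);
                if (xfer_entry->src_addr == peer_addr)
                        cancel_entry(xfer_entry, err);   /* report + free */
                else
                        slist_insert_tail(entry, &tmp);  /* keep */
        }
        /* Put the surviving entries back in their original order. */
        while (!slist_empty(&tmp))
                slist_insert_tail(slist_remove_head(&tmp), tag_queue);
}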
The connection can be torn down and reestablished. That's already done in the code today.
The remote peer can also re-form the connection. An RDM endpoint simply has no way of knowing what the application is going to do. Failing outstanding receive operations because of a potential hiccup in communication isn't the right answer.
The ultimate issue is that the application has selected an endpoint and operation which don't align with what it needs. That's why we need the RPC APIs, which provide enough context for the provider to make the correct decision; the proposed change here will break some apps.
The problem with the send getting stuck is something that should be handled. We would need to figure out what exactly happened there.
Thanks for the clarification, @shefty. If I understand correctly, the connection will only be re-established if there is a new send request and the ep is not in the CONNECTED state.
In our current situation, when a remote peer terminates unexpectedly, the local socket may not immediately update its state, so the associated ep can remain in the CONNECTED state, which allows send() operations to succeed, since they only copy the message into the local kernel's send buffer. However, subsequent receive operations (handled by progress_rx) will fail. This failure currently results only in the ep being disabled, rather than triggering a connection re-establishment. As a result, the original client request hangs until there are new requests.
The RPC layer (cart/mercury) does track and retry these RPCs by issuing new requests, but this retry mechanism currently depends on a timeout, which can introduce significant delays (potentially minutes).
An alternative approach would be for the lower I/O layer to signal a distinct error event to the RPC layer upon such a receive failure. This would enable the RPC layer to make a more immediate and informed decision regarding any pending receive operations (identified by tags) associated with that specific ep.
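For illustration, the consumer side of that idea could look roughly like the sketch below, built only on standard calls (fi_cq_readerr, fi_cancel); the rpc_* hooks are hypothetical placeholders for whatever cart/mercury would actually do:

#include <rdma/fabric.h>
#include <rdma/fi_eq.h>
#include <rdma/fi_endpoint.h>

/* Hypothetical upper-layer hooks, not libfabric or mercury API. */
extern void *rpc_lookup_reply_context(void *op_context);
extern void rpc_retry_now(void *op_context);

/* Sketch: drain CQ errors and let the RPC layer react right away instead of
 * waiting for its own timeout to expire. */
static void drain_cq_errors(struct fid_cq *cq, struct fid_ep *ep)
{
        struct fi_cq_err_entry err_entry = { 0 };

        while (fi_cq_readerr(cq, &err_entry, 0) > 0) {
                /* Cancel the tagged recv posted for the reply, if any... */
                void *reply_ctx = rpc_lookup_reply_context(err_entry.op_context);
                if (reply_ctx)
                        fi_cancel(&ep->fid, reply_ctx);
                /* ...and reissue the request immediately. */
                rpc_retry_now(err_entry.op_context);
        }
}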
Correct, the connection will reform if it's not connected. The connection can be re-connected from either side.
If the socket is broken, I would expect send() to fail as well, depending on whether the kernel is aware of the failure yet. If the kernel accepted the send, there's really no way for user space to know if the send made it or not without a SW ack (e.g. delivery complete). For all the local side knows, the send could have reached the target. It's simply a race at which point the connection breaks.
We could introduce AV-related events to report upwards. For example, FI_EUNEXP_COMM, an unexpected communication error. It would be up to the provider to determine what, if anything, constituted an unexpected communication error. For TCP, it could include a broken connection or an error trying to establish one (unsure about the latter).
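If such an event existed, the consumer side might poll for it roughly as below. FI_EUNEXP_COMM is only the name proposed above, not an existing libfabric definition, so the event code here is a made-up placeholder:

#include <rdma/fabric.h>
#include <rdma/fi_eq.h>

/* FI_EUNEXP_COMM does not exist in libfabric today; it is the event proposed
 * in this thread, so the value below is purely a placeholder. */
#define FI_EUNEXP_COMM  (FI_NOTIFY + 100)

/* Sketch: poll the EQ and surface unexpected communication errors so the
 * RPC layer can act on them immediately. */
static void poll_unexpected_comm(struct fid_eq *eq)
{
        struct fi_eq_entry entry;
        uint32_t event;

        if (fi_eq_read(eq, &event, &entry, sizeof(entry), 0) > 0 &&
            event == FI_EUNEXP_COMM) {
                /* entry.context could identify the peer whose connection
                 * broke; cancel and retry anything outstanding toward it. */
        }
}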
Hmm, if we introduce a new type of event (UNEXP_COMM), it would probably do something similar to this patch (fail the related operation, resend the operation, etc.). Though this would be done in the RPC layer (mercury), so other applications should be okay with it. Thanks!
Signed-off-by: Di Wang <[email protected]>
entry = slist_remove_head(&tmp);
slist_insert_head(entry, queue);
xfer_entry = container_of(entry, struct xnet_xfer_entry, entry);
doesn't look like xfer_entry is used after here
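i.e. a sketch of the same loop body with the unused cast dropped:

/* If xfer_entry is never referenced afterwards, the container_of() line can
 * simply be removed. */
entry = slist_remove_head(&tmp);
slist_insert_head(entry, queue);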