Description
Problem:
It is difficult to gracefully close an s2n-tls tokio connection. I am observing this with TLS 1.3.
I am seeing the TLS connection cleanly close, but then one of the connections gets a failure when trying to close the TCP stream
Err(Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
I think this makes sense, because generally only one peer will be able to actually shut down the TCP Stream?
1. client writes TCP shutdown
2. server receives TCP shutdown
3. server writes TCP shutdown <- ERROR, already shutdown
While we have tests for the edge case, I think the use of join! is causing us to “artificially” always hit the following race condition
1. client writes TCP shutdown
2. server writes TCP shutdown
3. both succeed
Where the ordering in the above list is stable because try_join will iteratively loop through the tasks to be polled.
You can see this behavior here, where simple_spawn_shutdown fails but simple_in_task_shutdown succeeds.
I've done a bit of investigation, and this doesn't seem to be unique to the s2n-tls bindings, but might be something standard to the tokio tcp stream implementation?
Switching the bindings to use poll_shutdown
instead of poll_shutdown_send
resolved this issue. I don't think that's a good long term solution, but an interesting data point.
Solution:
I am uncertain what the solution. Perhaps there might be a solution in having separate read/write handle shutdowns?
Requirements / Acceptance Criteria:
There should be a well-documented path to a clean shutdown
Out of scope:
Is there anything the solution will intentionally NOT address?
Activity