Description
Proposal: have the client generate a globally unique identifier, and use this for the lifetime of the proxy connection.
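As a rough illustration of the client side, here is a minimal sketch that assumes a random 128-bit value, hex-encoded, is unique enough for this purpose. The function name `newProxyConnID` and the ID format are hypothetical; the proposal does not specify either.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// newProxyConnID returns a random 128-bit identifier as a 32-character hex
// string. Hypothetical helper: the client would generate this once per proxy
// connection and reuse it for dial, data, cancel, and close.
func newProxyConnID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}
```

The client would send this value with the dial request and with every later packet for the same connection, so no other party ever needs to mint or translate an identifier.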
Summary of the connection dial and cleanup flows we have today:
- client sends short-lived dialID (a.k.a. "random")
- server passes dialID through to agent
- agent generates connectionID as part of dial flow
- agent mainly keys by connectionID
- state while pending is not well modeled
- server keeps two data structures:
  - PendingDials (by dialID)
  - frontends (by agentID+connectionID) after dial success
Problems: Since proxy connections may be closed/cancelled by either end, it is difficult to reliably clean up in all cases. In particular:
- "Frontend closing leaks connection" (#403) is difficult to fix properly because PendingDial and frontends (established ProxyClientConnection) are not tracked by any identifier associated with a given serveRecvFrontend + readFrontendToChannel pair of goroutines.
- Even if we fix #403, the fundamental race in "DIAL_CLS / DIAL_RSP race leading to connection leak" (#404) remains: the client sends DIAL_CLS to cancel a pending dial, but the server may already have moved that dial from pending to succeeded. When this happens the mapping from dialID has been lost, and cleanup is difficult. (I could imagine keeping a "last 30 seconds of dialID + connectionID" association, but it feels very hacky and there are other problems.)
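The race can be reproduced in a simplified model of the server's bookkeeping. This is a sketch only: the `server`, `conn`, and `frontendKey` types and the handler names are hypothetical stand-ins rather than the project's actual code, and the ID types are assumed to be `int64` and `string`.

```go
package main

import "sync"

// frontendKey models the "agentID+connectionID" key (hypothetical type).
type frontendKey struct {
	agentID      string
	connectionID int64
}

// conn stands in for a pending or established proxy connection.
type conn struct {
	closed bool
}

// server keeps the two structures described above.
type server struct {
	mu           sync.Mutex
	pendingDials map[int64]*conn       // keyed by dialID
	frontends    map[frontendKey]*conn // keyed by agentID+connectionID
}

func newServer() *server {
	return &server{
		pendingDials: make(map[int64]*conn),
		frontends:    make(map[frontendKey]*conn),
	}
}

// startDial records a pending dial under the client's short-lived dialID.
func (s *server) startDial(dialID int64) *conn {
	s.mu.Lock()
	defer s.mu.Unlock()
	c := &conn{}
	s.pendingDials[dialID] = c
	return c
}

// onDialResponse handles a successful DIAL_RSP: the entry moves to frontends
// and the dialID association is dropped.
func (s *server) onDialResponse(dialID int64, agentID string, connectionID int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.pendingDials[dialID]
	if !ok {
		return false
	}
	delete(s.pendingDials, dialID)
	s.frontends[frontendKey{agentID: agentID, connectionID: connectionID}] = c
	return true
}

// onDialClose handles DIAL_CLS, which carries only the dialID. It reports
// whether it found anything to clean up.
func (s *server) onDialClose(dialID int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.pendingDials[dialID]
	if !ok {
		return false // already established: nothing is reachable by dialID
	}
	c.closed = true
	delete(s.pendingDials, dialID)
	return true
}
```

In this model, calling `startDial(7)`, then `onDialResponse(7, "agent-1", 42)`, then `onDialClose(7)` leaves the connection open in `frontends`, because the cancel arrives after the only key it knows has been discarded.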
End goal is much simpler:
- CLOSE_REQ and DIAL_CLS semantics converge
- server can avoid the above state transition
- agent can more easily support client cancel
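One way the end goal could look on the server is a single table keyed by the client-generated ID, with pending versus established held as a field rather than as membership in different maps. This is a sketch under that assumption; `connTable`, `proxyConn`, and the method names are hypothetical.

```go
package main

import "sync"

type connState int

const (
	statePending connState = iota
	stateEstablished
)

// proxyConn is one proxy connection for its whole lifetime (hypothetical type).
type proxyConn struct {
	state   connState
	agentID string
	closed  bool
}

// connTable is keyed by the client-generated ID, which never changes.
type connTable struct {
	mu    sync.Mutex
	conns map[string]*proxyConn
}

func newConnTable() *connTable {
	return &connTable{conns: make(map[string]*proxyConn)}
}

// dial registers a new pending connection under the client's ID.
func (t *connTable) dial(id string) *proxyConn {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := &proxyConn{state: statePending}
	t.conns[id] = c
	return c
}

// established marks the dial as succeeded without re-keying the entry.
func (t *connTable) established(id, agentID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.conns[id]
	if !ok {
		return false // already cancelled or closed
	}
	c.state = stateEstablished
	c.agentID = agentID
	return true
}

// closeConn serves both cancel (DIAL_CLS) and close (CLOSE_REQ): the lookup
// is the same whether the connection is pending or established.
func (t *connTable) closeConn(id string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.conns[id]
	if !ok {
		return false
	}
	c.closed = true
	delete(t.conns, id)
	return true
}
```

Because the key is stable, a cancel that arrives just after the dial succeeds still finds the entry, and a dial response that arrives just after a cancel is rejected by the missing-key check.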
Migration
The hardest part seems to be a backward-compatible migration (and the eventual code deletion + cleanup). Care must also be taken to make sure HTTPConnect (tunnel.go) is given parity treatment.
I think the migration is tractable, and would like to hear what others think.