Description
`<ACK>` dropped silently. A compatibility issue?
What we expect:
sequenceDiagram
participant c as curl
participant s as Server(gVisor)
Note over s: [LISTEN]
c ->> s: SYN (TSopt)
s ->> c: SYN+ACK (TSopt)
Note over s: [SYN-RCVD]
c ->> s: ACK (TSopt)
Note over s: [ESTAB]
What happened:
sequenceDiagram
participant c as curl
participant s as Server(gVisor)
Note over s: [LISTEN]
c ->> s: SYN (TSopt)
s ->> c: SYN+ACK (TSopt)
Note over s: [SYN-RCVD]
c -x s: ACK (no TSopt)
Note over s: retransmission back-off 1s...
s ->> c: SYN+ACK (TSopt, retransmits)
c -x s: ACK (no TSopt, dup)
Note over s: retransmission back-off 2s...
s ->> c: SYN+ACK (TSopt, retransmits)
c -x s: ACK (no TSopt, dup)
Note over s: retransmission back-off 4s...
- The client (e.g. curl) considers the TCP connection established on its side (the `connect` syscall completed). However, the gVisor-enabled server is still in SYN-RCVD state, waiting for the `<ACK>` (we have confirmed that the packet arrived in the gVisor network namespace).
- We found that the problem only happens when the client runs an old Linux kernel (e.g. 3.x ~ 4.14.x). Newer kernels (e.g. 5.4.x ~ latest) establish two-way connections with the gVisor server without any problem. It turns out that old kernels do not fully comply with RFC 7323 "TCP Extensions for High Performance". Section 3.2, "Timestamps Option", says: "Once TSopt has been successfully negotiated, that is both `<SYN>` and `<SYN,ACK>` contain TSopt, the TSopt MUST be sent in every non-`<RST>` segment for the duration of the connection."
- On the other hand, gVisor dropped the segments silently. We think this is probably because RFC 7323 also states: "If a non-`<RST>` segment is received without a TSopt, a TCP SHOULD silently drop the segment." Native Linux TCP/IP doesn't do this, whether the kernel is old or new.
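The two receiver behaviours described above can be sketched as follows. This is a minimal illustration, not gVisor's or Linux's actual code: the function names `has_tsopt` and `should_drop` and the `strict` flag are hypothetical. The only facts taken from the TCP specifications are the option kinds (0 = End of Option List, 1 = No-Operation, 8 = Timestamps with length 10).

```python
TCPOPT_EOL = 0          # End of Option List
TCPOPT_NOP = 1          # No-Operation (padding)
TCPOPT_TIMESTAMPS = 8   # Timestamps option, total length 10 bytes


def has_tsopt(options: bytes) -> bool:
    """Return True if the raw TCP options field contains a Timestamps option."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            return False
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            return False  # truncated option, treat as absent
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            return False  # malformed option, treat as absent
        if kind == TCPOPT_TIMESTAMPS and length == 10:
            return True
        i += length
    return False


def should_drop(ts_negotiated: bool, is_rst: bool, options: bytes, strict: bool) -> bool:
    """Hypothetical receive-side check.

    strict=True models the RFC 7323 SHOULD (the behaviour we observe in gVisor);
    strict=False models the lenient behaviour we observe in native Linux.
    """
    if not ts_negotiated or is_rst:
        return False
    return strict and not has_tsopt(options)


# The final ACK of the handshake as sent by an old kernel: no options at all.
ack_without_ts = b""
# NOP, NOP, Timestamps (kind 8, len 10, TSval, TSecr) as sent by a compliant client.
ack_with_ts = bytes([1, 1, 8, 10]) + bytes(8)

print(should_drop(True, False, ack_without_ts, strict=True))   # strict receiver drops it
print(should_drop(True, False, ack_without_ts, strict=False))  # lenient receiver accepts it
print(should_drop(True, False, ack_with_ts, strict=True))      # compliant ACK is accepted
```

With the strict check, the third segment of the handshake from an old kernel never reaches the state machine, which matches the SYN+ACK retransmissions in the second diagram.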
Workarounds:
- Upgrade the clients to a new OS with a new kernel (this sounds crazy, but we actually did it, because all our "clients" are micro-service containers running on a few host servers, which are easy to migrate).
- Or disable the TCP Timestamps feature on the client side completely: `sysctl -w net.ipv4.tcp_timestamps=0`.
- Or use host network mode.
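To verify whether the second workaround is in effect on a client host, the current setting can be read from procfs. This is a small sketch that assumes a Linux client where `/proc/sys/net/ipv4/tcp_timestamps` exists; the helper name `timestamps_enabled` is ours.

```python
from pathlib import Path

# Standard Linux procfs location of the net.ipv4.tcp_timestamps sysctl.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_timestamps")


def timestamps_enabled(raw: str) -> bool:
    """Interpret the sysctl value: 0 means disabled, any other value means enabled."""
    return int(raw.strip()) != 0


if __name__ == "__main__":
    if SYSCTL_PATH.exists():
        state = "enabled" if timestamps_enabled(SYSCTL_PATH.read_text()) else "disabled"
        print(f"TCP timestamps are {state} on this host")
    else:
        print("tcp_timestamps sysctl not found (not a Linux host?)")
```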
Q1: Is this an intentional design decision?
But only partially. Malfunctions?
- Still, suppose this is working as designed. The last weird thing is that, even though gVisor silently drops those non-standard ACK-of-SYN segments, the connections seem to stay stuck in the `accept` backlog queue forever. This leads to a very strange phenomenon. Say the server called `listen(s, 5)`. The client keeps failing until, suddenly, the 6th `connect` attempt succeeds. From then on, everything works normally for this listener.
- We soon realised that this is because SYN cookies kick in. The backlog becomes full, so the listener switches to SYN cookies. From that point on, the TCP options in every handshake are ignored, including TSopt.
- Even weirder: after leaving the server alone overnight without any further requests, we expected the half-established connections to have timed out and been discarded, so that new connections would be blocked again. That did not happen. The first 5 (certainly lost) connections are stuck forever, forcing all other connections to use SYN cookies and possibly degrading performance.
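The sequence we observed can be reproduced with a toy model. It only encodes the behaviour described in this report (a full backlog switches the listener to SYN cookies, and stuck endpoints are never removed). It is not gVisor's implementation, and the class `ToyListener` is purely illustrative.

```python
class ToyListener:
    """Toy model of the listener behaviour described in this report."""

    def __init__(self, backlog: int) -> None:
        self.backlog = backlog
        self.half_open = []  # endpoints stuck in SYN-RCVD, never removed

    def connect(self, client_sends_tsopt_in_ack: bool) -> str:
        if len(self.half_open) >= self.backlog:
            # Backlog full: SYN cookies are used, TCP options are not
            # negotiated, so the ACK without TSopt is accepted.
            return "established (SYN cookie)"
        if client_sends_tsopt_in_ack:
            return "established"
        # TSopt was negotiated but the final ACK lacks it: the ACK is dropped
        # and the endpoint stays in the queue.
        self.half_open.append("SYN-RCVD")
        return "stuck"


listener = ToyListener(backlog=5)  # corresponds to listen(s, 5)
results = [listener.connect(client_sends_tsopt_in_ack=False) for _ in range(7)]
for attempt, outcome in enumerate(results, start=1):
    print(attempt, outcome)
```

In this model attempts 1 to 5 are stuck and attempt 6 onwards succeeds through SYN cookies, which is what we see against the real server. If failed half-open endpoints were removed after the SYN+ACK retransmissions give up, the queue would drain and the listener would leave SYN-cookie mode again.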
Q2: Is there something missing in the handling of a half-established connection that fails (after the SYN+ACK retransmissions)? For example, removing the failed endpoint from the queue?
Environment is the same as #11535