netstack: possible compat issues or malfunctions of non-standard TCP Timestamp Options #11536

Open
@LionNatsu

Description

<ACK> dropped silently. A compatibility issue?

What we expect:

sequenceDiagram
	participant c as curl
	participant s as Server(gVisor)
	Note over s: [LISTEN]
	c ->> s: SYN (TSopt)
	s ->> c: SYN+ACK (TSopt)
	Note over s: [SYN-RCVD]
	c ->> s: ACK (TSopt)
	Note over s: [ESTAB]

What happened:

sequenceDiagram
	participant c as curl
	participant s as Server(gVisor)
	Note over s: [LISTEN]
	c ->> s: SYN (TSopt)
	s ->> c: SYN+ACK (TSopt)
	Note over s: [SYN-RCVD]
	c -x s: ACK (no TSopt)
	Note over s: retransmission back-off 1s...
	s ->> c: SYN+ACK (TSopt, retransmits)
	c -x s: ACK (no TSopt, dup)
	Note over s: retransmission back-off 2s...
	s ->> c: SYN+ACK (TSopt, retransmits)
	c -x s: ACK (no TSopt, dup)
	Note over s: retransmission back-off 4s...
  1. The client (e.g. curl) considers the TCP connection established on its side (the connect syscall completes). However, the gVisor server is still in SYN-RCVD state, waiting for the <ACK>, even though we have confirmed that the packet arrived in the gVisor network namespace.
  2. The problem only happens when the client runs an old Linux kernel (e.g. 3.x to 4.14.x). Newer kernels (e.g. 5.4.x to latest) establish connections with the gVisor server without any problem. It turns out that old kernels do not fully comply with RFC 7323 "TCP Extensions for High Performance", Section 3.2 (Timestamps Option), which says: "Once TSopt has been successfully negotiated, that is both <SYN> and <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST> segment for the duration of the connection."
  3. gVisor, on the other hand, drops these segments silently. We think this is probably because RFC 7323 also states: "If a non-<RST> segment is received without a TSopt, a TCP SHOULD silently drop the segment." The native Linux TCP/IP stack does not do this, in either old or new versions.
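To make the difference between the two behaviours concrete, here is a minimal sketch of the receive-side decision. It is not gVisor or Linux code: the function names and the `strict` flag are hypothetical, and the only facts taken from the standards are the option kinds (EOL = 0, NOP = 1, Timestamps = 8) and the 10-byte TSopt length. `strict=True` models the RFC 7323 SHOULD that we suspect gVisor follows; `strict=False` models the lenient behaviour we observe in Linux.

```python
from typing import Iterator, Tuple

TCPOPT_EOL = 0        # End of Option List
TCPOPT_NOP = 1        # No-Operation (padding)
TCPOPT_TIMESTAMP = 8  # TSopt: kind, length 10, TSval (4 bytes), TSecr (4 bytes)


def iter_tcp_options(raw: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, payload) for each option in a raw TCP options field."""
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == TCPOPT_EOL:
            return
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(raw):
            return  # truncated option, stop parsing
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            return  # malformed length, stop parsing
        yield kind, raw[i + 2:i + length]
        i += length


def has_tsopt(raw: bytes) -> bool:
    """True if the options field carries a well-formed Timestamps option."""
    return any(kind == TCPOPT_TIMESTAMP and len(payload) == 8
               for kind, payload in iter_tcp_options(raw))


def accept_segment(ts_negotiated: bool, is_rst: bool,
                   raw_options: bytes, strict: bool) -> bool:
    """Hypothetical receive check for the TSopt rule discussed above."""
    if not ts_negotiated or is_rst:
        return True   # the rule only covers non-RST segments after negotiation
    if has_tsopt(raw_options):
        return True
    return not strict  # strict: silently drop; lenient: process anyway
```

With this model, the final ACK from an old kernel (negotiated TSopt, no option on the ACK) is accepted when `strict=False` and dropped when `strict=True`, which matches the two diagrams above.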

Workarounds:

  • Upgrade the clients to a newer OS and kernel. This sounds drastic, but it is what we did, because all of our "clients" are microservice containers running on a few host servers, which are easy to migrate.
  • Or, disable TCP timestamps on the client side entirely: sysctl -w net.ipv4.tcp_timestamps=0.
  • Or, use host network mode.
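For the sysctl workaround, it can help to verify on each client that the setting actually took effect. The sketch below reads the standard procfs file for `net.ipv4.tcp_timestamps`; the helper name is hypothetical, and it assumes a Linux client with `/proc` mounted.

```python
from pathlib import Path

# Standard procfs location of the Linux sysctl net.ipv4.tcp_timestamps.
TCP_TIMESTAMPS_PATH = Path("/proc/sys/net/ipv4/tcp_timestamps")


def tcp_timestamps_enabled(path: Path = TCP_TIMESTAMPS_PATH) -> bool:
    """Return True if the sysctl file holds a non-zero value (0 = disabled)."""
    return int(path.read_text().strip()) != 0
```

Calling `tcp_timestamps_enabled()` on a client should return `False` once the workaround is applied.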

Q1: Is this an intentional design decision?

But only partially. Malfunctions?

  1. Suppose this is working as designed. The remaining oddity is that when gVisor silently drops these non-standard ACK-of-SYN segments, the connections appear to stay stuck in the backlog queue of the accept syscall forever. This leads to a very strange phenomenon. Say the server called listen(s, 5). Clients keep failing until, suddenly, the 6th connect attempt succeeds. From then on, every connection to this listener works normally.
  2. We soon realised that this is because SYN cookies kick in. The backlog becomes full, so the listener switches to SYN cookies. From that point on, TCP options in every handshake are ignored, including TSopt.
  3. Even weirder: we left the server alone overnight with no further requests, expecting the half-established connections to time out and be discarded, so that new connections would be blocked again. That did not happen. The first 5 (certainly lost) connections stay stuck forever, forcing all other connections to use SYN cookies and possibly degrading performance.
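The behaviour described in the three points above can be reproduced with a toy model. This is only a sketch of what we observe from the outside, not gVisor's implementation: `ToyListener`, its fields, and the `reap_on_timeout` switch are all hypothetical, and the SYN-cookie path is reduced to "options are ignored, so the handshake completes".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyListener:
    backlog: int
    reap_on_timeout: bool = False  # hypothetical: free slots when SYN+ACK retries give up
    half_open: List[str] = field(default_factory=list)
    established: List[str] = field(default_factory=list)

    def connect(self, client: str, final_ack_has_tsopt: bool) -> bool:
        """Return True if the server side reaches ESTABLISHED."""
        if len(self.half_open) >= self.backlog:
            # Backlog full: SYN-cookie mode, TCP options are ignored,
            # so the missing TSopt on the final ACK no longer matters.
            self.established.append(client)
            return True
        if final_ack_has_tsopt:
            self.established.append(client)
            return True
        # Final ACK dropped: the endpoint stays in SYN-RCVD and holds a slot.
        self.half_open.append(client)
        return False

    def retransmissions_exhausted(self) -> None:
        """Model the moment all SYN+ACK retransmissions have failed."""
        if self.reap_on_timeout:
            self.half_open.clear()


# Observed behaviour with listen(s, 5) and clients that omit TSopt.
srv = ToyListener(backlog=5)
results = [srv.connect(f"client{i}", final_ack_has_tsopt=False) for i in range(1, 8)]
print(results)             # first five attempts fail, later ones succeed
srv.retransmissions_exhausted()
print(len(srv.half_open))  # slots are still occupied in the observed behaviour
```

With `reap_on_timeout=True` the slots are released after the retransmissions give up, and the next non-compliant client is blocked again, which is what we expected to see after leaving the server idle overnight.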

Q2: Is some cleanup missing when a half-established connection fails (that is, when the SYN+ACK retransmissions give up)? For example, removing the failed endpoint from the queue?

Environment is the same as #11535

Labels: type: bug