
updating IP_COOLDOWN_PERIOD from 30 seconds to 130 seconds to ensure SYN_SENT (120 seconds) conntrack timer expires #3636

Open
Mohijeet wants to merge 1 commit into aws:master from Mohijeet:master

Conversation

@Mohijeet

What type of PR is this?
improvement

Which issue does this PR fix?:
In-flight TCP connections to a deleted pod's IP can be silently hijacked if a new pod is assigned the same IP within the conntrack SYN_SENT expiry window (120 seconds on most Linux kernels).
issue

What does this PR do / Why do we need it?:

Problem:

  • When a pod is deleted, any in-flight TCP requests that were initiated just before deletion will create a conntrack entry in SYN_SENT state for the now-gone pod IP. On most Linux kernels, SYN_SENT conntrack entries expire after 120 seconds.
  • If a new pod is assigned the same IP address within this 120-second window, the stale conntrack entry becomes a valid TCP flow reference.

This is dangerous because:

  • When a TCP SYN retry arrives with the same srcIP:srcPort → dstIP:dstPort tuple, the kernel looks up the existing conntrack entry instead of creating a new one
  • The source IP here is the originating service and the destination IP is the virtual Kubernetes service IP — there is no guarantee that the source port will differ across retries
  • Even though kube-proxy may have already updated iptables/ipvs rules for the new pod, conntrack takes precedence over iptables for established/tracked flows
  • The new (wrong) pod will receive and reply to the SYN, completing a TCP handshake silently, establishing a valid connection to the wrong pod
  • With the default tcp_syn_retries of 6, SYN retransmissions continue for roughly 127 seconds, so retries can extend this exposure window to about 130 seconds
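The ~130-second figure follows from TCP's exponential backoff for SYN retransmissions: an initial retransmission timeout of 1 second that doubles on each retry. A small sketch of the arithmetic (the function name is illustrative, not from the PR), assuming the common Linux defaults:

```python
def syn_exposure_seconds(syn_retries: int, initial_rto: float = 1.0) -> float:
    """Total time a connection attempt keeps retransmitting SYNs.

    The kernel waits initial_rto before the first retry and doubles
    the timeout after each retransmission (exponential backoff).
    """
    return sum(initial_rto * 2**i for i in range(syn_retries + 1))

# With the common default of tcp_syn_retries = 6:
print(syn_exposure_seconds(6))  # → 127.0
```

So with defaults, in-flight connection attempts can keep retransmitting for about 127 seconds, which is why a cooldown slightly above that (130 seconds) covers both the conntrack SYN_SENT expiry and retry exhaustion.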

Fix:
Increase the default IP cooldown period from 30 seconds to 130 seconds. This ensures:

  • All SYN_SENT conntrack entries for the deleted pod's IP have fully expired (120 seconds on most Linux kernels)
  • All tcp_syn_retries for in-flight connections have been exhausted (up to ~130 seconds)

Only two code changes are made:

  • The default value in getCooldownPeriod() is changed from 30 to 130
  • The comment on the envIPCooldownPeriod constant is updated to document the new default and the reasoning

The IP_COOLDOWN_PERIOD environment variable continues to work as before, allowing operators to override this value if needed.

Will this PR introduce any new dependencies?:
No.

Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No breaking changes. The cooldown period increase is a conservative safety improvement. On upgrade, existing clusters will automatically benefit from the longer cooldown. Operators who need a shorter cooldown can override via the IP_COOLDOWN_PERIOD environment variable.

Does this change require updates to the CNI daemonset config files to work?:
No. Works with a kubectl patch of the image tag. No config changes required.

Does this PR introduce any user-facing change?:
Yes. The default IP cooldown period after pod deletion increases from 30 seconds to 130 seconds. This slows IP reuse in high-pod-turnover clusters but prevents silent TCP connection hijacking via stale conntrack entries.

@Mohijeet Mohijeet requested a review from a team as a code owner March 31, 2026 18:22
@oliviassss
Contributor

@Mohijeet, thanks for the contribution. But increasing the default to 130s would significantly slow IP reuse in high-pod-turnover clusters, potentially causing IP starvation issues. I don't think we should set 130s as the default value.

@slice-mohijeet

What is the recommended fix for this behavior? Should the VPC CNI explicitly configure the SYN_SENT timeout or retry parameters during startup to override the default Linux kernel settings?
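For reference, these are the kernel knobs in question; the inspection and override below are a configuration sketch only (defaults vary by kernel and distribution):

```shell
# Inspect the current values (typical Linux defaults shown):
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_sent   # typically 120
sysctl net.ipv4.tcp_syn_retries                          # typically 6

# Hypothetical override: shorten the SYN_SENT conntrack timeout
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_syn_sent=60
```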


3 participants