DNS lookup timeouts due to races in conntrack #3287
Description
What happened?
We are experiencing random 5-second DNS timeouts in our Kubernetes cluster.
How to reproduce it?
It is reproducible by requesting almost any in-cluster service and observing that, periodically (in our case, 1 out of every 50 to 100 requests), we get a 5-second delay. It always happens during the DNS lookup.
Anything else we need to know?
We believe this is the result of a kernel-level SNAT race condition that is described quite well here:
https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02
The problem also happens with non-weave CNI implementations, and is (ironically) not really a weave issue at all. It becomes a weave issue, however, because the fix is to set a flag on the masquerading rules, and those rules are under no one's control except weave's.
What we need is the ability to apply the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag to the masquerading rules that weave sets up. In the post above, Flannel was in use, and the fix was applied there instead.
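For context, that kernel flag is exposed in userspace iptables (>= 1.6.2) as the `--random-fully` option on the MASQUERADE target. A rough sketch of what an adjusted rule could look like, assuming weave's default allocation range of 10.32.0.0/12 (this is an illustration, not weave's actual rule):

```shell
# Illustration only: masquerade traffic leaving the pod network with full
# source-port randomization (NF_NAT_RANGE_PROTO_RANDOM_FULLY).
# 10.32.0.0/12 is weave's default IP allocation range; adjust to your cluster.
iptables -t nat -A POSTROUTING -s 10.32.0.0/12 ! -d 10.32.0.0/12 \
    -j MASQUERADE --random-fully
```

Whether the flag is present on a node can be checked with something like `iptables -t nat -S POSTROUTING | grep random-fully`.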
We searched for this issue and didn't see that anyone had asked for it before. We're also unaware of any existing setting that allows this flag to be applied today; if one exists, please let us know.