This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Amplification of pod traffic #3604

Closed
@erik-stephens

Description

I think this is working as designed. However, I think the side effects become too severe as the cluster grows.

Our peer connection limit is left at the default of 100, and we have 58 nodes. We could reduce the limit to help minimize the effect, but that will only take us so far.

What you expected to happen?

Less traffic broadcast to peers, so that links do not get saturated.

What happened?

Weave forwards traffic whose destination it does not know to all of its peers. As the cluster grows, the amount of amplified traffic grows with it.
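To make the amplification concrete, here is a minimal sketch of the flooding behavior described above. It is a toy learning-switch model, not Weave's actual code: the `ToyRouter` class, its method names, and the example MAC address are all hypothetical, and it assumes every node is directly connected to every other node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRouter:
    """Toy learning-switch model of one peer (hypothetical, not Weave's code)."""
    peers: List[str]
    mac_to_peer: Dict[str, str] = field(default_factory=dict)

    def learn(self, mac: str, peer: str) -> None:
        # Record which peer a MAC address lives behind.
        self.mac_to_peer[mac] = peer

    def forward(self, dst_mac: str) -> List[str]:
        # Known destination: one copy to the owning peer.
        # Unknown destination: one copy to every peer (flooding).
        owner = self.mac_to_peer.get(dst_mac)
        return [owner] if owner is not None else list(self.peers)


# Assumption: 58 nodes in a full mesh, so each node has 57 remote peers.
router = ToyRouter(peers=[f"peer-{i}" for i in range(1, 58)])
print(len(router.forward("aa:bb:cc:dd:ee:01")))  # 57 copies while unknown
router.learn("aa:bb:cc:dd:ee:01", "peer-2")
print(len(router.forward("aa:bb:cc:dd:ee:01")))  # 1 copy once learned
```

In this model, every frame sent while the destination is unknown costs 57 times the bandwidth of the same frame after the entry has been learned, and that factor grows linearly with the number of peers.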

How to reproduce it?

I think pod scheduling activity, and maybe some periodic gossip/discovery phase, triggers a peer to clear its table and then repopulate it. Based on log output, we're seeing roughly 1-2 peers discovered per second. For a cluster of our size (and growing), that leaves a relatively large window in which traffic is amplified.
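As a rough back-of-the-envelope estimate of that window, the sketch below divides the number of entries to relearn by the observed discovery rate. It assumes one table entry per remote peer (57 for a 58-node cluster), which may undercount if the table is keyed per MAC; the function name is hypothetical.

```python
def repopulation_window_seconds(entries: int, discoveries_per_second: float) -> float:
    """Estimate how long a cleared table takes to refill at a steady discovery rate."""
    if discoveries_per_second <= 0:
        raise ValueError("discovery rate must be positive")
    return entries / discoveries_per_second


# Assumption: 57 entries, one per remote peer in a 58-node cluster.
for rate in (1.0, 2.0):
    print(rate, repopulation_window_seconds(57, rate))
# prints:
# 1.0 57.0
# 2.0 28.5
```

Under these assumptions, a cleared table would take roughly half a minute to a minute to refill, during which traffic to not-yet-learned destinations would be flooded.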

Anything else we need to know?

Running on bare metal, provisioned with kubeadm.

Versions:

$ weave version
weave 2.5.1

$ docker version
Client:
 Version:           18.06.2-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        6d37f41
 Built:             Sun Feb 10 03:48:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.2-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       6d37f41
  Built:            Sun Feb 10 03:46:30 2019
  OS/Arch:          linux/amd64
  Experimental:     false

$ uname -a
Linux r26c4na 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-13T23:16:01Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}

Logs:

Our biggest clue came from seeing pod traffic spike in unrelated pods during a large hourly workload in other pods, sometimes saturating the link.

peer-1: Discovered remote MAC ...
peer-2: Discovered remote MAC ...
peer-1: Captured frame from MAC ... associated with peer-2

This is evidence that peers receive broadcast traffic during the window in which a peer is populating its tables.

Network:

Our network seems fine: no packet loss, low latency, bonded interfaces.
