This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave getting cross-cluster node discovery #3621

Closed
@sidharthsurana

Description

What you expected to happen?

Deploying multiple k8s clusters with weave on the same network should not cause any interference between the clusters.

What happened?

We have 2 identical k8s clusters (v1.11.3) deployed with weave version 2.4.0 as the CNI.
We are seeing weave node discovery cross cluster boundaries, which in turn breaks container networking.

How to reproduce it?

Deploy 2 k8s clusters with nodes on the same network and deploy the weave CNI on both.

Anything else we need to know?

These identical clusters are deployed using cluster-api-provider-vsphere.
The master node has a single physical NIC.
The worker nodes have 3 physical NICs.
All the nodes are running VMware Photon OS 2.0.
Note that the cluster-1 nodes have IPs in the range 10.172.142.48-10.172.142.52, whereas the cluster-2 nodes have IPs in the range 10.172.142.25-10.172.142.29.

Please see the following gists that have all the detailed info from the 2 k8s clusters

  1. https://gist.github.com/sidharthsurana/c681b8000b0e491268b32081d6c677d6 for cluster-1 debug info
  2. https://gist.github.com/sidharthsurana/4f0d91cb4a749f4599b3b0037c27475c for cluster-2 debug info

Versions:

$ weave version
weave 2.4.0
$ docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.9.4
 Git commit:   02c1d87
 Built:        Wed Feb 13 09:36:56 2019
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.9.4
 Git commit:   02c1d87
 Built:        Wed Feb 13 09:37:49 2019
 OS/Arch:      linux/amd64
 Experimental: false
$ uname -a
Linux worker-l4kf662nc5 4.9.140-5.ph2 #1-photon SMP Sat Jan 19 14:33:46 UTC 2019 x86_64 Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz GenuineIntel GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

Part of the logs from a weave pod in cluster-1.
Note the references to IPs 10.172.142.2x, which belong to cluster-2 nodes.

INFO: 2019/03/21 16:24:13.910873 ->[10.172.142.25:40161] connection accepted
INFO: 2019/03/21 16:24:13.910908 ->[10.172.142.51:37697] connection accepted
INFO: 2019/03/21 16:24:13.911139 ->[10.172.142.27:41689] connection accepted
INFO: 2019/03/21 16:24:13.911171 ->[10.172.142.28:35471] connection accepted
INFO: 2019/03/21 16:24:13.911729 ->[10.172.142.51:37697|9a:f6:f8:cd:f7:37(worker-l4kf662nc5)]: connection ready; using protocol version 2
INFO: 2019/03/21 16:24:13.911802 overlay_switch ->[9a:f6:f8:cd:f7:37(worker-l4kf662nc5)] using sleeve
INFO: 2019/03/21 16:24:13.911825 ->[10.172.142.25:40161|3e:91:a1:bd:a9:5c(photon-machine)]: connection ready; using protocol version 2
INFO: 2019/03/21 16:24:13.911880 overlay_switch ->[3e:91:a1:bd:a9:5c(photon-machine)] using sleeve
INFO: 2019/03/21 16:24:13.911895 ->[10.172.142.51:37697|9a:f6:f8:cd:f7:37(worker-l4kf662nc5)]: connection added (new peer)
INFO: 2019/03/21 16:24:13.911997 ->[10.172.142.25:40161|3e:91:a1:bd:a9:5c(photon-machine)]: connection added (new peer)
INFO: 2019/03/21 16:24:13.912067 ->[10.172.142.28:35471|22:8e:45:ed:ad:58(worker-xdjwd4zlwf)]: connection ready; using protocol version 2
INFO: 2019/03/21 16:24:13.912145 overlay_switch ->[22:8e:45:ed:ad:58(worker-xdjwd4zlwf)] using sleeve
INFO: 2019/03/21 16:24:13.912198 ->[10.172.142.25:40161|3e:91:a1:bd:a9:5c(photon-machine)]: connection shutting down due to error: read tcp4 10.172.142.52:6783->10.172.142.25:40161: read: connection reset by peer
INFO: 2019/03/21 16:24:13.912348 ->[10.172.142.27:41689|4a:2a:cd:2c:63:03(worker-vrw7qz8pp9)]: connection ready; using protocol version 2
INFO: 2019/03/21 16:24:13.912387 overlay_switch ->[4a:2a:cd:2c:63:03(worker-vrw7qz8pp9)] using sleeve
INFO: 2019/03/21 16:24:13.912531 ->[10.172.142.51:37697|9a:f6:f8:cd:f7:37(worker-l4kf662nc5)]: connection shutting down due to error: write tcp4 10.172.142.52:6783->10.172.142.51:37697: write: connection reset by peer
INFO: 2019/03/21 16:24:13.912603 ->[10.172.142.28:35471|22:8e:45:ed:ad:58(worker-xdjwd4zlwf)]: connection added (new peer)
INFO: 2019/03/21 16:24:13.912843 ->[10.172.142.25:40161|3e:91:a1:bd:a9:5c(photon-machine)]: connection deleted
INFO: 2019/03/21 16:24:13.912858 Removed unreachable peer 3e:91:a1:bd:a9:5c(photon-machine)
INFO: 2019/03/21 16:24:13.912882 ->[10.172.142.27:41689|4a:2a:cd:2c:63:03(worker-vrw7qz8pp9)]: connection added (new peer)
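
To make the cross-cluster connections easier to spot in longer logs, here is a rough sketch that labels each "connection accepted" peer by cluster. It assumes the node IP ranges stated above and the log line format shown in this excerpt. The names `classify_peers`, `CLUSTER_1` and `CLUSTER_2` are made up for illustration and are not part of weave.

```python
import ipaddress
import re

# Assumption: node IP ranges as stated in this report (inclusive bounds).
CLUSTER_1 = (ipaddress.ip_address("10.172.142.48"), ipaddress.ip_address("10.172.142.52"))
CLUSTER_2 = (ipaddress.ip_address("10.172.142.25"), ipaddress.ip_address("10.172.142.29"))

# Matches lines such as:
#   INFO: 2019/03/21 16:24:13.910873 ->[10.172.142.25:40161] connection accepted
ACCEPTED = re.compile(r"->\[(\d+\.\d+\.\d+\.\d+):\d+\] connection accepted")


def in_range(ip, bounds):
    """Return True if ip lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= ip <= high


def classify_peers(log_text):
    """Map each accepted remote IP to 'cluster-1', 'cluster-2', or 'unknown'."""
    labels = {}
    for match in ACCEPTED.finditer(log_text):
        ip = ipaddress.ip_address(match.group(1))
        if in_range(ip, CLUSTER_1):
            labels[str(ip)] = "cluster-1"
        elif in_range(ip, CLUSTER_2):
            labels[str(ip)] = "cluster-2"
        else:
            labels[str(ip)] = "unknown"
    return labels


# The four "connection accepted" lines from the excerpt above.
SAMPLE = """\
INFO: 2019/03/21 16:24:13.910873 ->[10.172.142.25:40161] connection accepted
INFO: 2019/03/21 16:24:13.910908 ->[10.172.142.51:37697] connection accepted
INFO: 2019/03/21 16:24:13.911139 ->[10.172.142.27:41689] connection accepted
INFO: 2019/03/21 16:24:13.911171 ->[10.172.142.28:35471] connection accepted
"""

if __name__ == "__main__":
    for ip, label in classify_peers(SAMPLE).items():
        print(ip, label)
```

On the excerpt above, only 10.172.142.51 falls inside the cluster-1 range; the other three accepted connections come from cluster-2 addresses.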

@D3nn @bboreham @palemtnrider Please let me know if you guys need any more information.
