Pods frequently lose the ability to connect to pods running on any other k8s node until weave-net is bounced #3641
Description
What you expected to happen?
Inter-node pod communication to be uninterrupted and always available.
What happened?
Multiple times per day we observe a random VM in the cluster whose pods all become unable to talk to pods located on any node other than itself. Other nodes also lose the ability to connect to the affected pods running on that node, whether through direct communication with the pod IP or via a Kubernetes Service.
This makes for an interesting failure mode, since CoreDNS runs in a set of pods that is likely on a different node, so name resolution is lost. Kubernetes Services also keep sending traffic to the unreachable pods, since the liveness/readiness checks run locally by the kubelet still return OK.
During these events we observe weave switch from fastdp to sleeve. The output seems to indicate it manages to re-establish all connections within seconds, but inter-node pod communication is lost in the process. We tried draining the node and letting it run for multiple hours; it never recovered by itself. Deleting the weave-net pod so it gets recreated restores the situation.
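For clarity, the workaround is nothing more than deleting the affected node's weave-net pod so the DaemonSet recreates it; a sketch (pod/node names here are from the affected node in this report):

```shell
# Find the weave-net pod running on the affected node (kube-node-13 here).
kubectl -n kube-system get pod -l name=weave-net -o wide | grep kube-node-13

# Delete it; the DaemonSet controller recreates it on the same node.
kubectl -n kube-system delete pod weave-net-s4vk9

# Watch until the replacement pod is 2/2 Running; inter-node pod
# connectivity from that node comes back once it is up.
kubectl -n kube-system get pod -l name=weave-net -o wide -w
```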
How to reproduce it?
So far I have not been able to reproduce this situation in other environments. The closest I got was by running iptables commands to drop inbound/outbound UDP traffic on port 6784.
Reaching out for guidance on pinpointing the cause, understanding the weave logs, and confirming or ruling out whether this is an expected failure mode.
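Roughly, the repro attempt looked like the following (a sketch; 6784/udp is the port Weave's fastdp vxlan data plane uses, and blocking it forces the fastdp-to-sleeve fallback):

```shell
# On one node: drop Weave's data-plane UDP traffic to simulate the breakage.
iptables -I INPUT  -p udp --dport 6784 -j DROP
iptables -I OUTPUT -p udp --dport 6784 -j DROP

# ...observe connections fall back from fastdp to sleeve in weave status...

# Undo by deleting the same rules.
iptables -D INPUT  -p udp --dport 6784 -j DROP
iptables -D OUTPUT -p udp --dport 6784 -j DROP
```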
Anything else we need to know?
This is a medium-size cluster of 42 nodes installed with kubeadm, running on-premises in VMware. This started occurring sporadically 3 weeks ago, and the frequency has increased over time; at this point it happens about 5 times per day. Previously this cluster ran without any issue for at least 1 year.
Versions:
$ ./weave --local version
weave 2.5.1
$ docker version
Client:
Version: 17.03.3-ce
API version: 1.27
Go version: go1.7.5
Git commit: e19b718
Built: Thu Aug 30 01:04:10 2018
OS/Arch: linux/amd64
Server:
Version: 17.03.3-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: e19b718
Built: Thu Aug 30 01:04:10 2018
OS/Arch: linux/amd64
Experimental: false
$ uname -a
Linux kube-node-04 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Logs:
Please find below links to gists and collapsible sections (the triangle is clickable) with all the logs and command output I gathered. The event occurred at 4:22.
Weave logs:
- impacted kube-node-13 running weave-net-s4vk9
- a peer that saw the action kube-node-15 running weave-net-bp72h
kubectl get nodes -owide (click to expand)
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-node-01 Ready master 8d v1.13.5 192.168.62.191 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-02 Ready <none> 8d v1.13.5 192.168.62.192 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-03 Ready <none> 8d v1.13.5 192.168.62.193 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-04 Ready <none> 8d v1.13.5 192.168.62.20 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-05 Ready <none> 8d v1.13.5 192.168.62.21 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-06 Ready <none> 8d v1.13.5 192.168.62.22 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-07 Ready <none> 8d v1.13.5 192.168.62.23 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-08 Ready <none> 8d v1.13.5 192.168.62.24 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-09 Ready <none> 8d v1.13.5 192.168.62.25 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-10 Ready <none> 8d v1.13.5 192.168.62.26 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-11 Ready <none> 8d v1.13.5 192.168.62.27 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-12 Ready <none> 8d v1.13.5 192.168.62.28 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-13 Ready,SchedulingDisabled <none> 8d v1.13.5 192.168.62.29 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-14 Ready <none> 8d v1.13.5 192.168.62.61 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-15 Ready <none> 8d v1.13.5 192.168.62.31 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-16 Ready <none> 8d v1.13.5 192.168.62.32 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-17 Ready <none> 8d v1.13.5 192.168.62.33 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-18 Ready <none> 8d v1.13.5 192.168.62.34 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-19 Ready <none> 8d v1.13.5 192.168.62.35 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-20 Ready <none> 8d v1.13.5 192.168.62.36 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-21 Ready <none> 8d v1.13.5 192.168.62.37 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-22 Ready <none> 8d v1.13.5 192.168.62.38 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-23 Ready <none> 8d v1.13.5 192.168.62.39 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-24 Ready <none> 8d v1.13.5 192.168.62.62 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-25 Ready <none> 8d v1.13.5 192.168.62.41 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-26 Ready <none> 8d v1.13.5 192.168.62.42 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-27 Ready <none> 8d v1.13.5 192.168.62.43 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-28 Ready <none> 8d v1.13.5 192.168.62.44 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-29 Ready <none> 8d v1.13.5 192.168.62.45 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-30 Ready <none> 8d v1.13.5 192.168.62.46 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-31 Ready <none> 8d v1.13.5 192.168.62.47 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-32 Ready <none> 8d v1.13.5 192.168.62.48 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-33 Ready <none> 8d v1.13.5 192.168.62.49 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-34 Ready <none> 8d v1.13.5 192.168.62.63 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-35 Ready <none> 8d v1.13.5 192.168.62.51 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-36 Ready <none> 8d v1.13.5 192.168.62.52 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-37 Ready <none> 8d v1.13.5 192.168.62.53 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-38 Ready <none> 8d v1.13.5 192.168.62.54 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-39 Ready <none> 8d v1.13.5 192.168.62.55 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-40 Ready <none> 8d v1.13.5 192.168.62.56 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-41 Ready <none> 8d v1.13.5 192.168.62.57 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
kube-node-42 Ready <none> 8d v1.13.5 192.168.62.58 <none> Debian GNU/Linux 9 (stretch) 4.9.0-5-amd64 docker://17.3.3
* at the time of the issue kube-node-13 was not SchedulingDisabled; we drained it to give ourselves time to experiment without impacting services.
kubectl get pod -owide -lname=weave-net -nkube-system (click to expand)
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
weave-net-2mbbc 2/2 Running 2 8d 192.168.62.41 kube-node-25 <none> <none>
weave-net-2wsnw 2/2 Running 1 8d 192.168.62.48 kube-node-32 <none> <none>
weave-net-524wg 2/2 Running 1 8d 192.168.62.36 kube-node-20 <none> <none>
weave-net-6psxw 2/2 Running 2 8d 192.168.62.22 kube-node-06 <none> <none>
weave-net-869q6 2/2 Running 0 39h 192.168.62.37 kube-node-21 <none> <none>
weave-net-995q8 2/2 Running 1 8d 192.168.62.56 kube-node-40 <none> <none>
weave-net-b2gqv 2/2 Running 0 8d 192.168.62.21 kube-node-05 <none> <none>
weave-net-bgk8b 2/2 Running 3 8d 192.168.62.49 kube-node-33 <none> <none>
weave-net-bp72h 2/2 Running 0 8d 192.168.62.31 kube-node-15 <none> <none>
weave-net-dvgfs 2/2 Running 0 8d 192.168.62.42 kube-node-26 <none> <none>
weave-net-f547b 2/2 Running 0 8d 192.168.62.23 kube-node-07 <none> <none>
weave-net-f77hj 2/2 Running 0 8d 192.168.62.193 kube-node-03 <none> <none>
weave-net-f8jr4 2/2 Running 0 8d 192.168.62.191 kube-node-01 <none> <none>
weave-net-fx4r7 2/2 Running 0 3d 192.168.62.51 kube-node-35 <none> <none>
weave-net-fxzjb 2/2 Running 1 8d 192.168.62.55 kube-node-39 <none> <none>
weave-net-g57kk 2/2 Running 0 8d 192.168.62.32 kube-node-16 <none> <none>
weave-net-gfshx 2/2 Running 1 8d 192.168.62.38 kube-node-22 <none> <none>
weave-net-hd66z 2/2 Running 0 8d 192.168.62.26 kube-node-10 <none> <none>
weave-net-jqmm2 2/2 Running 1 8d 192.168.62.192 kube-node-02 <none> <none>
weave-net-kx7t5 2/2 Running 0 39h 192.168.62.52 kube-node-36 <none> <none>
weave-net-l89cw 2/2 Running 0 8d 192.168.62.44 kube-node-28 <none> <none>
weave-net-lsqmn 2/2 Running 1 8d 192.168.62.33 kube-node-17 <none> <none>
weave-net-m2k6b 2/2 Running 1 8d 192.168.62.58 kube-node-42 <none> <none>
weave-net-mh6mn 2/2 Running 1 8d 192.168.62.61 kube-node-14 <none> <none>
weave-net-n5z4d 2/2 Running 0 3d 192.168.62.47 kube-node-31 <none> <none>
weave-net-p94q2 2/2 Running 0 8d 192.168.62.46 kube-node-30 <none> <none>
weave-net-phlnj 2/2 Running 1 8d 192.168.62.53 kube-node-37 <none> <none>
weave-net-r99g5 2/2 Running 0 8d 192.168.62.63 kube-node-34 <none> <none>
weave-net-rpfmh 2/2 Running 0 8d 192.168.62.25 kube-node-09 <none> <none>
weave-net-s4vk9 2/2 Running 2 8d 192.168.62.29 kube-node-13 <none> <none>
weave-net-s6dk5 2/2 Running 2 8d 192.168.62.20 kube-node-04 <none> <none>
weave-net-s8pdp 2/2 Running 1 8d 192.168.62.43 kube-node-27 <none> <none>
weave-net-sbq9v 2/2 Running 2 8d 192.168.62.28 kube-node-12 <none> <none>
weave-net-sstpk 2/2 Running 0 8d 192.168.62.54 kube-node-38 <none> <none>
weave-net-tcwhp 2/2 Running 1 8d 192.168.62.35 kube-node-19 <none> <none>
weave-net-tf6sw 2/2 Running 0 47h 192.168.62.34 kube-node-18 <none> <none>
weave-net-trl8d 2/2 Running 0 8d 192.168.62.24 kube-node-08 <none> <none>
weave-net-wjgws 2/2 Running 1 8d 192.168.62.39 kube-node-23 <none> <none>
weave-net-wnmvp 2/2 Running 2 8d 192.168.62.27 kube-node-11 <none> <none>
weave-net-xqrjm 2/2 Running 2 8d 192.168.62.45 kube-node-29 <none> <none>
weave-net-zn66g 2/2 Running 1 8d 192.168.62.62 kube-node-24 <none> <none>
weave-net-zzns8 2/2 Running 1 8d 192.168.62.57 kube-node-41 <none> <none>
kubectl get configmap weave-net -oyaml (click to expand)
apiVersion: v1
kind: ConfigMap
metadata:
annotations:
kube-peers.weave.works/peers: '{"Peers":[{"PeerName":"ce:d4:17:3b:00:db","NodeName":"kube-node-01"},{"PeerName":"1e:e4:18:98:ce:c4","NodeName":"kube-node-31"},{"PeerName":"da:ef:6b:19:cf:fe","NodeName":"kube-node-34"},{"PeerName":"de:30:16:77:64:d8","NodeName":"kube-node-30"},{"PeerName":"7e:ed:9d:df:ce:a6","NodeName":"kube-node-26"},{"PeerName":"ee:99:1b:17:84:cb","NodeName":"kube-node-42"},{"PeerName":"46:e6:49:7c:6b:88","NodeName":"kube-node-32"},{"PeerName":"96:13:be:0c:ac:e5","NodeName":"kube-node-33"},{"PeerName":"b2:24:32:da:61:d3","NodeName":"kube-node-40"},{"PeerName":"d6:38:1b:fb:97:a4","NodeName":"kube-node-08"},{"PeerName":"1a:a4:36:e2:2c:44","NodeName":"kube-node-11"},{"PeerName":"82:14:63:36:a5:de","NodeName":"kube-node-03"},{"PeerName":"8a:45:32:0a:c0:46","NodeName":"kube-node-02"},{"PeerName":"a2:ea:5b:4a:72:1d","NodeName":"kube-node-10"},{"PeerName":"be:f7:87:9e:3c:40","NodeName":"kube-node-07"},{"PeerName":"62:b7:f9:fe:10:ab","NodeName":"kube-node-41"},{"PeerName":"ba:24:c0:0b:ba:bd","NodeName":"kube-node-18"},{"PeerName":"22:41:be:0d:19:ea","NodeName":"kube-node-15"},{"PeerName":"7e:88:bb:7c:b9:cc","NodeName":"kube-node-16"},{"PeerName":"6a:1a:a0:47:c1:2a","NodeName":"kube-node-38"},{"PeerName":"5e:3a:1e:47:8c:3d","NodeName":"kube-node-29"},{"PeerName":"ea:11:40:96:af:24","NodeName":"kube-node-28"},{"PeerName":"ca:21:fb:33:fa:26","NodeName":"kube-node-19"},{"PeerName":"1e:69:6c:54:6d:af","NodeName":"kube-node-05"},{"PeerName":"4e:bd:fe:81:d4:73","NodeName":"kube-node-09"},{"PeerName":"fe:bd:2d:6d:f4:2d","NodeName":"kube-node-17"},{"PeerName":"86:96:14:d4:07:9b","NodeName":"kube-node-35"},{"PeerName":"fe:1f:d3:5f:49:fd","NodeName":"kube-node-25"},{"PeerName":"56:e4:c4:a0:8b:66","NodeName":"kube-node-36"},{"PeerName":"06:07:7f:a1:ae:96","NodeName":"kube-node-24"},{"PeerName":"8e:f2:46:a8:e4:3a","NodeName":"kube-node-21"},{"PeerName":"0e:ff:07:bc:7f:1f","NodeName":"kube-node-22"},{"PeerName":"2e:4e:bd:b4:a1:11","NodeName":"kube-node-27"},{"PeerName":
"d6:e9:e9:56:3c:4c","NodeName":"kube-node-04"},{"PeerName":"be:91:56:9d:c7:06","NodeName":"kube-node-39"},{"PeerName":"ee:fc:09:be:00:df","NodeName":"kube-node-13"},{"PeerName":"ca:84:00:e6:57:a9","NodeName":"kube-node-12"},{"PeerName":"2a:8c:99:01:75:e3","NodeName":"kube-node-06"},{"PeerName":"66:4d:43:df:97:77","NodeName":"kube-node-20"},{"PeerName":"42:33:fc:1e:9a:fc","NodeName":"kube-node-23"},{"PeerName":"ca:9f:2d:88:c9:fa","NodeName":"kube-node-14"},{"PeerName":"92:c7:36:a5:eb:75","NodeName":"kube-node-37"}]}'
creationTimestamp: "2019-04-28T11:27:00Z"
name: weave-net
namespace: kube-system
resourceVersion: "6072"
selfLink: /api/v1/namespaces/kube-system/configmaps/weave-net
uid: 89f35158-69a8-11e9-a8c4-0050568733bc
I ran the following weave commands on both weave-net pods above:
./weave --local status
./weave --local status connections
./weave --local status ipam
./weave --local report
./weave --local status peers
I bundled the output of those weave commands, linked above, per pod:
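For anyone gathering the same data: if I understand correctly, the `./weave --local status ...` commands are thin wrappers over the router's local HTTP status endpoint (default 127.0.0.1:6784), so the equivalent output should also be obtainable directly from each node:

```shell
# Same information as the ./weave --local commands above, pulled from
# Weave's local HTTP status endpoint on the node itself.
curl -s http://127.0.0.1:6784/status
curl -s http://127.0.0.1:6784/status/connections
curl -s http://127.0.0.1:6784/status/ipam
curl -s http://127.0.0.1:6784/status/peers
curl -s http://127.0.0.1:6784/report
```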
netstat -i on kube-node-13 (click to expand)
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
datapath 1376 20597484 0 0 0 557 0 0 0 BMRU
docker0 1500 0 0 0 0 0 0 0 0 BMU
eth0 1500 399613179 0 4455 0 225991905 0 0 0 BMRU
lo 65536 709418 0 0 0 709418 0 0 0 LRU
vethwe-b 1376 28421958 0 0 0 8790100 0 0 0 BMRU
vethwe-d 1376 8790100 0 0 0 28421958 0 0 0 BMRU
vethwepl 1376 1706061 0 0 0 21987839 0 0 0 BMRU
vethwepl 1376 69516 0 0 0 20667372 0 0 0 BMRU
vethwepl 1376 245776 0 0 0 20809465 0 0 0 BMRU
vethwepl 1376 2598116 0 0 0 23287412 0 0 0 BMRU
vxlan-67 65485 36022722 0 0 0 27108788 0 8 0 BMRU
weave 1376 201067249 0 0 0 170367479 0 0 0 BMRU
# one hour later
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
datapath 1376 20615012 0 0 0 558 0 0 0 BMRU
docker0 1500 0 0 0 0 0 0 0 0 BMU
eth0 1500 400134345 0 4633 0 226116220 0 0 0 BMRU
lo 65536 713448 0 0 0 713448 0 0 0 LRU
vethwe-b 1376 28439487 0 0 0 8790235 0 0 0 BMRU
vethwe-d 1376 8790235 0 0 0 28439487 0 0 0 BMRU
vethwepl 1376 1709103 0 0 0 22007775 0 0 0 BMRU
vethwepl 1376 69915 0 0 0 20685302 0 0 0 BMRU
vethwepl 1376 246042 0 0 0 20827130 0 0 0 BMRU
vethwepl 1376 2608023 0 0 0 23316447 0 0 0 BMRU
vxlan-67 65485 36349024 0 0 0 27125107 0 8 0 BMRU
weave 1376 201098252 0 0 0 170381921 0 0 0 BMRU
netstat -i on kube-node-15 (click to expand)
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
datapath 1376 20620436 0 0 0 457 0 0 0 BMRU
docker0 1500 0 0 0 0 0 0 0 0 BMU
eth0 1500 400219170 0 4564 0 262336135 0 0 0 BMRU
lo 65536 711005 0 0 0 711005 0 0 0 LRU
vethwe-b 1376 42452135 0 0 0 22681381 0 0 0 BMRU
vethwe-d 1376 22681381 0 0 0 42452135 0 0 0 BMRU
vethwepl 1376 2623593 0 0 0 5498571 0 0 0 BMRU
vethwepl 1376 69939 0 0 0 20690681 0 0 0 BMRU
vethwepl 1376 4577435 0 0 0 25121590 0 0 0 BMRU
vethwepl 1376 203458 0 0 0 2248214 0 0 0 BMRU
vethwepl 1376 10789639 0 0 0 23471612 0 0 0 BMRU
vethwepl 1376 256315 0 0 0 20843557 0 0 0 BMRU
vethwepl 1376 519734 0 0 0 13256327 0 0 0 BMRU
vethwepl 1376 8311771 0 0 0 16242806 0 0 0 BMRU
vethwepl 1376 972504 0 0 0 14050125 0 0 0 BMRU
vethwepl 1376 26466230 0 0 0 37959971 0 0 0 BMRU
vethwepl 1376 984011 0 0 0 14125608 0 0 0 BMRU
vethwepl 1376 4309263 0 0 0 18503301 0 0 0 BMRU
vethwepl 1376 2010851 0 0 0 22292166 0 0 0 BMRU
vxlan-67 65485 51759435 0 0 0 55382541 0 140 0 BMRU
weave 1376 206375630 0 0 0 178486203 0 0 0 BMRU
dmesg output on kube-node-13
[Mon May 6 00:30:04 2019] weave: port 14(vethwepl8acded0) entered disabled state
[Mon May 6 00:30:04 2019] device vethwepl8acded0 left promiscuous mode
[Mon May 6 00:30:04 2019] weave: port 14(vethwepl8acded0) entered disabled state
[Mon May 6 00:30:04 2019] weave: port 19(vethwepl08d2b66) entered disabled state
[Mon May 6 00:30:04 2019] device vethwepl08d2b66 left promiscuous mode
[Mon May 6 00:30:04 2019] weave: port 19(vethwepl08d2b66) entered disabled state
[Mon May 6 00:30:05 2019] weave: port 10(vethweplb1da716) entered disabled state
[Mon May 6 00:30:05 2019] device vethweplb1da716 left promiscuous mode
[Mon May 6 00:30:05 2019] weave: port 10(vethweplb1da716) entered disabled state
[Mon May 6 00:30:05 2019] weave: port 15(vethwepl441ee0c) entered disabled state
[Mon May 6 00:30:05 2019] device vethwepl441ee0c left promiscuous mode
[Mon May 6 00:30:05 2019] weave: port 15(vethwepl441ee0c) entered disabled state
[Mon May 6 00:30:05 2019] weave: port 11(vethwepl25d7e34) entered disabled state
[Mon May 6 00:30:05 2019] device vethwepl25d7e34 left promiscuous mode
[Mon May 6 00:30:05 2019] weave: port 11(vethwepl25d7e34) entered disabled state
[Mon May 6 00:30:06 2019] weave: port 9(vethweplfbe9743) entered disabled state
[Mon May 6 00:30:06 2019] device vethweplfbe9743 left promiscuous mode
[Mon May 6 00:30:06 2019] weave: port 9(vethweplfbe9743) entered disabled state
[Mon May 6 00:30:07 2019] weave: port 21(vethwepl4fc1c8e) entered disabled state
[Mon May 6 00:30:07 2019] device vethwepl4fc1c8e left promiscuous mode
[Mon May 6 00:30:07 2019] weave: port 21(vethwepl4fc1c8e) entered disabled state
[Mon May 6 00:30:07 2019] weave: port 17(vethweplc31ce66) entered disabled state
[Mon May 6 00:30:07 2019] device vethweplc31ce66 left promiscuous mode
[Mon May 6 00:30:07 2019] weave: port 17(vethweplc31ce66) entered disabled state
[Mon May 6 00:30:08 2019] weave: port 7(vethwepldd82b30) entered disabled state
[Mon May 6 00:30:08 2019] device vethwepldd82b30 left promiscuous mode
[Mon May 6 00:30:08 2019] weave: port 7(vethwepldd82b30) entered disabled state
[Mon May 6 00:30:08 2019] weave: port 22(vethwepl0870155) entered disabled state
[Mon May 6 00:30:08 2019] device vethwepl0870155 left promiscuous mode
[Mon May 6 00:30:08 2019] weave: port 22(vethwepl0870155) entered disabled state
[Mon May 6 00:30:08 2019] weave: port 12(vethwepl440f01d) entered disabled state
[Mon May 6 00:30:08 2019] device vethwepl440f01d left promiscuous mode
[Mon May 6 00:30:08 2019] weave: port 12(vethwepl440f01d) entered disabled state
[Mon May 6 00:30:09 2019] weave: port 13(vethwepl21ad0c7) entered disabled state
[Mon May 6 00:30:09 2019] device vethwepl21ad0c7 left promiscuous mode
[Mon May 6 00:30:09 2019] weave: port 13(vethwepl21ad0c7) entered disabled state
[Mon May 6 00:30:26 2019] weave: port 8(vethwepl25e42e6) entered disabled state
[Mon May 6 00:30:26 2019] device vethwepl25e42e6 left promiscuous mode
[Mon May 6 00:30:26 2019] weave: port 8(vethwepl25e42e6) entered disabled state
[Mon May 6 00:31:00 2019] weave: port 6(vethwepl47ca67b) entered disabled state
[Mon May 6 00:31:00 2019] device vethwepl47ca67b left promiscuous mode
[Mon May 6 00:31:00 2019] weave: port 6(vethwepl47ca67b) entered disabled state
dmesg output on kube-node-15 (click to expand)
none for the day of the issue or the 2 days before
iptables rules and ifconfig output for both nodes
Weave-net had already been bounced to restore services on those pods/nodes, but for completeness here is the output of a few more commands:
on kube-node-13 (click to expand)
$ ip route sh
default via 192.168.62.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.45.192.0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.62.0/24 dev eth0 proto kernel scope link src 192.168.62.29
$ ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 192.168.62.29/24 brd 192.168.62.255 scope global eth0\ valid_lft forever preferred_lft forever
3: docker0 inet 172.17.0.1/16 scope global docker0\ valid_lft forever preferred_lft forever
6: weave inet 10.45.192.0/12 brd 10.47.255.255 scope global weave\ valid_lft forever preferred_lft forever
on kube-node-15(click to expand)
$ ip route sh
default via 192.168.62.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.47.192.0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.62.0/24 dev eth0 proto kernel scope link src 192.168.62.31
$ ip -4 -o addr
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 192.168.62.31/24 brd 192.168.62.255 scope global eth0\ valid_lft forever preferred_lft forever
3: docker0 inet 172.17.0.1/16 scope global docker0\ valid_lft forever preferred_lft forever
6: weave inet 10.47.192.0/12 brd 10.47.255.255 scope global weave\ valid_lft forever preferred_lft forever
I looked at the output of the commands below; 8 minutes after the beginning of the issue I see the kubelet attempting to tear down some volumes and containers running on the node, with matching activity in docker.service. Seems like "normal" noise.
$ journalctl -u docker.service --no-pager
$ journalctl -u kubelet --no-pager
Things that we tried
- Deleting /var/lib/weave/* on all nodes and changing the weave-net DaemonSet to trigger a rollout. The situation persists.
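A sketch of how we did that (the annotation key used to force the rollout is arbitrary/hypothetical; any change to the pod template triggers the DaemonSet to recreate all pods):

```shell
# On every node: clear Weave's persisted state (peer/IPAM database).
rm -rf /var/lib/weave/*

# Then force the DaemonSet to roll all weave-net pods by touching the pod
# template, here via a throwaway annotation.
kubectl -n kube-system patch daemonset weave-net -p \
  '{"spec":{"template":{"metadata":{"annotations":{"force-redeploy":"'"$(date +%s)"'"}}}}}'
```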