This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Pods frequently lose the ability to connect to pods running on any other k8s node until weave-net is bounced #3641

Closed
@pfcarrier

Description

What you expected to happen?

Inter-node pod communication to be uninterrupted and always available.

What happened?

Multiple times per day we observe a random VM in the cluster whose pods all become unable to talk to pods located on any node other than itself. Other nodes also lose the ability to connect to the impacted pods running on that node, whether through direct communication with a pod IP or via a k8s service.

This makes for an interesting failure mode: coredns runs in a set of pods that are likely on a different node, so name resolution is lost. Kubernetes services also keep sending traffic to the unreachable pods, since the liveness/readiness checks run locally by the kubelet still return ok.

During these events we observe weave switch from fastdp to sleeve. The output seems to indicate that it manages to re-establish all connections within seconds; however, inter-node pod communication is lost in the process. We tried draining the node and letting it run for multiple hours; it never recovered by itself. Deleting the weave-net pod so that it is recreated restores the situation.
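For reference, a minimal sketch of that workaround: deleting the weave-net pod on the impacted node so the daemonset recreates it (pod name taken from the pod listing further below):

$ kubectl -n kube-system delete pod weave-net-s4vk9
$ kubectl -n kube-system get pod -owide -lname=weave-net    # wait for the replacement pod to reach 2/2 Running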

How to reproduce it?

So far I have not been able to reproduce this situation in other environments. The closest I got was by running iptables commands to drop inbound and outbound UDP traffic on port 6784.
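For reference, a minimal sketch of that simulation, assuming weave's fastdp traffic rides on UDP port 6784 (the exact rules used may have differed):

$ iptables -I INPUT  -p udp --dport 6784 -j DROP    # drop inbound fastdp traffic
$ iptables -I OUTPUT -p udp --dport 6784 -j DROP    # drop outbound fastdp traffic

# to undo the simulation afterwards:
$ iptables -D INPUT  -p udp --dport 6784 -j DROP
$ iptables -D OUTPUT -p udp --dport 6784 -j DROP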

Reaching out for guidance on pinpointing the cause and understanding the weave logs, and to confirm or rule out whether this is an expected failure mode.

Anything else we need to know?

This is a medium-sized cluster of 42 nodes installed with kubeadm, running on-premises on VMware. The problem started occurring sporadically 3 weeks ago, and its frequency has increased over time; at this point it happens about 5 times per day. Previously this cluster had been running without any issue for at least 1 year.

Versions:

$ ./weave --local version
weave 2.5.1

$ docker version
Client:
 Version:      17.03.3-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   e19b718
 Built:        Thu Aug 30 01:04:10 2018
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.3-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   e19b718
 Built:        Thu Aug 30 01:04:10 2018
 OS/Arch:      linux/amd64
 Experimental: false

$ uname -a
Linux kube-node-04 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Logs:

Please find below links to a gist and collapsible sections with all the logs and command output I gathered. The event occurred at 4:22.

Weave logs:

kubectl get nodes -owide:
NAME                   STATUS                     ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION   CONTAINER-RUNTIME
kube-node-01   Ready                      master   8d    v1.13.5   192.168.62.191   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-02   Ready                      <none>   8d    v1.13.5   192.168.62.192   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-03   Ready                      <none>   8d    v1.13.5   192.168.62.193   <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-04   Ready                      <none>   8d    v1.13.5   192.168.62.20    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-05   Ready                      <none>   8d    v1.13.5   192.168.62.21    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-06   Ready                      <none>   8d    v1.13.5   192.168.62.22    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-07   Ready                      <none>   8d    v1.13.5   192.168.62.23    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-08   Ready                      <none>   8d    v1.13.5   192.168.62.24    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-09   Ready                      <none>   8d    v1.13.5   192.168.62.25    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-10   Ready                      <none>   8d    v1.13.5   192.168.62.26    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-11   Ready                      <none>   8d    v1.13.5   192.168.62.27    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-12   Ready                      <none>   8d    v1.13.5   192.168.62.28    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-13   Ready,SchedulingDisabled   <none>   8d    v1.13.5   192.168.62.29    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-14   Ready                      <none>   8d    v1.13.5   192.168.62.61    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-15   Ready                      <none>   8d    v1.13.5   192.168.62.31    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-16   Ready                      <none>   8d    v1.13.5   192.168.62.32    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-17   Ready                      <none>   8d    v1.13.5   192.168.62.33    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-18   Ready                      <none>   8d    v1.13.5   192.168.62.34    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-19   Ready                      <none>   8d    v1.13.5   192.168.62.35    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-20   Ready                      <none>   8d    v1.13.5   192.168.62.36    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-21   Ready                      <none>   8d    v1.13.5   192.168.62.37    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-22   Ready                      <none>   8d    v1.13.5   192.168.62.38    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-23   Ready                      <none>   8d    v1.13.5   192.168.62.39    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-24   Ready                      <none>   8d    v1.13.5   192.168.62.62    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-25   Ready                      <none>   8d    v1.13.5   192.168.62.41    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-26   Ready                      <none>   8d    v1.13.5   192.168.62.42    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-27   Ready                      <none>   8d    v1.13.5   192.168.62.43    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-28   Ready                      <none>   8d    v1.13.5   192.168.62.44    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-29   Ready                      <none>   8d    v1.13.5   192.168.62.45    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-30   Ready                      <none>   8d    v1.13.5   192.168.62.46    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-31   Ready                      <none>   8d    v1.13.5   192.168.62.47    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-32   Ready                      <none>   8d    v1.13.5   192.168.62.48    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-33   Ready                      <none>   8d    v1.13.5   192.168.62.49    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-34   Ready                      <none>   8d    v1.13.5   192.168.62.63    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-35   Ready                      <none>   8d    v1.13.5   192.168.62.51    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-36   Ready                      <none>   8d    v1.13.5   192.168.62.52    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-37   Ready                      <none>   8d    v1.13.5   192.168.62.53    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-38   Ready                      <none>   8d    v1.13.5   192.168.62.54    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-39   Ready                      <none>   8d    v1.13.5   192.168.62.55    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-40   Ready                      <none>   8d    v1.13.5   192.168.62.56    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-41   Ready                      <none>   8d    v1.13.5   192.168.62.57    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3
kube-node-42   Ready                      <none>   8d    v1.13.5   192.168.62.58    <none>        Debian GNU/Linux 9 (stretch)   4.9.0-5-amd64    docker://17.3.3

* At the time of the issue kube-node-13 was not SchedulingDisabled; we drained it to give ourselves time to experiment without impacting services.
kubectl get pod -owide -lname=weave-net -nkube-system:
NAME              READY   STATUS    RESTARTS   AGE   IP               NODE                   NOMINATED NODE   READINESS GATES
weave-net-2mbbc   2/2     Running   2          8d    192.168.62.41    kube-node-25   <none>           <none>
weave-net-2wsnw   2/2     Running   1          8d    192.168.62.48    kube-node-32   <none>           <none>
weave-net-524wg   2/2     Running   1          8d    192.168.62.36    kube-node-20   <none>           <none>
weave-net-6psxw   2/2     Running   2          8d    192.168.62.22    kube-node-06   <none>           <none>
weave-net-869q6   2/2     Running   0          39h   192.168.62.37    kube-node-21   <none>           <none>
weave-net-995q8   2/2     Running   1          8d    192.168.62.56    kube-node-40   <none>           <none>
weave-net-b2gqv   2/2     Running   0          8d    192.168.62.21    kube-node-05   <none>           <none>
weave-net-bgk8b   2/2     Running   3          8d    192.168.62.49    kube-node-33   <none>           <none>
weave-net-bp72h   2/2     Running   0          8d    192.168.62.31    kube-node-15   <none>           <none>
weave-net-dvgfs   2/2     Running   0          8d    192.168.62.42    kube-node-26   <none>           <none>
weave-net-f547b   2/2     Running   0          8d    192.168.62.23    kube-node-07   <none>           <none>
weave-net-f77hj   2/2     Running   0          8d    192.168.62.193   kube-node-03   <none>           <none>
weave-net-f8jr4   2/2     Running   0          8d    192.168.62.191   kube-node-01   <none>           <none>
weave-net-fx4r7   2/2     Running   0          3d    192.168.62.51    kube-node-35   <none>           <none>
weave-net-fxzjb   2/2     Running   1          8d    192.168.62.55    kube-node-39   <none>           <none>
weave-net-g57kk   2/2     Running   0          8d    192.168.62.32    kube-node-16   <none>           <none>
weave-net-gfshx   2/2     Running   1          8d    192.168.62.38    kube-node-22   <none>           <none>
weave-net-hd66z   2/2     Running   0          8d    192.168.62.26    kube-node-10   <none>           <none>
weave-net-jqmm2   2/2     Running   1          8d    192.168.62.192   kube-node-02   <none>           <none>
weave-net-kx7t5   2/2     Running   0          39h   192.168.62.52    kube-node-36   <none>           <none>
weave-net-l89cw   2/2     Running   0          8d    192.168.62.44    kube-node-28   <none>           <none>
weave-net-lsqmn   2/2     Running   1          8d    192.168.62.33    kube-node-17   <none>           <none>
weave-net-m2k6b   2/2     Running   1          8d    192.168.62.58    kube-node-42   <none>           <none>
weave-net-mh6mn   2/2     Running   1          8d    192.168.62.61    kube-node-14   <none>           <none>
weave-net-n5z4d   2/2     Running   0          3d    192.168.62.47    kube-node-31   <none>           <none>
weave-net-p94q2   2/2     Running   0          8d    192.168.62.46    kube-node-30   <none>           <none>
weave-net-phlnj   2/2     Running   1          8d    192.168.62.53    kube-node-37   <none>           <none>
weave-net-r99g5   2/2     Running   0          8d    192.168.62.63    kube-node-34   <none>           <none>
weave-net-rpfmh   2/2     Running   0          8d    192.168.62.25    kube-node-09   <none>           <none>
weave-net-s4vk9   2/2     Running   2          8d    192.168.62.29    kube-node-13   <none>           <none>
weave-net-s6dk5   2/2     Running   2          8d    192.168.62.20    kube-node-04   <none>           <none>
weave-net-s8pdp   2/2     Running   1          8d    192.168.62.43    kube-node-27   <none>           <none>
weave-net-sbq9v   2/2     Running   2          8d    192.168.62.28    kube-node-12   <none>           <none>
weave-net-sstpk   2/2     Running   0          8d    192.168.62.54    kube-node-38   <none>           <none>
weave-net-tcwhp   2/2     Running   1          8d    192.168.62.35    kube-node-19   <none>           <none>
weave-net-tf6sw   2/2     Running   0          47h   192.168.62.34    kube-node-18   <none>           <none>
weave-net-trl8d   2/2     Running   0          8d    192.168.62.24    kube-node-08   <none>           <none>
weave-net-wjgws   2/2     Running   1          8d    192.168.62.39    kube-node-23   <none>           <none>
weave-net-wnmvp   2/2     Running   2          8d    192.168.62.27    kube-node-11   <none>           <none>
weave-net-xqrjm   2/2     Running   2          8d    192.168.62.45    kube-node-29   <none>           <none>
weave-net-zn66g   2/2     Running   1          8d    192.168.62.62    kube-node-24   <none>           <none>
weave-net-zzns8   2/2     Running   1          8d    192.168.62.57    kube-node-41   <none>           <none>
kubectl get configmap weave-net -oyaml:
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    kube-peers.weave.works/peers: '{"Peers":[{"PeerName":"ce:d4:17:3b:00:db","NodeName":"kube-node-01"},{"PeerName":"1e:e4:18:98:ce:c4","NodeName":"kube-node-31"},{"PeerName":"da:ef:6b:19:cf:fe","NodeName":"kube-node-34"},{"PeerName":"de:30:16:77:64:d8","NodeName":"kube-node-30"},{"PeerName":"7e:ed:9d:df:ce:a6","NodeName":"kube-node-26"},{"PeerName":"ee:99:1b:17:84:cb","NodeName":"kube-node-42"},{"PeerName":"46:e6:49:7c:6b:88","NodeName":"kube-node-32"},{"PeerName":"96:13:be:0c:ac:e5","NodeName":"kube-node-33"},{"PeerName":"b2:24:32:da:61:d3","NodeName":"kube-node-40"},{"PeerName":"d6:38:1b:fb:97:a4","NodeName":"kube-node-08"},{"PeerName":"1a:a4:36:e2:2c:44","NodeName":"kube-node-11"},{"PeerName":"82:14:63:36:a5:de","NodeName":"kube-node-03"},{"PeerName":"8a:45:32:0a:c0:46","NodeName":"kube-node-02"},{"PeerName":"a2:ea:5b:4a:72:1d","NodeName":"kube-node-10"},{"PeerName":"be:f7:87:9e:3c:40","NodeName":"kube-node-07"},{"PeerName":"62:b7:f9:fe:10:ab","NodeName":"kube-node-41"},{"PeerName":"ba:24:c0:0b:ba:bd","NodeName":"kube-node-18"},{"PeerName":"22:41:be:0d:19:ea","NodeName":"kube-node-15"},{"PeerName":"7e:88:bb:7c:b9:cc","NodeName":"kube-node-16"},{"PeerName":"6a:1a:a0:47:c1:2a","NodeName":"kube-node-38"},{"PeerName":"5e:3a:1e:47:8c:3d","NodeName":"kube-node-29"},{"PeerName":"ea:11:40:96:af:24","NodeName":"kube-node-28"},{"PeerName":"ca:21:fb:33:fa:26","NodeName":"kube-node-19"},{"PeerName":"1e:69:6c:54:6d:af","NodeName":"kube-node-05"},{"PeerName":"4e:bd:fe:81:d4:73","NodeName":"kube-node-09"},{"PeerName":"fe:bd:2d:6d:f4:2d","NodeName":"kube-node-17"},{"PeerName":"86:96:14:d4:07:9b","NodeName":"kube-node-35"},{"PeerName":"fe:1f:d3:5f:49:fd","NodeName":"kube-node-25"},{"PeerName":"56:e4:c4:a0:8b:66","NodeName":"kube-node-36"},{"PeerName":"06:07:7f:a1:ae:96","NodeName":"kube-node-24"},{"PeerName":"8e:f2:46:a8:e4:3a","NodeName":"kube-node-21"},{"PeerName":"0e:ff:07:bc:7f:1f","NodeName":"kube-node-22"},{"PeerName":"2e:4e:bd:b4:a1:11","NodeName":"kube-node-27"},{"PeerName":"d6:e9:e9:56:3c:4c","NodeName":"kube-node-04"},{"PeerName":"be:91:56:9d:c7:06","NodeName":"kube-node-39"},{"PeerName":"ee:fc:09:be:00:df","NodeName":"kube-node-13"},{"PeerName":"ca:84:00:e6:57:a9","NodeName":"kube-node-12"},{"PeerName":"2a:8c:99:01:75:e3","NodeName":"kube-node-06"},{"PeerName":"66:4d:43:df:97:77","NodeName":"kube-node-20"},{"PeerName":"42:33:fc:1e:9a:fc","NodeName":"kube-node-23"},{"PeerName":"ca:9f:2d:88:c9:fa","NodeName":"kube-node-14"},{"PeerName":"92:c7:36:a5:eb:75","NodeName":"kube-node-37"}]}'
  creationTimestamp: "2019-04-28T11:27:00Z"
  name: weave-net
  namespace: kube-system
  resourceVersion: "6072"
  selfLink: /api/v1/namespaces/kube-system/configmaps/weave-net
  uid: 89f35158-69a8-11e9-a8c4-0050568733bc

I ran the following weave commands in both weave-net pods above:

  • ./weave --local status
  • ./weave --local status connections
  • ./weave --local status ipam
  • ./weave --local report
  • ./weave --local status peers

I bundled the weave command output, per pod, just above:
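For reference, a sketch of how these can be invoked; it assumes the weave script ships at /home/weave/weave inside the weave container:

$ kubectl -n kube-system exec weave-net-s4vk9 -c weave -- /home/weave/weave --local status
$ kubectl -n kube-system exec weave-net-s4vk9 -c weave -- /home/weave/weave --local status connections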

netstat -i on kube-node-13:
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
datapath  1376 20597484      0      0 0           557      0      0      0 BMRU
docker0   1500        0      0      0 0             0      0      0      0 BMU
eth0      1500 399613179      0   4455 0      225991905      0      0      0 BMRU
lo       65536   709418      0      0 0        709418      0      0      0 LRU
vethwe-b  1376 28421958      0      0 0       8790100      0      0      0 BMRU
vethwe-d  1376  8790100      0      0 0      28421958      0      0      0 BMRU
vethwepl  1376  1706061      0      0 0      21987839      0      0      0 BMRU
vethwepl  1376    69516      0      0 0      20667372      0      0      0 BMRU
vethwepl  1376   245776      0      0 0      20809465      0      0      0 BMRU
vethwepl  1376  2598116      0      0 0      23287412      0      0      0 BMRU
vxlan-67 65485 36022722      0      0 0      27108788      0      8      0 BMRU
weave     1376 201067249      0      0 0      170367479      0      0      0 BMRU

# one hour later

Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
datapath  1376 20615012      0      0 0           558      0      0      0 BMRU
docker0   1500        0      0      0 0             0      0      0      0 BMU
eth0      1500 400134345      0   4633 0      226116220      0      0      0 BMRU
lo       65536   713448      0      0 0        713448      0      0      0 LRU
vethwe-b  1376 28439487      0      0 0       8790235      0      0      0 BMRU
vethwe-d  1376  8790235      0      0 0      28439487      0      0      0 BMRU
vethwepl  1376  1709103      0      0 0      22007775      0      0      0 BMRU
vethwepl  1376    69915      0      0 0      20685302      0      0      0 BMRU
vethwepl  1376   246042      0      0 0      20827130      0      0      0 BMRU
vethwepl  1376  2608023      0      0 0      23316447      0      0      0 BMRU
vxlan-67 65485 36349024      0      0 0      27125107      0      8      0 BMRU
weave     1376 201098252      0      0 0      170381921      0      0      0 BMRU
netstat -i on kube-node-15:
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
datapath  1376 20620436      0      0 0           457      0      0      0 BMRU
docker0   1500        0      0      0 0             0      0      0      0 BMU
eth0      1500 400219170      0   4564 0      262336135      0      0      0 BMRU
lo       65536   711005      0      0 0        711005      0      0      0 LRU
vethwe-b  1376 42452135      0      0 0      22681381      0      0      0 BMRU
vethwe-d  1376 22681381      0      0 0      42452135      0      0      0 BMRU
vethwepl  1376  2623593      0      0 0       5498571      0      0      0 BMRU
vethwepl  1376    69939      0      0 0      20690681      0      0      0 BMRU
vethwepl  1376  4577435      0      0 0      25121590      0      0      0 BMRU
vethwepl  1376   203458      0      0 0       2248214      0      0      0 BMRU
vethwepl  1376 10789639      0      0 0      23471612      0      0      0 BMRU
vethwepl  1376   256315      0      0 0      20843557      0      0      0 BMRU
vethwepl  1376   519734      0      0 0      13256327      0      0      0 BMRU
vethwepl  1376  8311771      0      0 0      16242806      0      0      0 BMRU
vethwepl  1376   972504      0      0 0      14050125      0      0      0 BMRU
vethwepl  1376 26466230      0      0 0      37959971      0      0      0 BMRU
vethwepl  1376   984011      0      0 0      14125608      0      0      0 BMRU
vethwepl  1376  4309263      0      0 0      18503301      0      0      0 BMRU
vethwepl  1376  2010851      0      0 0      22292166      0      0      0 BMRU
vxlan-67 65485 51759435      0      0 0      55382541      0    140      0 BMRU
weave     1376 206375630      0      0 0      178486203      0      0      0 BMRU
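Note that netstat truncates interface names, so vxlan-67 above is most likely weave's fastdp vxlan-6784 device. A sketch of how the one-hour delta above can be re-sampled, assuming the standard iproute2 and watch tools:

$ ip -s link show dev vxlan-6784    # per-interface packet and drop counters
$ watch -n 60 'netstat -i'          # re-sample to see which counters keep growing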
dmesg output on kube-node-13:
[Mon May  6 00:30:04 2019] weave: port 14(vethwepl8acded0) entered disabled state
[Mon May  6 00:30:04 2019] device vethwepl8acded0 left promiscuous mode
[Mon May  6 00:30:04 2019] weave: port 14(vethwepl8acded0) entered disabled state
[Mon May  6 00:30:04 2019] weave: port 19(vethwepl08d2b66) entered disabled state
[Mon May  6 00:30:04 2019] device vethwepl08d2b66 left promiscuous mode
[Mon May  6 00:30:04 2019] weave: port 19(vethwepl08d2b66) entered disabled state
[Mon May  6 00:30:05 2019] weave: port 10(vethweplb1da716) entered disabled state
[Mon May  6 00:30:05 2019] device vethweplb1da716 left promiscuous mode
[Mon May  6 00:30:05 2019] weave: port 10(vethweplb1da716) entered disabled state
[Mon May  6 00:30:05 2019] weave: port 15(vethwepl441ee0c) entered disabled state
[Mon May  6 00:30:05 2019] device vethwepl441ee0c left promiscuous mode
[Mon May  6 00:30:05 2019] weave: port 15(vethwepl441ee0c) entered disabled state
[Mon May  6 00:30:05 2019] weave: port 11(vethwepl25d7e34) entered disabled state
[Mon May  6 00:30:05 2019] device vethwepl25d7e34 left promiscuous mode
[Mon May  6 00:30:05 2019] weave: port 11(vethwepl25d7e34) entered disabled state
[Mon May  6 00:30:06 2019] weave: port 9(vethweplfbe9743) entered disabled state
[Mon May  6 00:30:06 2019] device vethweplfbe9743 left promiscuous mode
[Mon May  6 00:30:06 2019] weave: port 9(vethweplfbe9743) entered disabled state
[Mon May  6 00:30:07 2019] weave: port 21(vethwepl4fc1c8e) entered disabled state
[Mon May  6 00:30:07 2019] device vethwepl4fc1c8e left promiscuous mode
[Mon May  6 00:30:07 2019] weave: port 21(vethwepl4fc1c8e) entered disabled state
[Mon May  6 00:30:07 2019] weave: port 17(vethweplc31ce66) entered disabled state
[Mon May  6 00:30:07 2019] device vethweplc31ce66 left promiscuous mode
[Mon May  6 00:30:07 2019] weave: port 17(vethweplc31ce66) entered disabled state
[Mon May  6 00:30:08 2019] weave: port 7(vethwepldd82b30) entered disabled state
[Mon May  6 00:30:08 2019] device vethwepldd82b30 left promiscuous mode
[Mon May  6 00:30:08 2019] weave: port 7(vethwepldd82b30) entered disabled state
[Mon May  6 00:30:08 2019] weave: port 22(vethwepl0870155) entered disabled state
[Mon May  6 00:30:08 2019] device vethwepl0870155 left promiscuous mode
[Mon May  6 00:30:08 2019] weave: port 22(vethwepl0870155) entered disabled state
[Mon May  6 00:30:08 2019] weave: port 12(vethwepl440f01d) entered disabled state
[Mon May  6 00:30:08 2019] device vethwepl440f01d left promiscuous mode
[Mon May  6 00:30:08 2019] weave: port 12(vethwepl440f01d) entered disabled state
[Mon May  6 00:30:09 2019] weave: port 13(vethwepl21ad0c7) entered disabled state
[Mon May  6 00:30:09 2019] device vethwepl21ad0c7 left promiscuous mode
[Mon May  6 00:30:09 2019] weave: port 13(vethwepl21ad0c7) entered disabled state
[Mon May  6 00:30:26 2019] weave: port 8(vethwepl25e42e6) entered disabled state
[Mon May  6 00:30:26 2019] device vethwepl25e42e6 left promiscuous mode
[Mon May  6 00:30:26 2019] weave: port 8(vethwepl25e42e6) entered disabled state
[Mon May  6 00:31:00 2019] weave: port 6(vethwepl47ca67b) entered disabled state
[Mon May  6 00:31:00 2019] device vethwepl47ca67b left promiscuous mode
[Mon May  6 00:31:00 2019] weave: port 6(vethwepl47ca67b) entered disabled state
dmesg output on kube-node-15:
None for the day of the issue or the 2 days before.

iptables rules and ifconfig output for both nodes

Weave-net had been bounced to restore service on those pods/nodes by this point, but for completeness here is the output of a few more commands:

on kube-node-13:
$ ip route sh
default via 192.168.62.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.45.192.0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.62.0/24 dev eth0 proto kernel scope link src 192.168.62.29

$ ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.168.62.29/24 brd 192.168.62.255 scope global eth0\       valid_lft forever preferred_lft forever
3: docker0    inet 172.17.0.1/16 scope global docker0\       valid_lft forever preferred_lft forever
6: weave    inet 10.45.192.0/12 brd 10.47.255.255 scope global weave\       valid_lft forever preferred_lft forever
on kube-node-15:
$ ip route sh
default via 192.168.62.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.47.192.0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.62.0/24 dev eth0 proto kernel scope link src 192.168.62.31

$ ip -4 -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.168.62.31/24 brd 192.168.62.255 scope global eth0\       valid_lft forever preferred_lft forever
3: docker0    inet 172.17.0.1/16 scope global docker0\       valid_lft forever preferred_lft forever
6: weave    inet 10.47.192.0/12 brd 10.47.255.255 scope global weave\       valid_lft forever preferred_lft forever

I looked at the output of the commands below; 8 minutes after the beginning of the issue I see the kubelet attempting to tear down some volumes and containers running on the node, and matching activity in docker.service. This seems like "normal" noise.
$ journalctl -u docker.service --no-pager
$ journalctl -u kubelet --no-pager
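To narrow those journals down to the event window (the event occurred at 4:22), something like the following should work; the time bounds here are illustrative:

$ journalctl -u kubelet --since "04:00" --until "05:00" --no-pager
$ journalctl -u docker.service --since "04:00" --until "05:00" --no-pager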

Things that we tried

  • Deleting /var/lib/weave/* on all nodes and changing the weave-net daemonset to trigger a rollout (see the sketch after this list). The situation persists.
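A minimal sketch of that rollout trigger, using a hypothetical annotation name; any change to the pod template causes the daemonset to recreate its pods:

$ kubectl -n kube-system patch daemonset weave-net \
    -p '{"spec":{"template":{"metadata":{"annotations":{"force-rollout":"1"}}}}}'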
