This repository was archived by the owner on Jun 20, 2024. It is now read-only.
after weave pod restart all other peers became unreachable #3580
Open
Description
What you expected to happen?
After the restart, the weave pod should be able to reach all of its peers again.
The weave pod on one Kubernetes node was restarted after being OOMKilled. After Kubernetes restarted it, it could not reach the other nodes.
/home/weave # ./weave --local status ipam
4a:12:0d:31:99:7a(xxxx) 79872 IPs (03.8% of total) (10 active)
66:a2:33:df:31:13(xxx) 15360 IPs (00.7% of total) - unreachable!
de:66:f8:8e:6c:77(xxx) 40960 IPs (02.0% of total) - unreachable!
1e:1d:ce:04:8c:1f(xxx) 8192 IPs (00.4% of total) - unreachable!
4a:01:75:50:76:54(xxx) 6144 IPs (00.3% of total) - unreachable!
12:5c:3e:ce:52:84(xxx) 22528 IPs (01.1% of total) - unreachable!
...
...
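To make the status output above easier to triage, here is a minimal Python sketch that splits the `status ipam` lines into reachable and unreachable peers. It is not part of weave: the function name `summarize_ipam` and the regular expression are my own assumptions, based only on the line format shown in this report.

```python
import re

# Assumed line format, taken from the output pasted above:
#   <peer-mac>(<nickname>) <N> IPs (<P>% of total) [- unreachable!]
IPAM_LINE = re.compile(
    r"^(?P<peer>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\((?P<name>[^)]*)\)\s+"
    r"(?P<ips>\d+) IPs \((?P<pct>[\d.]+)% of total\)"
    r"(?P<rest>.*)$"
)


def summarize_ipam(text):
    """Return (reachable, unreachable) lists of (peer_mac, ip_count) tuples."""
    reachable, unreachable = [], []
    for line in text.splitlines():
        match = IPAM_LINE.match(line.strip())
        if match is None:
            continue  # skip prompts, "..." elisions and anything unrecognised
        entry = (match.group("peer"), int(match.group("ips")))
        if "unreachable" in match.group("rest"):
            unreachable.append(entry)
        else:
            reachable.append(entry)
    return reachable, unreachable
```

Feeding it the saved output of `./weave --local status ipam` gives the number of peers marked unreachable and how many IPs they still own, which is useful for comparing the state before and after a restart.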
kubectl logs weave-net-sl268 -n kube-system -c weave --tail 10 -f
INFO: 2019/01/12 15:56:46.713994 ->[yyyy:51130] connection accepted
INFO: 2019/01/12 15:56:46.714519 ->[yyyy:51130|22:4d:04:a7:b8:d1(xxxx)]: connection ready; using protocol version 2
INFO: 2019/01/12 15:56:46.714580 overlay_switch ->[22:4d:04:a7:b8:d1(xxxx)] using fastdp
INFO: 2019/01/12 15:56:46.714598 ->[yyyy:51130|22:4d:04:a7:b8:d1(xxxx)]: connection added (new peer)
INFO: 2019/01/12 15:56:46.720267 ->[yyyy:51130|22:4d:04:a7:b8:d1(xxxx)]: connection shutting down due to error: Received update for IP range I own at 100.107.0.0 v913: incoming message says owner 22:4d:04:a7:b8:d1 v921
INFO: 2019/01/12 15:56:46.720313 ->[yyyy:51130|22:4d:04:a7:b8:d1(xxx)]: connection deleted
INFO: 2019/01/12 15:56:46.720328 Removed unreachable peer 22:4d:04:a7:b8:d1(xxx)
INFO: 2019/01/12 15:56:47.379314 ->[zzzz:6783] attempting connection
INFO: 2019/01/12 15:56:47.380240 ->[zzzz:6783|de:66:f8:8e:6c:77(xxx)]: connection ready; using protocol version 2
INFO: 2019/01/12 15:56:47.380306 overlay_switch ->[de:66:f8:8e:6c:77(xxx)] using fastdp
INFO: 2019/01/12 15:56:47.380330 ->[zzzz:6783|de:66:f8:8e:6c:77(xxx)]: connection added (new peer)
INFO: 2019/01/12 15:56:47.381566 ->[zzzz:6783|de:66:f8:8e:6c:77(xxx)]: connection shutting down due to error: Received update for IP range I own at 100.107.0.0 v913: incoming message says owner 22:4d:04:a7:b8:d1 v921
INFO: 2019/01/12 15:56:47.381608 ->[zzzz:6783|de:66:f8:8e:6c:77(xxx)]: connection deleted
INFO: 2019/01/12 15:56:47.381626 Removed unreachable peer de:66:f8:8e:6c:77(xxx)
INFO: 2019/01/12 15:56:47.764266 ->[ipipip:6783] attempting connection
INFO: 2019/01/12 15:56:47.765100 ->[ipipip:6783|76:2a:a4:03:da:49(xxx)]: connection ready; using protocol version 2
INFO: 2019/01/12 15:56:47.765156 overlay_switch ->[76:2a:a4:03:da:49(xxx)] using fastdp
INFO: 2019/01/12 15:56:47.765178 ->[ipipip:6783|76:2a:a4:03:da:49(xxx)]: connection added (new peer)
INFO: 2019/01/12 15:56:47.766127 ->[ipipip:6783|76:2a:a4:03:da:49(xxx)]: connection shutting down due to error: Received update for IP range I own at 100.107.0.0 v913: incoming message says owner 22:4d:04:a7:b8:d1 v921
INFO: 2019/01/12 15:56:47.766175 ->[ipipip:6783|76:2a:a4:03:da:49(xxxx)]: connection deleted
INFO: 2019/01/12 15:56:47.766190 Removed unreachable peer 76:2a:a4:03:da:49(xxxx)
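Every dropped connection in the excerpt above reports the same conflict: this node believes it owns the range at 100.107.0.0 at v913, while the incoming message names 22:4d:04:a7:b8:d1 as the owner at v921. The Python sketch below pulls those fields out of a saved log so that the distinct conflicts can be listed. The helper name `find_conflicts` and the pattern are assumptions derived from the log text in this report, not from the weave source.

```python
import re

# Assumed message format, copied from the log lines above.
CONFLICT = re.compile(
    r"Received update for IP range I own at (?P<range>\S+) v(?P<local>\d+): "
    r"incoming message says owner "
    r"(?P<owner>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) v(?P<remote>\d+)"
)


def find_conflicts(log_text):
    """Return the set of distinct (range, local_version, owner, remote_version)."""
    conflicts = set()
    for match in CONFLICT.finditer(log_text):
        conflicts.add(
            (
                match.group("range"),
                int(match.group("local")),
                match.group("owner"),
                int(match.group("remote")),
            )
        )
    return conflicts
```

Run against the output of `kubectl logs ... -c weave`, a single-element result would mean all rejections stem from one disputed range rather than from several independent ownership conflicts.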
Errors from syslog at the time of the restart:
Jan 12 08:30:56 xxxx kernel: [90423.466954] Memory cgroup out of memory: Kill process 7910 (weaver) score 1902 or sacrifice child
Jan 12 08:30:56 xxxx kernel: [90423.469555] Killed process 7910 (weaver) total-vm:1663936kB, anon-rss:167768kB, file-rss:24112kB
Jan 12 08:30:56 xxxx systemd[1]: Starting Run docker-healthcheck once...
Jan 12 08:30:56 xxxx docker-healthcheck[22777]: docker healthy
Jan 12 08:30:56 xxxx systemd[1]: Started Run docker-healthcheck once.
Jan 12 08:30:57 xxxx kubelet[7360]: I0112 08:30:57.658336 7360 kubelet.go:1928] SyncLoop (PLEG): "weave-net-5dhpt_kube-system(dfe2cf4e-15a4-11e9-83b3-024f4d98cc1c)", event: &pleg.PodLifecycleEvent{ID:"dfe2cf4e-15a4-11e9-83b3-024f4d98cc1c", Type:"ContainerDied", Data:"2b8dde9b7731111ce2c3dfa30bb8ad18b27bf1a99b914001be353ab98b40e37a"}
Jan 12 08:30:57 xxxx dockerd[7326]: time="2019-01-12T08:30:57.662537994Z" level=warning msg="Your kernel does not support swap limit capabilities,or the cgroup is not mounted. Memory limited without swap."
Jan 12 08:30:57 xxxx dockerd[7326]: time="2019-01-12T08:30:57.792836499Z" level=warning msg="Unknown healthcheck type 'NONE' (expected 'CMD') in container 69e3a920ec7fc33c2fb3ed97edc56d272724d56e04453cd3dd1e627144142290"
Jan 12 08:30:57 xxxx kernel: [90424.699471] xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
Jan 12 08:30:57 xxxx kernel: [90424.700270] xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
Jan 12 08:30:57 xxxx kernel: [90424.700866] xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
Jan 12 08:30:57 xxxx kernel: [90424.701431] xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
Jan 12 08:30:58 xxxx kernel: [90424.930624] vxlan: Cannot bind port 41640, err=-98
Jan 12 08:30:58 xxxx kernel: [90424.930650] device vxlan-41640 entered promiscuous mode
Jan 12 08:30:58 xxxx kernel: [90424.930704] device vxlan-41640 left promiscuous mode
Jan 12 08:30:58 xxxx kernel: [90424.948759] device vethwedu entered promiscuous mode
Jan 12 08:30:58 xxxx kernel: [90424.948875] device vethwedu left promiscuous mode
Jan 12 08:30:58 xxxx kernel: [90424.948877] weave: port 12(vethwedu) entered disabled state
Jan 12 08:30:58 xxxx kernel: [90425.005213] xt_physdev: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
Jan 12 08:30:58 xxxx kernel: [90425.038171] device vxlan-6784 left promiscuous mode
Jan 12 08:30:58 xxxx kernel: [90425.080762] device vxlan-6784 entered promiscuous mode
Jan 12 08:30:58 xxxx systemd-udevd[23061]: Could not generate persistent MAC address for vxlan-6784: No such file or directory
Jan 12 08:30:58 xxxx kubelet[7360]: I0112 08:30:58.672947 7360 kubelet.go:1928] SyncLoop (PLEG): "weave-net-5dhpt_kube-system(dfe2cf4e-15a4-11e9-83b3-024f4d98cc1c)", event: &pleg.PodLifecycleEvent{ID:"dfe2cf4e-15a4-11e9-83b3-024f4d98cc1c", Type:"ContainerStarted", Data:"69e3a920ec7fc33c2fb3ed97edc56d272724d56e04453cd3dd1e627144142290"}
Jan 12 08:30:59 xxxx ntpd[7228]: bind(39) AF_INET6 fe80::e08c:eeff:fed6:75c%173#123 flags 0x11 failed: Cannot assign requested address
Jan 12 08:30:59 xxxx ntpd[7228]: unable to create socket on vxlan-6784 (116) for fe80::e08c:eeff:fed6:75c%173#123
Jan 12 08:30:59 xxxx ntpd[7228]: failed to init interface for address fe80::e08c:eeff:fed6:75c%173
Jan 12 08:30:59 xxxx ntpd[7228]: Deleting interface #57 vxlan-6784, fe80::20bb:b6ff:fec8:1e3b%84#123, interface stats: received=0, sent=0, dropped=0, active_time=68499 secs
I have also increased the memory for weave so that it won't get OOMKilled again, and deleted the older pods, but new pods on the same node still show all other peers as unreachable.
Versions:
$ weave version
./weave --local version
weave 2.5.0
$ docker version
docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 03:35:14 2017
OS/Arch: linux/amd64
$ uname -a
uname -a
Linux ip-10-150-42-79 4.4.0-1054-aws #63-Ubuntu SMP Wed Mar 28 19:42:42 UTC 2018 x86_64 Linux
$ kubectl version
kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-04T07:51:55Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}