This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave Net Daemonset fails to restart pod due to existing dummy interface #3414

Open
@0x2c7

Description

What you expected to happen?

The Weave Net DaemonSet, which manages the weave-net pod on each node, should be able to restart the pod on a node when it fails or is stopped for any reason.

What happened?

The weave-net pod goes into Error and then CrashLoopBackOff status and cannot recover until I terminate the node.

How to reproduce it?

SSH into a node and use the docker command to kill the weave-net container. Of course, this is only a way to reproduce the issue. On our production cluster, weave-net sometimes crashes on a node and we don't know why.

The logs show that weave-net fails to create the dummy interface:

FATA: 2018/09/26 05:08:04.369497 creating dummy interface: file exists

I did a small investigation, and it looks like the bug comes from net/bridge.go, in the function initPrep, at these lines:

	dummy := &netlink.Dummy{LinkAttrs: netlink.NewLinkAttrs()}
	dummy.LinkAttrs.Name = "vethwedu"
	if err = netlink.LinkAdd(dummy); err != nil {
		return errors.Wrap(err, "creating dummy interface")
	}

During startup, weave-net creates a dummy interface. When my pod restarts, the interface already exists, as confirmed with ip link | grep vethwedu:

96782: vethwedu: <BROADCAST,NOARP> mtu 1376 qdisc noop state DOWN mode DEFAULT group default qlen 1000

It looks like the previous weave-net session failed to delete this dummy interface, or was killed before it could delete it. When I delete the dummy manually with ip link delete vethwedu, the pod starts and returns to normal.

Adding a small check that deletes the dummy interface, if it exists, before creating a new one would solve this problem. Is this a good solution? If so, I'll open a PR.

	// Remove a dummy interface left over from a previous run, if any.
	if existingDummy, err := netlink.LinkByName("vethwedu"); err == nil {
		if err := netlink.LinkDel(existingDummy); err != nil {
			return errors.Wrap(err, "deleting existing dummy interface")
		}
	}
	//...

Anything else we need to know?

I run our Kubernetes cluster on AWS, using KOPS.

Versions:

$ weave version: 2.4.1

$ docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:09:56 2017
 OS/Arch:      linux/amd64

$ uname -a
Linux ip-172-50-52-229 4.4.121-k8s #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 GNU/Linux

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

$ kubectl logs -n kube-system <weave-net-pod> weave

DEBU: 2018/09/26 05:08:04.266053 [kube-peers] Checking peer "7a:1f:1c:b2:b7:7e" against list &{[{ea:43:25:46:a3:95 ip-172-50-61-167.ap-southeast-2.compute.internal} {16:ac:8c:71:55:20 ip-172-50-80-215.ap-southeast-2.compute.internal} {3e:af:c2:26:08:00 ip-172-50-110-136.ap-southeast-2.compute.internal} {2e:04:f7:c2:42:71 ip-172-50-32-17.ap-southeast-2.compute.internal} {32:4d:7c:65:31:8d ip-172-50-50-186.ap-southeast-2.compute.internal} {ae:92:06:fd:b3:e7 ip-172-50-93-2.ap-southeast-2.compute.internal} {7e:69:df:8f:8f:17 ip-172-50-41-197.ap-southeast-2.compute.internal} {62:aa:0a:e8:65:96 ip-172-50-109-73.ap-southeast-2.compute.internal} {f6:30:3e:1d:4b:8b ip-172-50-113-192.ap-southeast-2.compute.internal} {7a:1f:1c:b2:b7:7e ip-172-50-52-229.ap-southeast-2.compute.internal} {12:b7:c5:3d:f4:82 ip-172-50-67-61.ap-southeast-2.compute.internal} {aa:0d:6b:9c:56:9b ip-172-50-44-14.ap-southeast-2.compute.internal} {82:e9:f1:ce:c5:29 ip-172-50-58-155.ap-southeast-2.compute.internal} {26:a8:11:0d:76:e2 ip-172-50-33-242.ap-southeast-2.compute.internal}]}
INFO: 2018/09/26 05:08:04.297726 Command line options: map[ipalloc-range:100.96.0.0/11 port:6783 docker-api: expect-npc:true host-root:/host nickname:ip-172-50-52-229.ap-southeast-2.compute.internal conn-limit:100 db-prefix:/weavedb/weave-net http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 name:7a:1f:1c:b2:b7:7e datapath:datapath ipalloc-init:consensus=14 no-dns:true]
INFO: 2018/09/26 05:08:04.297772 weave  2.3.0
FATA: 2018/09/26 05:08:04.369497 creating dummy interface: file exists
