This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Large Kubernetes cluster fails to deploy when using Weave as network plugin #3125

Closed
@andersla

Description

What you expected to happen?

kube-dns should start on a large cluster just as it does on a small one.

What happened?

If I deploy a large cluster of 100-150 nodes instead of a small one (10 nodes, which always works), kube-dns never reaches the ready state. It fails with the message:

Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)" failed: rpc error: code = 4 desc = context deadline exceeded"

All other pods (weave, kube-proxy, api-server, etc.) become ready. Deployment always succeeds with a small cluster of 10 nodes.

When I switched to the flannel network plugin, everything worked OK.
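For anyone hitting the same symptom, a quick way to see where the pod is stuck is the following sketch (the pod name is the one from this report; substitute your own):

$ kubectl get pods -n kube-system -o wide | grep kube-dns        # pod should sit in ContainerCreating with no IP assigned
$ kubectl describe pod kube-dns-3913472980-c2506 -n kube-system  # look for CreatePodSandboxError / context deadline exceeded
$ journalctl -u kubelet --no-pager | grep -i -e cni -e sandbox   # kubelet's view of the CNI call that timed out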

How to reproduce it?

Deploy a 140-node cluster on OpenStack with kubeadm 1.6.4 (rough commands sketched below).
Weave 1.9.8 installs OK and reaches the ready state on all nodes.
The problem is the same with Weave 2.0.4, which also installs OK and reaches the ready state on all nodes.
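The setup would have looked roughly like this (a sketch, not the reporter's exact commands; the Weave manifest URL is the one Weaveworks documented for Kubernetes 1.6+ at the time, and the join token is elided):

$ kubeadm init                                        # on the master
$ kubectl apply -f https://git.io/weave-kube-1.6      # install Weave Net as a DaemonSet
$ kubeadm join --token <token> <master-ip>:6443       # on each of the ~140 nodes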

Anything else we need to know?

As noted above, switching to the flannel network plugin made everything work OK.
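For the flannel comparison, the install would have looked roughly like this (again a sketch; the manifest URL and the pod-network CIDR requirement are as commonly documented for kubeadm at the time):

$ kubeadm init --pod-network-cidr=10.244.0.0/16       # flannel expects this CIDR by default
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml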

Versions:

$ weave version      # 1.9.8 and 2.0.4
$ docker version
$ uname -a
$ kubectl version    # kubeadm 1.6.4

Logs:

kubectl describe pods kube-dns-3913472980-c2506 -n kube-system
Name:		kube-dns-3913472980-c2506
Namespace:	kube-system
Node:		icl-node-162/10.0.0.50
Start Time:	Wed, 20 Sep 2017 08:56:17 +0000
Labels:		k8s-app=kube-dns
		pod-template-hash=3913472980
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-3913472980","uid":"aaf5bba6-9de0-11e7-93ea-fa163e11...
		scheduler.alpha.kubernetes.io/critical-pod=
Status:		Pending
IP:
Controllers:	ReplicaSet/kube-dns-3913472980
Containers:
  kubedns:
    Container ID:
    Image:		gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
    Image ID:
    Ports:		10053/UDP, 10053/TCP, 10055/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --config-dir=/kube-dns-config
      --v=2
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Limits:
      memory:	170Mi
    Requests:
      cpu:	100m
      memory:	70Mi
    Liveness:	http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:	http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Environment:
      PROMETHEUS_PORT:	10055
    Mounts:
      /kube-dns-config from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-h1wmd (ro)
  dnsmasq:
    Container ID:
    Image:		gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
    Image ID:
    Ports:		53/UDP, 53/TCP
    Args:
      -v=2
      -logtostderr
      -configDir=/etc/k8s/dns/dnsmasq-nanny
      -restartDnsmasq=true
      --
      -k
      --cache-size=1000
      --log-facility=-
      --server=/cluster.local/127.0.0.1#10053
      --server=/in-addr.arpa/127.0.0.1#10053
      --server=/ip6.arpa/127.0.0.1#10053
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Requests:
      cpu:		150m
      memory:		20Mi
    Liveness:		http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:	<none>
    Mounts:
      /etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-h1wmd (ro)
  sidecar:
    Container ID:
    Image:		gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
    Image ID:
    Port:		10054/TCP
    Args:
      --v=2
      --logtostderr
      --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
      --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Requests:
      cpu:		10m
      memory:		20Mi
    Liveness:		http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:	<none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-h1wmd (ro)
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  kube-dns-config:
    Type:	ConfigMap (a volume populated by a ConfigMap)
    Name:	kube-dns
    Optional:	true
  kube-dns-token-h1wmd:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	kube-dns-token-h1wmd
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	CriticalAddonsOnly=:Exists
		node-role.kubernetes.io/master=:NoSchedule
		node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  19m		13m		26	default-scheduler			Warning		FailedScheduling	no nodes available to schedule pods
  13m		13m		1	default-scheduler			Normal		Scheduled		Successfully assigned kube-dns-3913472980-c2506 to icl-node-162
  9m		9m		1	kubelet, icl-node-162			Warning		FailedSync		Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)\" failed: rpc error: code = 4 desc = context deadline exceeded"
  7m	27s	29	kubelet, icl-node-162		Warning	FailedSync	Error syncing pod, skipping: rpc error: code = 4 desc = context deadline exceeded
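The events show sandbox creation timing out before the pod ever gets an IP, which points at the CNI plugin call rather than kube-dns itself. A useful next step is to check Weave's health on the affected node (a sketch; the weave-net pod name is illustrative, and /home/weave/weave is the script path documented for the weave-kube image):

$ kubectl get pods -n kube-system -o wide | grep weave-net               # find the weave-net pod running on icl-node-162
$ kubectl exec -n kube-system weave-net-abcde -c weave -- /home/weave/weave --local status
$ kubectl logs -n kube-system weave-net-abcde -c weave | tail -n 50      # recent errors from the Weave router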

Network:

$ ip route
$ ip -4 -o addr
$ sudo iptables-save
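With ~140 peers, it is also worth checking whether every Weave peer has established its connections and whether IPAM has reached consensus, since either can stall address allocation for new pods (same caveat as above on the illustrative pod name; the status subcommands are Weave's documented ones):

$ kubectl exec -n kube-system weave-net-abcde -c weave -- /home/weave/weave --local status connections
$ kubectl exec -n kube-system weave-net-abcde -c weave -- /home/weave/weave --local status ipam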
