Large Kubernetes cluster fails to deploy when using Weave as the network plugin #3125
Description
What you expected to happen?
kube-dns should start on a large cluster
What happened?
When I deploy a large cluster of 100-150 nodes instead of a small one (10 nodes, which always works), kube-dns never reaches the ready state. It fails with the message: Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)" failed: rpc error: code = 4 desc = context deadline exceeded"
All other pods become ready: weave, kube-proxy, api-server, etc.
A small cluster of 10 nodes always succeeds.
When I switched to the flannel network plugin, everything worked OK.
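If it helps narrow this down, here is a rough sketch of how I would check whether the sandbox creation timeout comes from the Weave CNI plugin; the pod name weave-net-xxxxx is a placeholder, and /home/weave/weave is the path I would expect inside the weave-kube image:

$ kubectl -n kube-system get pods -o wide | grep weave-net
# Pick the weave-net pod running on the node where kube-dns was scheduled, then:
$ kubectl -n kube-system logs weave-net-xxxxx -c weave --tail=100
# Router / peer status as seen by that pod:
$ kubectl -n kube-system exec weave-net-xxxxx -c weave -- /home/weave/weave --local status
$ kubectl -n kube-system exec weave-net-xxxxx -c weave -- /home/weave/weave --local status connections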
How to reproduce it?
Deploy a 140-node cluster on OpenStack with kubeadm 1.6.4 (a rough command sequence is sketched below).
Weave 1.9.8 installs OK and reaches the ready state on all nodes.
The same problem occurs with Weave 2.0.4, which also installs OK and reaches the ready state on all nodes.
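Roughly the sequence used (the join token and master address are placeholders; the Weave manifest URL shown is the generic one from the Weave docs and may not match the exact 1.9.8 / 2.0.4 files applied here):

# On the master, with kubeadm 1.6.4:
$ kubeadm init
# Install Weave Net:
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# On each of the ~140 OpenStack nodes:
$ kubeadm join --token <token> <master-ip>:6443
# Then watch kube-dns stay stuck in ContainerCreating:
$ kubectl -n kube-system get pods -o wide -w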
Anything else we need to know?
When I switched to the flannel network plugin, everything worked OK.
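For comparison, this is roughly how the flannel deployment looked (10.244.0.0/16 is flannel's default pod CIDR; the manifest URL is the current upstream location and may differ from the file actually used):

$ kubeadm init --pod-network-cidr=10.244.0.0/16
# On 1.6 clusters the separate kube-flannel-rbac.yml manifest may also be needed:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml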
Versions:
$ weave version 1.9.8 and 2.0.4
$ docker version
$ uname -a
$ kubectl version ( KubeAdm 1.6.4 )
Logs:
kubectl describe pods kube-dns-3913472980-c2506 -n kube-system
Name: kube-dns-3913472980-c2506
Namespace: kube-system
Node: icl-node-162/10.0.0.50
Start Time: Wed, 20 Sep 2017 08:56:17 +0000
Labels: k8s-app=kube-dns
pod-template-hash=3913472980
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-3913472980","uid":"aaf5bba6-9de0-11e7-93ea-fa163e11...
scheduler.alpha.kubernetes.io/critical-pod=
Status: Pending
IP:
Controllers: ReplicaSet/kube-dns-3913472980
Containers:
kubedns:
Container ID:
Image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
Image ID:
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/kube-dns-config from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-h1wmd (ro)
dnsmasq:
Container ID:
Image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
Image ID:
Ports: 53/UDP, 53/TCP
Args:
-v=2
-logtostderr
-configDir=/etc/k8s/dns/dnsmasq-nanny
-restartDnsmasq=true
--
-k
--cache-size=1000
--log-facility=-
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 150m
memory: 20Mi
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-h1wmd (ro)
sidecar:
Container ID:
Image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
Image ID:
Port: 10054/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
memory: 20Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-h1wmd (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
kube-dns-token-h1wmd:
Type: Secret (a volume populated by a Secret)
SecretName: kube-dns-token-h1wmd
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly=:Exists
node-role.kubernetes.io/master=:NoSchedule
node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
19m 13m 26 default-scheduler Warning FailedScheduling no nodes available to schedule pods
13m 13m 1 default-scheduler Normal Scheduled Successfully assigned kube-dns-3913472980-c2506 to icl-node-162
9m 9m 1 kubelet, icl-node-162 Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3913472980-c2506_kube-system(ab0a9011-9de0-11e7-93ea-fa163e11ce46)\" failed: rpc error: code = 4 desc = context deadline exceeded"
7m 27s 29 kubelet, icl-node-162 Warning FailedSync Error syncing pod, skipping: rpc error: code = 4 desc = context deadline exceeded
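For further triage, this is roughly what I would collect on the affected node icl-node-162 (Docker is the runtime here; port 6784 and the /status paths are the Weave router defaults I would expect):

# kubelet messages around CNI / sandbox setup:
$ sudo journalctl -u kubelet --since "30 min ago" | grep -iE 'cni|sandbox|weave'
# Is the pause (sandbox) container for kube-dns stuck?
$ sudo docker ps -a | grep -E 'pause|kube-dns'
# Weave router status endpoint on the node:
$ curl -s http://127.0.0.1:6784/status
$ curl -s http://127.0.0.1:6784/status/connections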
Network:
$ ip route
$ ip -4 -o addr
$ sudo iptables-save