[Usernetes] kube-apiserver fails to connect to etcd: dial tcp 127.0.0.1:2379: connect: connection refused #65

@AkihiroSuda

Description

I'm trying to run Usernetes (single-node w/o VXLAN, as a baby step) with bypass4netnsd, but kubeadm fails:

$ export CONTAINER_ENGINE=nerdctl
$ make up
$ make kubeadm-init
[...]
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

It looks like kube-apiserver is failing to connect to the local etcd (dial tcp 127.0.0.1:2379: connect: connection refused),
even though the etcd process is running with --listen-client-urls=https://127.0.0.1:2379,https://10.100.201.100:2379 and its logs report serving client traffic on both addresses.
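For reference, the error kube-apiserver keeps logging is an ordinary TCP connection refusal on 127.0.0.1:2379. It can be reproduced from inside the node with a minimal check like the following (a generic sketch, not part of Usernetes or kube-apiserver; the host and port are taken from the logs below):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening on the target address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside the node, this mirrors what kube-apiserver's gRPC dial sees:
# can_connect("127.0.0.1", 2379) returns False while the issue reproduces.
```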

Version

  • nerdctl: v2.0.0-beta.3
  • bypass4netns: the current master 2794f7e
  • Usernetes: gen2-v20240404.1, with the following annotations added:
diff --git a/docker-compose.yaml b/docker-compose.yaml
index 2ae7291..a036d80 100644
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -39,6 +39,11 @@ services:
       # In addition, `net.ipv4.conf.default.rp_filter`
       # has to be set to 0 (disabled) or 2 (loose)
       # in the daemon's network namespace.
+    annotations:
+      # bypass4netns annotations are recognized since nerdctl v2.0
+      # TODO: enable bypass4netns only when bypass4netnsd is running.
+      "nerdctl/bypass4netns": "true"
+      "nerdctl/bypass4netns-ignore-subnets": "[\"10.244.0.0/16\", \"${U7S_NODE_SUBNET}\"]"
 networks:
   default:
     ipam:
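The nerdctl/bypass4netns-ignore-subnets annotation takes a JSON list of CIDRs. Purely as an illustration (this is not the bypass4netns implementation), the membership test such an ignore list implies can be sketched as follows; "10.100.0.0/16" is an assumed placeholder for ${U7S_NODE_SUBNET}, which is environment-specific:

```python
import ipaddress

# CIDRs from the annotation above; "10.100.0.0/16" stands in for
# ${U7S_NODE_SUBNET} (an assumption -- the real value is env-specific).
ignore_subnets = [ipaddress.ip_network(s)
                  for s in ("10.244.0.0/16", "10.100.0.0/16")]

def in_ignored_subnet(dest_ip: str) -> bool:
    """True if dest_ip falls inside any subnet listed in the annotation."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in ignore_subnets)

# in_ignored_subnet("10.244.0.5") -> True
# in_ignored_subnet("8.8.8.8")    -> False
```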

Logs

$ nerdctl exec usernetes-node-1 sh -euxc 'tail /var/log/containers/kube-apiserver*'
+ tail /var/log/containers/kube-apiserver-u7s-suda-ws01_kube-system_kube-apiserver-e3bb8e1d239fbd21df5a10150d7cf97cddf2ac60f255e7c95e0444372270a590.log
2024-04-04T10:33:23.014467147Z stderr F W0404 10:33:23.014061       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:23.525905133Z stderr F W0404 10:33:23.525525       1 logging.go:59] [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:23.650741563Z stderr F W0404 10:33:23.650270       1 logging.go:59] [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:26.746781506Z stderr F W0404 10:33:26.746363       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:27.177688658Z stderr F W0404 10:33:27.177096       1 logging.go:59] [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:27.255248965Z stderr F W0404 10:33:27.254843       1 logging.go:59] [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:32.660451075Z stderr F W0404 10:33:32.660277       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:32.928362209Z stderr F W0404 10:33:32.927956       1 logging.go:59] [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:34.194906969Z stderr F W0404 10:33:34.194549       1 logging.go:59] [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2024-04-04T10:33:38.191445386Z stderr F F0404 10:33:38.190882       1 instance.go:290] Error creating leases: error creating storage factory: context deadline exceeded
$ nerdctl exec usernetes-node-1 sh -euxc 'tail /var/log/containers/etcd*'
+ tail /var/log/containers/etcd-u7s-suda-ws01_kube-system_etcd-5e02fa990f84a688cc176a9c61737122ca6bcaa5545e3629ad824977f582365f.log
2024-04-04T10:32:16.394968121Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.394467Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
2024-04-04T10:32:16.395555112Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.395204Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
2024-04-04T10:32:16.395949052Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.395687Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
2024-04-04T10:32:16.395974454Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.39574Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
2024-04-04T10:32:16.397006609Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.394859Z","caller":"etcdserver/server.go:2571","msg":"setting up initial cluster version using v2 API","cluster-version":"3.5"}
2024-04-04T10:32:16.398486722Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.398168Z","caller":"membership/cluster.go:584","msg":"set initial cluster version","cluster-id":"3f59b4f74cf82b90","local-member-id":"f3b2aef06a662d72","cluster-version":"3.5"}
2024-04-04T10:32:16.399161396Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.398924Z","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
2024-04-04T10:32:16.399912362Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.399745Z","caller":"etcdserver/server.go:2595","msg":"cluster version is updated","cluster-version":"3.5"}
2024-04-04T10:32:16.403525952Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.403166Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc+http","address":"127.0.0.1:2379"}
2024-04-04T10:32:16.405592265Z stderr F {"level":"info","ts":"2024-04-04T10:32:16.405178Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc+http","address":"10.100.201.100:2379"}
$ nerdctl exec usernetes-node-1 ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:30 ?        00:00:01 /sbin/init
root         108       1  0 10:30 ?        00:00:00 /lib/systemd/systemd-journald
root         121       1  0 10:30 ?        00:00:00 /lib/systemd/systemd-udevd
root         129       1 10 10:30 ?        00:00:26 /usr/local/bin/containerd
root         149       0  1 10:30 pts/1    00:00:02 kubeadm init --config /tmp/kubeadm-config.yaml --skip-token-print
root         388       1  3 10:32 ?        00:00:04 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9 --runtime-cgroups=/system.slice/containerd.service --cloud-provider=external --node-labels=usernetes/host-ip=192.168.60.11
root         443       1  0 10:32 ?        00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id e0baeca2362cf60e66347e1bca8e1663b0bc33f173f049cc4c288cd31f0a021f -address /run/containerd/containerd.sock
root         453       1  0 10:32 ?        00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 4862c3b92124466a817f12e7ce4882e3602f01e73ff70c68157d758d26777122 -address /run/containerd/containerd.sock
root         487       1  0 10:32 ?        00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id aaa4c48903c861beda8db0d2acc6ccddf7006032512d948df01a9d9009b97657 -address /run/containerd/containerd.sock
root         508       1  0 10:32 ?        00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id e8179f9642fbd7b705a4c221d2efd167cebeabc47863b90f2966a16bee0f3ca0 -address /run/containerd/containerd.sock
65535        536     443  0 10:32 ?        00:00:00 /pause
65535        547     453  0 10:32 ?        00:00:00 /pause
65535        560     487  0 10:32 ?        00:00:00 /pause
65535        576     508  0 10:32 ?        00:00:00 /pause
root         680     443  0 10:32 ?        00:00:01 kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cloud-provider=external --cluster-cidr=10.244.0.0/16 --cluster-name=kubernetes --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/16 --use-service-account-credentials=true
root         691     487  1 10:32 ?        00:00:02 kube-scheduler --authentication-kubeconfig=/etc/kubernetes/scheduler.conf --authorization-kubeconfig=/etc/kubernetes/scheduler.conf --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true
root         815     508  1 10:32 ?        00:00:01 etcd --advertise-client-urls=https://10.100.201.100:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --experimental-initial-corrupt-check=true --experimental-watch-progress-notify-interval=5s --initial-advertise-peer-urls=https://10.100.201.100:2380 --initial-cluster=u7s-suda-ws01=https://10.100.201.100:2380 --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://10.100.201.100:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://10.100.201.100:2380 --name=u7s-suda-ws01 --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
root        1111       0  0 10:34 ?        00:00:00 ps -ef
