Labels: bug
Description
MetalLB Version
operator v0.13.11
metallb v0.13.10
OS: Talos 1.3.7
Kubernetes : 1.24.9
CNI : Cilium 1.12.4
After upgrading from operator v0.13.4 / metallb v0.13.5 to operator v0.13.10 / metallb v0.13.11, the daemonset.apps/speaker pods went down and kept restarting every few minutes.
[eric@macross ~]$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/controller-db6f6ff7d-zjfcr 1/1 Running 0 70s
pod/metallb-operator-controller-manager-6fd4d656f-tx2hj 1/1 Running 0 15m
pod/metallb-operator-webhook-server-588bbdf874-g2jsd 1/1 Running 0 2m53s
pod/speaker-2tvk6 0/1 CrashLoopBackOff 33 (3m3s ago) 3h36m
pod/speaker-5v2sp 0/1 CrashLoopBackOff 33 (2m18s ago) 3h36m
pod/speaker-p7spx 0/1 CrashLoopBackOff 33 (3m59s ago) 20h
pod/speaker-wrs8n 0/1 CrashLoopBackOff 33 (3m59s ago) 3h37m
pod/speaker-xfj7v 0/1 CrashLoopBackOff 33 (3m32s ago) 3h36m
Looking at the logs of one of the pods, get/list/watch errors on configmaps appear and then the speaker pod goes down.
W0825 11:41:31.682290 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:31.682339 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:41:33.520445 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:33.520473 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:41:39.101431 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:39.101463 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:41:46.581417 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:41:46.581469 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
W0825 11:42:03.218915 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:42:03.219009 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
[...]
W0825 11:42:37.744778 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
E0825 11:42:37.744806 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.26.4/tools/cache/reflector.go:169: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:metallb-system:speaker" cannot list resource "configmaps" in API group "" in the namespace "metallb-system"
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"Could not wait for Cache to sync","controller":"node","controllerGroup":"","controllerKind":"Node","error":"failed to wait for node caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:211\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"Could not wait for Cache to sync","controller":"service","controllerGroup":"","controllerKind":"Service","error":"failed to wait for service caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:211\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"Could not wait for Cache to sync","controller":"bgppeer","controllerGroup":"metallb.io","controllerKind":"BGPPeer","error":"failed to wait for bgppeer caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:211\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:242\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/runnable_group.go:219"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"error received after stop sequence was engaged","error":"failed to wait for service caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/internal.go:555"}
{"level":"error","ts":"2023-08-25T11:43:30Z","msg":"error received after stop sequence was engaged","error":"failed to wait for bgppeer caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/internal.go:555"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for caches"}
{"level":"error","ts":"2023-08-25T11:43:30Z","logger":"controller-runtime.source","msg":"failed to get informer from cache","error":"Timeout: failed waiting for *v1.ConfigMap Informer to sync","stacktrace":"sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:148\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.0/pkg/util/wait/wait.go:235\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.0/pkg/util/wait/wait.go:582\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.0/pkg/util/wait/wait.go:547\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:136"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2023-08-25T11:43:30Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"caller":"main.go:201","error":"failed to wait for node caches to sync: timed out waiting for cache to be synced","level":"error","msg":"failed to run k8s client","op":"startup","ts":"2023-08-25T11:43:30Z"}
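The RBAC denial in the logs can be reproduced from outside the pod with kubectl impersonation (kubectl auth can-i is a standard kubectl subcommand; with the unmodified ClusterRole it answers "no"):

```shell
# Check, as the speaker service account, whether configmaps can be listed
# in the metallb-system namespace.
kubectl auth can-i list configmaps \
  --as=system:serviceaccount:metallb-system:speaker \
  -n metallb-system
```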
Initial installation and upgrade were both done using the manifest.
As a workaround, we added the authorization to get/list/watch the configmaps resource to the clusterrole metallb-system:speaker.
[eric@macross ~]$ kubectl get clusterrole metallb-system:speaker -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"app":"metallb"},"name":"metallb-system:speaker"},"rules":[{"apiGroups":[""],"resources":["services","endpoints","nodes","namespaces"],"verbs":["get","list","watch"]},{"apiGroups":["discovery.k8s.io"],"resources":["endpointslices"],"verbs":["get","list","watch"]},{"apiGroups":[""],"resources":["events"],"verbs":["create","patch"]},{"apiGroups":["policy"],"resourceNames":["speaker"],"resources":["podsecuritypolicies"],"verbs":["use"]}]}
creationTimestamp: "2022-09-13T07:16:45Z"
labels:
app: metallb
name: metallb-system:speaker
resourceVersion: "132426474"
uid: 12d48a2c-8274-49f7-8e51-aed128a7b112
rules:
- apiGroups:
- ""
resources:
- services
- endpoints
- nodes
- namespaces
- configmaps
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- policy
resourceNames:
- speaker
resources:
- podsecuritypolicies
verbs:
- use
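Equivalently, the entry can be appended with a JSON patch instead of editing the role by hand (this assumes the core-API-group rule is the first entry in the rules list, as in the ClusterRole above):

```shell
# Append "configmaps" to the resources of the first rule of the ClusterRole.
kubectl patch clusterrole metallb-system:speaker --type=json \
  -p='[{"op":"add","path":"/rules/0/resources/-","value":"configmaps"}]'
```

followed by a restart of the speakers, e.g. kubectl -n metallb-system rollout restart daemonset/speaker.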
After this modification and a full restart, everything is now working perfectly.
[eric@macross ~]$ kubectl get po -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
controller-db6f6ff7d-zjfcr 1/1 Running 0 24m 10.19.3.207 kw905-vso-pr <none> <none>
metallb-operator-controller-manager-6fd4d656f-tx2hj 1/1 Running 0 39m 10.19.3.131 kw905-vso-pr <none> <none>
metallb-operator-webhook-server-588bbdf874-g2jsd 1/1 Running 0 26m 10.19.3.208 kw905-vso-pr <none> <none>
speaker-5vqsf 1/1 Running 0 15m 10.4.205.104 kw902-vso-pr <none> <none>
speaker-8jjhv 1/1 Running 0 14m 10.4.205.103 kw901-vso-pr <none> <none>
speaker-jlz9b 1/1 Running 0 15m 10.4.205.107 kw905-vso-pr <none> <none>
speaker-jtcxx 1/1 Running 0 15m 10.4.205.106 kw904-vso-pr <none> <none>
speaker-nlwxq 1/1 Running 0 15m 10.4.205.105 kw903-vso-pr <none> <none>
[eric@macross ~]$ kubectl logs speaker-jtcxx
[...]
{"level":"info","ts":"2023-08-25T11:47:09Z","msg":"Starting workers","controller":"service","controllerGroup":"","controllerKind":"Service","worker count":1}
{"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2023-08-25T11:47:09Z"}
{"level":"info","ts":"2023-08-25T11:47:09Z","msg":"Starting workers","controller":"node","controllerGroup":"","controllerKind":"Node","worker count":1}
{"level":"info","ts":"2023-08-25T11:47:09Z","msg":"Starting workers","controller":"bgppeer","controllerGroup":"metallb.io","controllerKind":"BGPPeer","worker count":1}
{"caller":"node_controller.go:46","controller":"NodeReconciler","level":"info","start reconcile":"/km901-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"config_controller.go:59","controller":"ConfigReconciler","level":"info","start reconcile":"/kw905-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"node_controller.go:69","controller":"NodeReconciler","end reconcile":"/km901-vso-pr","level":"info","ts":"2023-08-25T11:47:09Z"}
[...]
{"caller":"config_controller.go:59","controller":"ConfigReconciler","level":"info","start reconcile":"/km902-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.4.205.105","node event":"NodeJoin","node name":"kw903-vso-pr","ts":"2023-08-25T11:47:09Z"}
{"caller":"main.go:374","event":"serviceAnnounced","ips":["10.4.207.211"],"level":"info","msg":"service has IP, announcing","pool":"vip-pool","protocol":"layer2","ts":"2023-08-25T11:47:09Z"}
{"caller":"service_controller_reload.go:104","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2023-08-25T11:47:09Z"}
[...]
{"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.4.205.103","node event":"NodeJoin","node name":"kw901-vso-pr","ts":"2023-08-25T11:47:40Z"}
{"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2023-08-25T11:47:40Z"}
{"caller":"main.go:418","event":"serviceWithdrawn","ip":["10.4.207.209"],"ips":["10.4.207.209"],"level":"info","msg":"withdrawing service announcement","pool":"vip-pool","protocol":"layer2","reason":"notOwner","ts":"2023-08-25T11:47:40Z"}
{"caller":"main.go:374","event":"serviceAnnounced","ips":["10.4.207.211"],"level":"info","msg":"service has IP, announcing","pool":"vip-pool","protocol":"layer2","ts":"2023-08-25T11:47:40Z"}
{"caller":"service_controller_reload.go:104","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2023-08-25T11:47:40Z"}
[eric@macross ~]$ curl -Is http://argocd.tooling-nms-preprod.valentine.sfr.com/ | head -n 1
HTTP/1.1 200 OK
The diff between the original manifest and the one we used for the upgrade:
[eric@macross metallb]$ diff metallb-operator.yaml metallb-operator-0.13.10.yaml
3587c3587
< value: quay.io/metallb/speaker:v0.13.9
---
> value: quay.io/metallb/speaker:v0.13.10
3589c3589
< value: quay.io/metallb/controller:v0.13.9
---
> value: quay.io/metallb/controller:v0.13.10
3664c3664
< image: quay.io/metallb/controller:v0.13.9
---
> image: quay.io/metallb/controller:v0.13.10
4212a4213
> - configmaps