🌱 Avoid large number of connection error traces in kubeadm controlplane controller #12106

Open: wants to merge 1 commit into base: main

dmvolod
Member

@dmvolod dmvolod commented Apr 16, 2025

What this PR does / why we need it:
This small fix removes the large number of stack traces logged for the workload cluster connection error while the cluster is not yet ready. Currently the logs are spammed on every reconcile loop with a full, predictable stack trace instead of a short message saying that the connection is not ready.

2025-04-16T19:50:59+03:00	INFO	Reconcile KubeadmControlPlane	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}}
2025-04-16T19:50:59+03:00	INFO	Scaling up control plane	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}, "desired": 3, "existing": 1}
2025-04-16T19:50:59+03:00	INFO	Waiting for control plane to pass preflight checks	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}, "failures": "Machine test-mgmt-control-plane-pqh7s does not have a corresponding Node yet (Machine.status.nodeRef not set)"}
2025-04-16T19:50:59+03:00	DEBUG	events	Waiting for control plane to pass preflight checks to continue reconciliation: Machine test-mgmt-control-plane-pqh7s does not have a corresponding Node yet (Machine.status.nodeRef not set)	{"type": "Warning", "object": {"kind":"KubeadmControlPlane","namespace":"default","name":"test-mgmt-control-plane","uid":"06538869-e95e-40e4-8a90-f602672391e5","apiVersion":"controlplane.cluster.x-k8s.io/v1beta1","resourceVersion":"742"}, "reason": "ControlPlaneUnhealthy"}
2025-04-16T19:50:59+03:00	ERROR	Could not connect to workload cluster to fetch status	{"controller": "kubeadmcontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KubeadmControlPlane", "KubeadmControlPlane": {"name":"test-mgmt-control-plane","namespace":"default"}, "namespace": "default", "name": "test-mgmt-control-plane", "reconcileID": "b5b5f0e9-e0a5-46eb-925a-9536fb144e23", "Cluster": {"name":"test-mgmt","namespace":"default"}, "error": "failed to create remote cluster client: default/test-mgmt: failed to get REST config: failed to create cluster accessor: error creating http client and mapper for remote cluster \"default/test-mgmt\": error creating client for remote cluster \"default/test-mgmt\": cluster is not reachable: Get \"https://10.0.180.10:6443/?timeout=5s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")", "errorVerbose": "default/test-mgmt: failed to get REST config: failed to create cluster accessor: error creating http client and mapper for remote cluster \"default/test-mgmt\": error creating client for remote cluster \"default/test-mgmt\": cluster is not reachable: Get \"https://10.0.180.10:6443/?timeout=5s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kubernetes\")\nfailed to create remote cluster client\nsigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).updateStatus\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/controlplane/kubeadm/internal/controllers/status.go:89\nsigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile.func1\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email 
protected]/controlplane/kubeadm/internal/controllers/controller.go:206\nsigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/controlplane/kubeadm/internal/controllers/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/home/dvolodin/sdk/go1.23.5/src/runtime/asm_amd64.s:1700"}
sigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile.func1
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/controlplane/kubeadm/internal/controllers/controller.go:209
sigs.k8s.io/cluster-api/controlplane/kubeadm/internal/controllers.(*KubeadmControlPlaneReconciler).Reconcile
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/controlplane/kubeadm/internal/controllers/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/dvolodin/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222

An alternative fix would be to check controlPlane.Cluster.Status.InfrastructureReady before connecting, which would avoid the connection attempt, and with it the large number of noisy stack traces, while the infrastructure is not ready.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area control-plane

@k8s-ci-robot k8s-ci-robot added area/control-plane Issues or PRs related to control-plane lifecycle management cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 16, 2025