clusterdiscovery: replace klog.Fatalf with klog.Errorf in joinClusterAPICluster#7637
clusterdiscovery: replace klog.Fatalf with klog.Errorf in joinClusterAPICluster#7637ManojLamani wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This PR improves the stability of the karmada-controller-manager by updating error handling logic within the cluster discovery process. By switching from a fatal exit to an error return, the system can now gracefully handle transient failures during REST configuration, allowing the controller to retry operations instead of terminating unexpectedly. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adjusts Cluster API join behavior to fail gracefully when the management cluster kubeconfig cannot be loaded, instead of terminating the process.
Changes:
- Replaces
klog.Fatalf(...)withklog.Errorf(...)and returns the encountered error. - Allows upstream callers to handle the failure path rather than hard-exiting.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath) | ||
| if err != nil { | ||
| klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| return err | ||
| } |
| clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath) | ||
| if err != nil { | ||
| klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) |
There was a problem hiding this comment.
Code Review
This pull request updates the cluster-api detector to return an error instead of terminating the process with a fatal log when failing to retrieve the cluster rest config. The reviewer suggested correcting the log message to refer to the 'managed cluster' rather than the 'management cluster' to prevent confusion during troubleshooting.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath) | ||
| if err != nil { | ||
| klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) |
There was a problem hiding this comment.
The log message refers to the 'management cluster rest config', but clusterRestConfig is actually the rest config of the managed (workload) cluster being joined, which is retrieved from the secret. To avoid confusion during troubleshooting, please update the log message to refer to the 'managed cluster'.
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | |
| klog.Errorf("Failed to get cluster-api managed cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7637 +/- ##
==========================================
- Coverage 42.16% 42.06% -0.10%
==========================================
Files 879 879
Lines 54677 54828 +151
==========================================
+ Hits 23052 23063 +11
- Misses 29880 30021 +141
+ Partials 1745 1744 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath) | ||
| if err != nil { | ||
| klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) |
There was a problem hiding this comment.
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | |
| klog.Errorf("Failed to build rest config for cluster-api cluster(%s) from kubeconfig %s: %v", clusterWideKey.Name, kubeconfigPath, err) |
There was a problem hiding this comment.
Done, applied your suggestion. Thank you @zhzhuang-zju!
|
hi @ManojLamani, please squash the commits, thx |
…APICluster klog.Fatalf calls os.Exit(1) which terminates the entire controller-manager process. In a reconcile loop, a transient error fetching the REST config should be retried, not fatal. Replace with klog.Errorf and return the error so the item is re-queued with exponential backoff by the async worker. Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
b46b933 to
7e958a2
Compare
|
Done, commits have been squashed. Thank you @zhzhuang-zju! |
/kind bug
What this PR does / why we need it:
joinClusterAPIClustercallsklog.Fatalfwhenapiclient.RestConfigfails to build the REST config for a cluster-api managed cluster.
klog.Fatalfcallsos.Exit(1), which terminates the entirekarmada-controller-managerprocess. This is too aggressive — the erroris potentially transient (e.g. temporary network issue) and the reconcile
loop should retry it via exponential backoff instead of crashing.
This PR replaces
klog.Fatalfwithklog.Errorfand addsreturn errso the async worker re-queues the item with backoff, consistent with the
error handling pattern used throughout the rest of this function.
Which issue(s) this PR fixes:
Fixes #7636
Special notes for your reviewer:
klog.Fatalfincontrollermanager.go(startup path) is correct and left unchanged.joinClusterAPIClusteris affected.Does this PR introduce a user-facing change?: