clusterdiscovery: replace klog.Fatalf with klog.Errorf in joinClusterAPICluster #7637

Open
ManojLamani wants to merge 1 commit into karmada-io:master from ManojLamani:fix-clusterapi-fatal-exit

Conversation

@ManojLamani

/kind bug

What this PR does / why we need it:

joinClusterAPICluster calls klog.Fatalf when apiclient.RestConfig
fails to build the REST config for a cluster-api managed cluster.
klog.Fatalf calls os.Exit(1), which terminates the entire
karmada-controller-manager process. This is too aggressive — the error
is potentially transient (e.g. temporary network issue) and the reconcile
loop should retry it via exponential backoff instead of crashing.

This PR replaces klog.Fatalf with klog.Errorf and adds return err
so the async worker re-queues the item with backoff, consistent with the
error handling pattern used throughout the rest of this function.
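
The retry behavior described above can be illustrated with a minimal, standard-library-only sketch. It is not Karmada's worker implementation: `joinCluster`, `retryWithBackoff`, `flakyLoader`, `errTransient`, and `baseDelay` are hypothetical names chosen for this example, and the real async worker manages its own queue and rate limiting. The sketch only shows why returning the error lets a caller retry with growing delays, whereas a fatal log call would end the process on the first failure.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// baseDelay is the hypothetical first backoff interval used by the example.
const baseDelay = time.Millisecond

// errTransient stands in for a temporary failure such as a short network outage.
var errTransient = errors.New("transient rest config error")

// joinCluster is a hypothetical stand-in for the reconcile step. It logs the
// failure and returns the error instead of exiting the process.
func joinCluster(loadConfig func() error) error {
	if err := loadConfig(); err != nil {
		log.Printf("failed to build rest config: %v", err)
		return err
	}
	return nil
}

// retryWithBackoff calls fn until it succeeds or maxAttempts is reached,
// doubling the delay after each failure. maxAttempts must be at least 1.
// It returns the number of attempts made and the last error.
func retryWithBackoff(fn func() error, base time.Duration, maxAttempts int) (int, error) {
	delay := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return maxAttempts, err
}

// flakyLoader returns a loader that fails the first `failures` calls and then succeeds.
func flakyLoader(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errTransient
		}
		return nil
	}
}

func main() {
	loader := flakyLoader(2)
	attempts, err := retryWithBackoff(func() error { return joinCluster(loader) }, baseDelay, 5)
	fmt.Println(attempts, err) // prints "3 <nil>" after two logged failures
}
```

With a loader that fails twice, the third attempt succeeds and the process keeps running throughout, which is the behavior this PR aims for in the reconcile path.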

Which issue(s) this PR fixes:
Fixes #7636

Special notes for your reviewer:

  • The identical klog.Fatalf in controllermanager.go (startup path) is correct and left unchanged.
  • Only the reconcile path inside joinClusterAPICluster is affected.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Fixed the issue that a transient REST config
error during cluster-api cluster join caused the entire controller-manager
process to exit instead of retrying.

Copilot AI review requested due to automatic review settings June 17, 2026 04:57
@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2026
@karmada-bot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign whitewindmills for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR improves the stability of the karmada-controller-manager by updating error handling logic within the cluster discovery process. By switching from a fatal exit to an error return, the system can now gracefully handle transient failures during REST configuration, allowing the controller to retry operations instead of terminating unexpectedly.

Highlights

  • Error handling improvement: Replaced klog.Fatalf with klog.Errorf in joinClusterAPICluster to prevent the controller-manager from exiting on transient REST config errors.
  • Reconcile logic: Added return err to ensure the async worker correctly re-queues the item for retry, improving system resilience.

@karmada-bot karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 17, 2026

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adjusts Cluster API join behavior to fail gracefully when the management cluster kubeconfig cannot be loaded, instead of terminating the process.

Changes:

  • Replaces klog.Fatalf(...) with klog.Errorf(...) and returns the encountered error.
  • Allows upstream callers to handle the failure path rather than hard-exiting.

Comment on lines 208 to 212
 clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath)
 if err != nil {
-    klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+    klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+    return err
 }
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the cluster-api detector to return an error instead of terminating the process with a fatal log when failing to retrieve the cluster rest config. The reviewer suggested correcting the log message to refer to the 'managed cluster' rather than the 'management cluster' to prevent confusion during troubleshooting.

 clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath)
 if err != nil {
-    klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+    klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)

medium

The log message refers to the 'management cluster rest config', but clusterRestConfig is actually the rest config of the managed (workload) cluster being joined, which is retrieved from the secret. To avoid confusion during troubleshooting, please update the log message to refer to the 'managed cluster'.

Suggested change
-    klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+    klog.Errorf("Failed to get cluster-api managed cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)

@codecov-commenter

codecov-commenter commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.06%. Comparing base (e88a128) to head (7e958a2).
⚠️ Report is 31 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/clusterdiscovery/clusterapi/clusterapi.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7637      +/-   ##
==========================================
- Coverage   42.16%   42.06%   -0.10%     
==========================================
  Files         879      879              
  Lines       54677    54828     +151     
==========================================
+ Hits        23052    23063      +11     
- Misses      29880    30021     +141     
+ Partials     1745     1744       -1     
| Flag | Coverage Δ |
|---|---|
| unittests | 42.06% <0.00%> (-0.10%) ⬇️ |
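
The two uncovered lines are the new error branch. One way such a branch could be exercised is sketched below, assuming the config-loading call can be replaced by an injectable function. That seam is an assumption made for this example and not something the PR introduces; `configLoader`, `joinWithLoader`, `failingLoader`, `workingLoader`, `errBadKubeconfig`, and the `/tmp/kubeconfig` path are all hypothetical. A real test would live in a `_test.go` file and use the `testing` package; the plain table-driven check here keeps the sketch self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadKubeconfig is a hypothetical sentinel for an unreadable kubeconfig.
var errBadKubeconfig = errors.New("invalid kubeconfig")

// configLoader abstracts the call that builds a config from a kubeconfig path.
// This seam is an assumption of the sketch, not part of the PR.
type configLoader func(kubeconfigPath string) (string, error)

// joinWithLoader mirrors the shape of the patched branch: on failure it
// returns the error (wrapped with context) instead of exiting.
func joinWithLoader(load configLoader, kubeconfigPath string) error {
	cfg, err := load(kubeconfigPath)
	if err != nil {
		return fmt.Errorf("build rest config from kubeconfig %s: %w", kubeconfigPath, err)
	}
	_ = cfg // the real function would go on to use the config
	return nil
}

// failingLoader always fails, simulating a broken or unreachable kubeconfig.
func failingLoader(string) (string, error) { return "", errBadKubeconfig }

// workingLoader always succeeds with a placeholder config value.
func workingLoader(string) (string, error) { return "ok", nil }

// isBadKubeconfig reports whether err wraps errBadKubeconfig.
func isBadKubeconfig(err error) bool { return errors.Is(err, errBadKubeconfig) }

type testCase struct {
	name    string
	load    configLoader
	wantErr bool
}

// cases lists the success path and the error path that the patch adds.
func cases() []testCase {
	return []testCase{
		{name: "loader succeeds", load: workingLoader, wantErr: false},
		{name: "loader fails", load: failingLoader, wantErr: true},
	}
}

// runCases returns the names of the cases whose outcome differs from wantErr.
func runCases(cs []testCase) []string {
	var failed []string
	for _, c := range cs {
		err := joinWithLoader(c.load, "/tmp/kubeconfig")
		if (err != nil) != c.wantErr {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	fmt.Println(len(runCases(cases()))) // prints 0 when both cases behave as expected
}
```

Because the error branch returns instead of exiting, it can be asserted on directly. A branch that ends in a fatal log call would terminate the test binary.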

 clusterRestConfig, err := apiclient.RestConfig("", kubeconfigPath)
 if err != nil {
-    klog.Fatalf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+    klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)

Contributor

Suggested change
-    klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+    klog.Errorf("Failed to build rest config for cluster-api cluster(%s) from kubeconfig %s: %v", clusterWideKey.Name, kubeconfigPath, err)

Author

Done, applied your suggestion. Thank you @zhzhuang-zju!

@zhzhuang-zju

Contributor

hi @ManojLamani, please squash the commits, thx

…APICluster

klog.Fatalf calls os.Exit(1) which terminates the entire controller-manager
process. In a reconcile loop, a transient error fetching the REST config
should be retried, not fatal. Replace with klog.Errorf and return the error
so the item is re-queued with exponential backoff by the async worker.

Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
@ManojLamani ManojLamani force-pushed the fix-clusterapi-fatal-exit branch from b46b933 to 7e958a2 Compare June 23, 2026 07:43
@ManojLamani

Author

Done, commits have been squashed. Thank you @zhzhuang-zju!

Labels

kind/bug: Categorizes issue or PR as related to a bug.
size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

clusterdiscovery: klog.Fatalf in joinClusterAPICluster causes process crash on transient REST config error

5 participants