Fix metricsadapter: sort GetMetricBySelector results for deterministic output by ManojLamani · Pull Request #7633 · karmada-io/karmada

ManojLamani · 2026-06-15T18:34:04Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

GetMetricBySelector aggregates custom metrics from member clusters into a
map (sameMetrics) and then iterates the map to build the response Items slice.
Go map iteration order is unspecified and randomized, so successive calls can
return the same metrics in a different order. This can cause inconsistent
behavior in HPA controllers that consume the custom metrics API and compare
metric lists across calls.

This PR sorts the Items slice by DescribedObject.Name before returning,
producing a stable and predictable response. Resolves the existing TODO left
by @chaunceyjiang.
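The pattern described above can be sketched in isolation. This is a minimal sketch, not the adapter's code: `metricValue` and `collectSorted` are hypothetical stand-ins for `custom_metrics.MetricValue` and the tail of `GetMetricBySelector`.

```go
package main

import (
	"fmt"
	"sort"
)

// metricValue is a hypothetical stand-in for custom_metrics.MetricValue.
type metricValue struct {
	Name  string // plays the role of DescribedObject.Name
	Value int64
}

// collectSorted copies the map values into a slice and sorts them by name,
// so the result does not depend on Go's randomized map iteration order.
func collectSorted(sameMetrics map[string]metricValue) []metricValue {
	items := make([]metricValue, 0, len(sameMetrics))
	for _, m := range sameMetrics {
		items = append(items, m)
	}
	sort.Slice(items, func(i, j int) bool {
		return items[i].Name < items[j].Name
	})
	return items
}

func main() {
	sameMetrics := map[string]metricValue{
		"pod-c": {Name: "pod-c", Value: 3},
		"pod-a": {Name: "pod-a", Value: 1},
		"pod-b": {Name: "pod-b", Value: 2},
	}
	for _, m := range collectSorted(sameMetrics) {
		fmt.Println(m.Name, m.Value)
	}
}
```

Without the `sort.Slice` call, the loop over `sameMetrics` alone would be free to yield the three entries in any order from run to run.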

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

Addresses the TODO comment in GetMetricBySelector:
// TODO(chaunceyjiang) The MetricValue items need to be sorted.
The sort key is DescribedObject.Name, which uniquely identifies a metric
target within a namespace.
No existing test file for this package; the change is a single sort.Slice
call with no logic branching.
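If the name alone were ever not unique (for example, if items from more than one namespace ended up in the same list, which is an assumption for illustration and not something this PR claims), a secondary key would keep the order fully determined. A sketch using the hypothetical names `objectRef` and `sortRefs`:

```go
package main

import (
	"fmt"
	"sort"
)

// objectRef is a hypothetical, reduced stand-in for an object reference.
type objectRef struct {
	Namespace string
	Name      string
}

// sortRefs orders by Name first and falls back to Namespace on ties,
// so two entries with the same name still have a fixed relative order.
func sortRefs(refs []objectRef) {
	sort.Slice(refs, func(i, j int) bool {
		if refs[i].Name != refs[j].Name {
			return refs[i].Name < refs[j].Name
		}
		return refs[i].Namespace < refs[j].Namespace
	})
}

func main() {
	refs := []objectRef{
		{Namespace: "ns-b", Name: "web"},
		{Namespace: "ns-a", Name: "web"},
		{Namespace: "ns-a", Name: "api"},
	}
	sortRefs(refs)
	fmt.Println(refs)
}
```

`sort.Slice` is not a stable sort, so a complete ordering like this is what guarantees identical output when the primary key has duplicates.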

Does this PR introduce a user-facing change?:

`karmada-metrics-adapter`: Fixed the issue that `GetMetricBySelector` returned
custom metrics in non-deterministic order when aggregating results from multiple
member clusters.

gemini-code-assist · 2026-06-15T18:34:12Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses non-deterministic behavior in the custom metrics adapter by ensuring that metric lists are sorted before being returned. Additionally, it improves the robustness of the cluster discovery logic by replacing a fatal error with proper error propagation.

Highlights

Deterministic Metric Ordering: Implemented sorting for the GetMetricBySelector results using DescribedObject.Name to ensure consistent, predictable output across API calls.
Error Handling Improvement: Updated joinClusterAPICluster to use klog.Errorf and return an error instead of calling klog.Fatalf, preventing unnecessary process termination.
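The second highlight, returning an error instead of terminating the process, can be illustrated with a small sketch. It uses only the standard library `log` package in place of `klog`, and `loadRESTConfig` and `joinCluster` are hypothetical stand-ins rather than the functions in `clusterapi.go`.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// loadRESTConfig is a hypothetical stand-in for building a REST config
// from a kubeconfig path; it fails when the path is empty.
func loadRESTConfig(kubeconfigPath string) (string, error) {
	if kubeconfigPath == "" {
		return "", errors.New("kubeconfig path is empty")
	}
	return "config for " + kubeconfigPath, nil
}

// joinCluster logs the failure and returns it to the caller instead of
// exiting the process, which is what a Fatalf-style call would do.
func joinCluster(kubeconfigPath string) error {
	cfg, err := loadRESTConfig(kubeconfigPath)
	if err != nil {
		log.Printf("Failed to get rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
		return err
	}
	fmt.Println("using", cfg)
	return nil
}

func main() {
	if err := joinCluster(""); err != nil {
		fmt.Println("caller decides what to do:", err)
	}
}
```

With this shape the caller can retry, skip the cluster, or surface the error, rather than having the whole process stop on one bad kubeconfig.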

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service. Gemini can make mistakes, so double check it and use code with caution.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves determinism and resilience by sorting returned custom metric values and avoiding a process exit when cluster-api kubeconfig loading fails.

Changes:

Sort MetricValueList.Items by described object name to return deterministic results.
Replace klog.Fatalf with error logging + returning the error when management cluster REST config cannot be built.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/metricsadapter/provider/custommetrics.go | Adds sorting for stable ordering of returned metric values. |
| pkg/clusterdiscovery/clusterapi/clusterapi.go | Avoids hard process termination on REST config creation failure by returning an error. |


+	sort.Slice(metricValueList.Items, func(i, j int) bool {
+		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
+	})


+		klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+		return err


gemini-code-assist

Code Review

This pull request replaces a fatal log with an error return in the cluster discovery logic and implements sorting for custom metrics by their described object's name. The reviewer suggested optimizing the metrics sorting implementation by checking for an empty slice early and pre-allocating the slice capacity to prevent unnecessary allocations.


gemini-code-assist · 2026-06-15T18:35:26Z

 	for _, metric := range sameMetrics {
 		metricValueList.Items = append(metricValueList.Items, metric)
 	}
+	sort.Slice(metricValueList.Items, func(i, j int) bool {
+		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
+	})
 	if len(metricValueList.Items) == 0 {
 		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
 	}


We can optimize this by checking if sameMetrics is empty first to return early. This avoids unnecessary slice allocation and sorting. Additionally, we can pre-allocate metricValueList.Items with the exact capacity needed (len(sameMetrics)) to prevent multiple slice reallocations during the loop.

Suggested change

-	for _, metric := range sameMetrics {
-		metricValueList.Items = append(metricValueList.Items, metric)
-	}
-	sort.Slice(metricValueList.Items, func(i, j int) bool {
-		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
-	})
-	if len(metricValueList.Items) == 0 {
-		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
-	}
+	if len(sameMetrics) == 0 {
+		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
+	}
+	metricValueList.Items = make([]custom_metrics.MetricValue, 0, len(sameMetrics))
+	for _, metric := range sameMetrics {
+		metricValueList.Items = append(metricValueList.Items, metric)
+	}
+	sort.Slice(metricValueList.Items, func(i, j int) bool {
+		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
+	})

codecov-commenter · 2026-06-15T19:00:40Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.04%. Comparing base (e88a128) to head (f44c0ed).
⚠️ Report is 31 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/metricsadapter/provider/custommetrics.go | 0.00% | 3 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7633      +/-   ##
==========================================
- Coverage   42.16%   42.04%   -0.12%     
==========================================
  Files         879      879              
  Lines       54677    54830     +153     
==========================================
- Hits        23052    23051       -1     
- Misses      29880    30032     +152     
- Partials     1745     1747       +2

| Flag | Coverage Δ |
| --- | --- |
| unittests | `42.04% <0.00%> (-0.12%)` ⬇️ |


zhzhuang-zju · 2026-06-16T12:43:26Z

thanks
/assign

zhzhuang-zju

Thanks @ManojLamani. I’m unsure whether this PR qualifies as a bugfix. Could you provide specific scenarios where functionality fails without this change?

Also, ask @jabellard to take a look; he is an expert on metrics.

zhzhuang-zju · 2026-06-17T03:00:56Z

+		klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
+		return err


This change seems unrelated to the main scope of this PR. I suggest submitting it as a separate PR.

Noted, I will remove this change from the current PR and submit it as a separate PR. Thank you!

ManojLamani · 2026-06-17T04:49:36Z

Thank you for the clarification @zhzhuang-zju.

You are right, this does not cause a hard functional failure. The core issue is that Go map iteration is non-deterministic, so GetMetricBySelector returns metrics in a different order on every call. While most HPA controllers iterate all items rather than relying on specific positions, an unstable API response makes the system harder to test and debug reliably. This is why the original author @chaunceyjiang left the TODO explicitly asking for sorting. And I've updated the kind bug to /kind cleanup to better reflect the nature of this change.

Thank you for your guidance.

Sure. @jabellard, we would appreciate your expertise on this. Please take a look when you get a chance!

…tput

Go map iteration order is non-deterministic. After aggregating metrics from member clusters into a map, iterating it to build the response Items slice produced a different ordering on every call.

Sort the Items slice by DescribedObject.Name before returning so that callers always receive a stable, predictable list.

Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>

karmada-bot · 2026-06-17T04:51:45Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from zhzhuang-zju. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

pkg/metricsadapter/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zhzhuang-zju · 2026-06-17T09:16:35Z

And I've updated the kind bug to /kind cleanup to better reflect the nature of this change.

It doesn't seem to take effect. Let me help with this.
/kind-remove bug
/kind cleanup

zhzhuang-zju · 2026-06-17T09:18:01Z

/remove-kind bug

ManojLamani · 2026-06-17T10:49:34Z

Thank you @zhzhuang-zju for fixing the labels!

Copilot AI review requested due to automatic review settings June 15, 2026 18:34

karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 15, 2026

karmada-bot requested review from Garrybest and chaunceyjiang June 15, 2026 18:34

karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 15, 2026

Copilot AI reviewed Jun 15, 2026

gemini-code-assist Bot reviewed Jun 15, 2026

karmada-bot assigned zhzhuang-zju Jun 16, 2026

zhzhuang-zju reviewed Jun 17, 2026

ManojLamani force-pushed the fix-custommetrics-nondeterministic-order branch from 254ae1f to f44c0ed Compare June 17, 2026 04:51

karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jun 17, 2026

karmada-bot removed the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2026
