Fix metricsadapter: sort GetMetricBySelector results for deterministic output #7633

Open
ManojLamani wants to merge 1 commit into karmada-io:master from ManojLamani:fix-custommetrics-nondeterministic-order

@ManojLamani ManojLamani commented Jun 15, 2026

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

GetMetricBySelector aggregates custom metrics from member clusters into a
map (sameMetrics) and then iterates the map to build the response Items slice.
Go map iteration order is unspecified, so the order of the returned metrics can
differ from one call to the next. This can cause inconsistent behavior in HPA
controllers that consume the custom metrics API and compare metric lists across calls.

This PR sorts the Items slice by DescribedObject.Name before returning,
producing a stable and predictable response. Resolves the existing TODO left
by @chaunceyjiang.
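
The change described above can be sketched in isolation. This is a minimal illustration that assumes simplified stand-in types instead of the real `custom_metrics` package: `ObjectRef`, `MetricValue`, and `sortedItems` are hypothetical names, and the string key of the map is an assumption about how `sameMetrics` is keyed.

```go
package main

import (
	"fmt"
	"sort"
)

// ObjectRef and MetricValue are simplified, hypothetical stand-ins for the
// custom_metrics API types; only the fields used by the sort are modeled.
type ObjectRef struct {
	Name string
}

type MetricValue struct {
	DescribedObject ObjectRef
	Value           int64
}

// sortedItems copies the map values into a slice and sorts them by
// DescribedObject.Name, so the result does not depend on map iteration order.
func sortedItems(sameMetrics map[string]MetricValue) []MetricValue {
	var items []MetricValue
	for _, metric := range sameMetrics {
		items = append(items, metric)
	}
	sort.Slice(items, func(i, j int) bool {
		return items[i].DescribedObject.Name < items[j].DescribedObject.Name
	})
	return items
}

func main() {
	sameMetrics := map[string]MetricValue{
		"pod-c": {DescribedObject: ObjectRef{Name: "pod-c"}, Value: 30},
		"pod-a": {DescribedObject: ObjectRef{Name: "pod-a"}, Value: 10},
		"pod-b": {DescribedObject: ObjectRef{Name: "pod-b"}, Value: 20},
	}
	for _, item := range sortedItems(sameMetrics) {
		fmt.Println(item.DescribedObject.Name, item.Value)
	}
}
```

With the sort in place, the printed order follows the names rather than whatever order the map happens to yield on a given run.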

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

  • Addresses the TODO comment in GetMetricBySelector:
    // TODO(chaunceyjiang) The MetricValue items need to be sorted.
  • The sort key is DescribedObject.Name, which uniquely identifies a metric
    target within a namespace.
  • No existing test file for this package; the change is a single sort.Slice
    call with no logic branching.
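
One detail worth keeping in mind about the sort key: `sort.Slice` is not guaranteed to be stable, so the output is only fully deterministic when the key is unique. The sketch below is not part of this PR. It shows, under the assumption that two items could share a name, how a comparator with tie-breakers would still give a total order. `ObjectRef`, `MetricValue`, `less`, and `sortItems` are hypothetical names, and the choice of `Namespace` and `Kind` as secondary keys is an assumption for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// ObjectRef and MetricValue are hypothetical stand-ins for the API types.
type ObjectRef struct {
	Namespace string
	Kind      string
	Name      string
}

type MetricValue struct {
	DescribedObject ObjectRef
}

// less orders by Name first, then Namespace, then Kind, so two items with
// the same name still compare in a fixed way.
func less(a, b ObjectRef) bool {
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	if a.Namespace != b.Namespace {
		return a.Namespace < b.Namespace
	}
	return a.Kind < b.Kind
}

// sortItems sorts in place using the multi-key comparator.
func sortItems(items []MetricValue) {
	sort.Slice(items, func(i, j int) bool {
		return less(items[i].DescribedObject, items[j].DescribedObject)
	})
}

func main() {
	items := []MetricValue{
		{DescribedObject: ObjectRef{Namespace: "ns2", Kind: "Pod", Name: "web"}},
		{DescribedObject: ObjectRef{Namespace: "ns1", Kind: "Pod", Name: "web"}},
		{DescribedObject: ObjectRef{Namespace: "ns1", Kind: "Pod", Name: "api"}},
	}
	sortItems(items)
	for _, item := range items {
		o := item.DescribedObject
		fmt.Println(o.Name, o.Namespace, o.Kind)
	}
}
```

If the name really is unique for every item in a response, the single-key comparator in the PR is sufficient and the extra keys are never consulted.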

Does this PR introduce a user-facing change?:

`karmada-metrics-adapter`: Fixed the issue that `GetMetricBySelector` returned
custom metrics in non-deterministic order when aggregating results from multiple
member clusters.

Copilot AI review requested due to automatic review settings June 15, 2026 18:34
@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 15, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses non-deterministic behavior in the custom metrics adapter by ensuring that metric lists are sorted before being returned. Additionally, it improves the robustness of the cluster discovery logic by replacing a fatal error with proper error propagation.

Highlights

  • Deterministic Metric Ordering: Implemented sorting for the GetMetricBySelector results using DescribedObject.Name to ensure consistent, predictable output across API calls.
  • Error Handling Improvement: Updated joinClusterAPICluster to use klog.Errorf and return an error instead of calling klog.Fatalf, preventing unnecessary process termination.

@karmada-bot karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 15, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves determinism and resilience by sorting returned custom metric values and avoiding a process exit when cluster-api kubeconfig loading fails.

Changes:

  • Sort MetricValueList.Items by described object name to return deterministic results.
  • Replace klog.Fatalf with error logging + returning the error when management cluster REST config cannot be built.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/metricsadapter/provider/custommetrics.go | Adds sorting for stable ordering of returned metric values. |
| pkg/clusterdiscovery/clusterapi/clusterapi.go | Avoids hard process termination on REST config creation failure by returning an error. |


Comment on lines +158 to +160
	sort.Slice(metricValueList.Items, func(i, j int) bool {
		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
	})
Comment on lines +210 to +211
	klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
	return err
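
The second snippet follows a common pattern: log the failure and return the error so the caller decides what to do, instead of terminating the process with a fatal log call. A minimal sketch of that pattern, using the standard `log` package rather than klog so it stays self-contained. `buildConfig` and `joinCluster` are hypothetical stand-ins and do not reflect the actual karmada functions.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// buildConfig is a hypothetical stand-in for building a REST config from a
// kubeconfig path; it fails when the path is empty.
func buildConfig(kubeconfigPath string) (string, error) {
	if kubeconfigPath == "" {
		return "", errors.New("kubeconfig path is empty")
	}
	return "config for " + kubeconfigPath, nil
}

// joinCluster logs the failure and returns it wrapped, leaving the decision
// to retry, skip, or exit to the caller.
func joinCluster(kubeconfigPath string) error {
	cfg, err := buildConfig(kubeconfigPath)
	if err != nil {
		log.Printf("Failed to get rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
		return fmt.Errorf("build rest config: %w", err)
	}
	log.Printf("using %s", cfg)
	return nil
}

func main() {
	if err := joinCluster(""); err != nil {
		// The process keeps running; only this operation failed.
		fmt.Println("caller handled error:", err)
	}
}
```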

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces a fatal log with an error return in the cluster discovery logic and implements sorting for custom metrics by their described object's name. The reviewer suggested optimizing the metrics sorting implementation by checking for an empty slice early and pre-allocating the slice capacity to prevent unnecessary allocations.

Comment on lines 155 to 163
	for _, metric := range sameMetrics {
		metricValueList.Items = append(metricValueList.Items, metric)
	}
	sort.Slice(metricValueList.Items, func(i, j int) bool {
		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
	})
	if len(metricValueList.Items) == 0 {
		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
	}

medium

We can optimize this by checking if sameMetrics is empty first to return early. This avoids unnecessary slice allocation and sorting. Additionally, we can pre-allocate metricValueList.Items with the exact capacity needed (len(sameMetrics)) to prevent multiple slice reallocations during the loop.

Suggested change
-	for _, metric := range sameMetrics {
-		metricValueList.Items = append(metricValueList.Items, metric)
-	}
-	sort.Slice(metricValueList.Items, func(i, j int) bool {
-		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
-	})
-	if len(metricValueList.Items) == 0 {
-		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
-	}
+	if len(sameMetrics) == 0 {
+		return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric)
+	}
+	metricValueList.Items = make([]custom_metrics.MetricValue, 0, len(sameMetrics))
+	for _, metric := range sameMetrics {
+		metricValueList.Items = append(metricValueList.Items, metric)
+	}
+	sort.Slice(metricValueList.Items, func(i, j int) bool {
+		return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name
+	})

@codecov-commenter

codecov-commenter commented Jun 15, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.04%. Comparing base (e88a128) to head (f44c0ed).
⚠️ Report is 31 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/metricsadapter/provider/custommetrics.go | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7633      +/-   ##
==========================================
- Coverage   42.16%   42.04%   -0.12%     
==========================================
  Files         879      879              
  Lines       54677    54830     +153     
==========================================
- Hits        23052    23051       -1     
- Misses      29880    30032     +152     
- Partials     1745     1747       +2     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 42.04% <0.00%> (-0.12%) ⬇️ |

@zhzhuang-zju

thanks
/assign

@zhzhuang-zju zhzhuang-zju left a comment

Thanks @ManojLamani. I’m unsure whether this PR qualifies as a bugfix. Could you provide specific scenarios where functionality fails without this change?

Also, ask @jabellard to take a look; he is an expert on metrics.

Comment on lines +210 to +211
	klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err)
	return err

This change seems unrelated to the main scope of this PR. I suggest submitting it as a separate PR.

@ManojLamani (Author)

Noted, I will remove this change from the current PR and submit it as a separate PR. Thank you!

@ManojLamani

ManojLamani commented Jun 17, 2026

Thank you for the clarification @zhzhuang-zju.

You are right, this does not cause a hard functional failure. The core issue is that Go map iteration order is not deterministic, so GetMetricBySelector can return metrics in a different order from one call to the next. While most HPA controllers iterate over all items rather than relying on specific positions, an unstable API response makes the system harder to test and debug reliably. This is why the original author, @chaunceyjiang, left the TODO explicitly asking for sorting. And I've updated the kind bug to /kind cleanup to better reflect the nature of this change.

Thank you for your guidance.

Sure. @jabellard, I would appreciate your expertise on this. Please take a look when you get a chance!
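
The testability point can be illustrated with a small check that calls a function repeatedly and compares the results. This is a sketch only, with hypothetical helper names (`names`, `sortedNames`, `sameOrder`) and a plain `map[string]int` standing in for the aggregated metrics.

```go
package main

import (
	"fmt"
	"sort"
)

// names returns the map keys in whatever order range yields them; the Go
// specification leaves that order unspecified.
func names(m map[string]int) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}

// sortedNames returns the same keys in ascending order.
func sortedNames(m map[string]int) []string {
	out := names(m)
	sort.Strings(out)
	return out
}

// sameOrder reports whether a and b hold the same strings in the same order.
func sameOrder(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	m := map[string]int{"pod-c": 3, "pod-a": 1, "pod-b": 2, "pod-d": 4}

	first := sortedNames(m)
	stable := true
	for i := 0; i < 100; i++ {
		if !sameOrder(first, sortedNames(m)) {
			stable = false
		}
	}
	fmt.Println("sorted output stable across calls:", stable)
	fmt.Println("unsorted order on this run:", names(m))
}
```

A test that compares against a fixed expected slice only works reliably for the sorted variant; the unsorted one may pass or fail depending on the run.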

…tput

Go map iteration order is non-deterministic. After aggregating metrics
from member clusters into a map, iterating it to build the response
Items slice produced a different ordering on every call.

Sort the Items slice by DescribedObject.Name before returning so that
callers always receive a stable, predictable list.

Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
@ManojLamani ManojLamani force-pushed the fix-custommetrics-nondeterministic-order branch from 254ae1f to f44c0ed Compare June 17, 2026 04:51
@karmada-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from zhzhuang-zju. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhzhuang-zju

> And I've updated the kind bug to /kind cleanup to better reflect the nature of this change.

It doesn't seem to take effect. Let me help with this.
/kind-remove bug
/kind cleanup

@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jun 17, 2026
@zhzhuang-zju

/remove-kind bug

@karmada-bot karmada-bot removed the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2026
@ManojLamani

Thank you @zhzhuang-zju for fixing the labels!

Labels

kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.


5 participants