Fix metricsadapter: sort GetMetricBySelector results for deterministic output#7633
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses non-deterministic behavior in the custom metrics adapter by ensuring that metric lists are sorted before being returned. Additionally, it improves the robustness of the cluster discovery logic by replacing a fatal error with proper error propagation. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves determinism and resilience by sorting returned custom metric values and avoiding a process exit when cluster-api kubeconfig loading fails.
Changes:
- Sort
MetricValueList.Itemsby described object name to return deterministic results. - Replace
klog.Fatalfwith error logging + returning the error when management cluster REST config cannot be built.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/metricsadapter/provider/custommetrics.go | Adds sorting for stable ordering of returned metric values. |
| pkg/clusterdiscovery/clusterapi/clusterapi.go | Avoids hard process termination on REST config creation failure by returning an error. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sort.Slice(metricValueList.Items, func(i, j int) bool { | ||
| return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name | ||
| }) |
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| return err |
There was a problem hiding this comment.
Code Review
This pull request replaces a fatal log with an error return in the cluster discovery logic and implements sorting for custom metrics by their described object's name. The reviewer suggested optimizing the metrics sorting implementation by checking for an empty slice early and pre-allocating the slice capacity to prevent unnecessary allocations.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| for _, metric := range sameMetrics { | ||
| metricValueList.Items = append(metricValueList.Items, metric) | ||
| } | ||
| sort.Slice(metricValueList.Items, func(i, j int) bool { | ||
| return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name | ||
| }) | ||
| if len(metricValueList.Items) == 0 { | ||
| return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric) | ||
| } |
There was a problem hiding this comment.
We can optimize this by checking if sameMetrics is empty first to return early. This avoids unnecessary slice allocation and sorting. Additionally, we can pre-allocate metricValueList.Items with the exact capacity needed (len(sameMetrics)) to prevent multiple slice reallocations during the loop.
| for _, metric := range sameMetrics { | |
| metricValueList.Items = append(metricValueList.Items, metric) | |
| } | |
| sort.Slice(metricValueList.Items, func(i, j int) bool { | |
| return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name | |
| }) | |
| if len(metricValueList.Items) == 0 { | |
| return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric) | |
| } | |
| if len(sameMetrics) == 0 { | |
| return nil, provider.NewMetricNotFoundError(info.GroupResource, info.Metric) | |
| } | |
| metricValueList.Items = make([]custom_metrics.MetricValue, 0, len(sameMetrics)) | |
| for _, metric := range sameMetrics { | |
| metricValueList.Items = append(metricValueList.Items, metric) | |
| } | |
| sort.Slice(metricValueList.Items, func(i, j int) bool { | |
| return metricValueList.Items[i].DescribedObject.Name < metricValueList.Items[j].DescribedObject.Name | |
| }) |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7633 +/- ##
==========================================
- Coverage 42.16% 42.04% -0.12%
==========================================
Files 879 879
Lines 54677 54830 +153
==========================================
- Hits 23052 23051 -1
- Misses 29880 30032 +152
- Partials 1745 1747 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
thanks |
zhzhuang-zju
left a comment
There was a problem hiding this comment.
Thanks @ManojLamani. I’m unsure whether this PR qualifies as a bugfix. Could you provide specific scenarios where functionality fails without this change?
Also, ask @jabellard to take a look; he is an expert on metrics.
| klog.Errorf("Failed to get cluster-api management cluster rest config. kubeconfig: %s, err: %v", kubeconfigPath, err) | ||
| return err |
There was a problem hiding this comment.
This change seems unrelated to the main scope of this PR. I suggest submitting it as a separate PR.
There was a problem hiding this comment.
Noted, I will remove this change from the current PR and submit it as a separate PR. Thank you!
|
Thank you for the clarification @zhzhuang-zju. You are right, this does not cause a hard functional failure. The core issue is that Go map iteration is non-deterministic, so GetMetricBySelector returns metrics in a different order on every call. While most HPA controllers iterate all items rather than relying on specific positions, an unstable API response makes the system harder to test and debug reliably. This is why the original author @chaunceyjiang left the TODO explicitly asking for sorting. And I've updated the kind bug to /kind cleanup to better reflect the nature of this change. Thank you for your guidance. Sure, @jabellard would appreciate your expertise on this. Please take a look when you get a chance! |
…tput Go map iteration order is non-deterministic. After aggregating metrics from member clusters into a map, iterating it to build the response Items slice produced a different ordering on every call. Sort the Items slice by DescribedObject.Name before returning so that callers always receive a stable, predictable list. Signed-off-by: Manoj Lamani <manoj.p24@medhaviskillsuniversity.edu.in>
254ae1f to
f44c0ed
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
It doesn't seem to take effect. Let me help with this. |
|
/remove-kind bug |
|
Thank you @zhzhuang-zju for fixing the labels! |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
GetMetricBySelectoraggregates custom metrics from member clusters into amap (
sameMetrics) and then iterates the map to build the response Items slice.Go map iteration order is non-deterministic, so every call returns metrics in a
different order. This can cause inconsistent behavior in HPA controllers that
consume the custom metrics API and compare metric lists across calls.
This PR sorts the Items slice by
DescribedObject.Namebefore returning,producing a stable and predictable response. Resolves the existing TODO left
by @chaunceyjiang.
Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
GetMetricBySelector:// TODO(chaunceyjiang) The MetricValue items need to be sorted.DescribedObject.Name, which uniquely identifies a metrictarget within a namespace.
sort.Slicecall with no logic branching.
Does this PR introduce a user-facing change?: