fix: queue allocated metric consistency for jobless queues in capacit…#5050
fix: queue allocated metric consistency for jobless queues in capacit…#5050Aliexe-code wants to merge 1 commit intovolcano-sh:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @Aliexe-code! It looks like this is your first PR to volcano-sh/volcano 🎉 |
Summary of ChangesHello @Aliexe-code, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug in the capacity scheduler plugin that led to inconsistent allocated resource metrics for jobless queues, such as the root queue, when queue hierarchy was not enabled. Previously, these queues would incorrectly show zero allocated resources. The changes ensure that the "Queue.Status.Allocated" value is correctly used for these queues, providing accurate resource reporting and improving the reliability of resource management within the scheduler. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request correctly addresses the issue where jobless queues were missing from metrics when hierarchy is disabled. By ensuring all queues are populated in queueOpts using their status as a fallback, it provides consistent reporting. The simplification of the metrics recording logic is also a welcome improvement. I have identified one maintainability issue regarding code duplication.
| for queueID, queueInfo := range ssn.Queues { | ||
| if _, exists := cp.queueOpts[queueID]; !exists { | ||
| attr := &queueAttr{ | ||
| queueID: queueInfo.UID, | ||
| name: queueInfo.Name, | ||
| deserved: api.NewResource(queueInfo.Queue.Spec.Deserved), | ||
| allocated: api.EmptyResource(), | ||
| request: api.EmptyResource(), | ||
| elastic: api.EmptyResource(), | ||
| inqueue: api.EmptyResource(), | ||
| guarantee: api.EmptyResource(), | ||
| } | ||
| if len(queueInfo.Queue.Spec.Capability) != 0 { | ||
| attr.capability = api.NewResource(queueInfo.Queue.Spec.Capability) | ||
| if attr.capability.MilliCPU <= 0 { | ||
| attr.capability.MilliCPU = math.MaxFloat64 | ||
| } | ||
| if attr.capability.Memory <= 0 { | ||
| attr.capability.Memory = math.MaxFloat64 | ||
| } | ||
| } | ||
| if len(queueInfo.Queue.Spec.Guarantee.Resource) != 0 { | ||
| attr.guarantee = api.NewResource(queueInfo.Queue.Spec.Guarantee.Resource) | ||
| } | ||
| realCapability := api.ExceededPart(cp.totalResource, cp.totalGuarantee).Add(attr.guarantee) | ||
| if attr.capability == nil { | ||
| attr.capability = api.EmptyResource() | ||
| attr.realCapability = realCapability | ||
| } else { | ||
| realCapability.MinDimensionResource(attr.capability, api.Infinity) | ||
| attr.realCapability = realCapability | ||
| } | ||
| // Use Queue.Status.Allocated for jobless queues to ensure metric consistency | ||
| if queueInfo.Queue.Status.Allocated != nil { | ||
| attr.allocated = api.NewResource(queueInfo.Queue.Status.Allocated) | ||
| } | ||
| cp.queueOpts[queueID] = attr | ||
| klog.V(4).Infof("Added jobless Queue <%s> attributes with allocated from status: <%v>", queueInfo.Name, attr.allocated) | ||
| } | ||
| } |
There was a problem hiding this comment.
The logic for initializing queueAttr (calculating deserved, capability, guarantee, and realCapability) is almost identical to the logic used in the previous loop (lines 418-448). This duplication makes the code harder to maintain and prone to drift if the initialization logic changes in the future. Consider refactoring this common initialization logic into a helper method that can be used in both places.
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where jobless queues (such as the root queue) had their allocated metrics incorrectly reported as zero when hierarchy was disabled, instead of using the actual Queue.Status.Allocated value. The fix ensures that all queues in the session are added to the capacity plugin's internal queueOpts map, with jobless queues using their Status.Allocated values for consistency with the Queue CRD status.
Changes:
- Added logic to populate
queueOptsfor jobless queues usingQueue.Status.Allocatedwhen hierarchy is disabled - Simplified metrics recording to iterate over
queueOptsinstead of handling jobless queues separately - Added test case to verify jobless queue handling
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/scheduler/plugins/capacity/capacity.go | Added logic to populate queueOpts for jobless queues using Status.Allocated (lines 487-528), simplified metrics recording to handle all queues uniformly (lines 541-550) |
| pkg/scheduler/plugins/capacity/capacity_test.go | Added TestBuildQueueAttrsMetricConsistency to verify jobless queue allocated values are read from Status.Allocated |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| if actualAllocated.Memory != expectedAllocated.Memory { | ||
| t.Errorf("Root queue allocated Memory: got %v, want %v", actualAllocated.Memory, expectedAllocated.Memory) | ||
| } |
There was a problem hiding this comment.
The test verifies that rootQueue.Queue.Status.Allocated contains the expected values, but this only confirms the test setup, not the fix itself. The actual fix ensures that jobless queues are added to queueOpts with their allocated values from Status.Allocated, which would then be used for metrics reporting. However, the test doesn't verify that the capacity plugin's queueOpts was populated correctly for the jobless queue, or that metrics would be reported with the correct values. Consider accessing the plugin instance to verify queueOpts contains an entry for the root queue with the correct allocated value, or trigger and verify metric updates.
| } | |
| } | |
| // Additionally verify that the capacity plugin populated queueOpts for the jobless root queue | |
| pluginIface, ok := ssn.Plugins[PluginName] | |
| if !ok { | |
| t.Fatal("capacity plugin should be registered in the session") | |
| } | |
| capacityPlugin, ok := pluginIface.(*CapacityPlugin) | |
| if !ok { | |
| t.Fatalf("plugin registered under %q is not *CapacityPlugin", PluginName) | |
| } | |
| rootQueueOpt, ok := capacityPlugin.queueOpts[rootQueueID] | |
| if !ok { | |
| t.Fatalf("queueOpts should contain an entry for root queue %q", rootQueueID) | |
| } | |
| if rootQueueOpt.allocated == nil { | |
| t.Fatal("queueOpts entry for root queue should have allocated resources set") | |
| } | |
| // queueOpts for the jobless root queue must reflect Status.Allocated | |
| pluginAllocated := rootQueueOpt.allocated | |
| if pluginAllocated.MilliCPU != expectedAllocated.MilliCPU { | |
| t.Errorf("queueOpts allocated CPU for root queue: got %v, want %v", pluginAllocated.MilliCPU, expectedAllocated.MilliCPU) | |
| } | |
| if pluginAllocated.Memory != expectedAllocated.Memory { | |
| t.Errorf("queueOpts allocated Memory for root queue: got %v, want %v", pluginAllocated.Memory, expectedAllocated.Memory) | |
| } |
| attr := &queueAttr{ | ||
| queueID: queueInfo.UID, | ||
| name: queueInfo.Name, | ||
| deserved: api.NewResource(queueInfo.Queue.Spec.Deserved), | ||
| allocated: api.EmptyResource(), | ||
| request: api.EmptyResource(), | ||
| elastic: api.EmptyResource(), | ||
| inqueue: api.EmptyResource(), | ||
| guarantee: api.EmptyResource(), | ||
| } | ||
| if len(queueInfo.Queue.Spec.Capability) != 0 { | ||
| attr.capability = api.NewResource(queueInfo.Queue.Spec.Capability) | ||
| if attr.capability.MilliCPU <= 0 { | ||
| attr.capability.MilliCPU = math.MaxFloat64 | ||
| } | ||
| if attr.capability.Memory <= 0 { | ||
| attr.capability.Memory = math.MaxFloat64 | ||
| } | ||
| } | ||
| if len(queueInfo.Queue.Spec.Guarantee.Resource) != 0 { | ||
| attr.guarantee = api.NewResource(queueInfo.Queue.Spec.Guarantee.Resource) | ||
| } | ||
| realCapability := api.ExceededPart(cp.totalResource, cp.totalGuarantee).Add(attr.guarantee) | ||
| if attr.capability == nil { | ||
| attr.capability = api.EmptyResource() | ||
| attr.realCapability = realCapability | ||
| } else { | ||
| realCapability.MinDimensionResource(attr.capability, api.Infinity) | ||
| attr.realCapability = realCapability | ||
| } |
There was a problem hiding this comment.
The code for initializing queue attributes here (lines 491-520) is nearly identical to the code for job-based queues (lines 418-448). Consider extracting this logic into a helper function to reduce code duplication and improve maintainability. The function could accept a queueInfo parameter and return a queueAttr, handling the capability, guarantee, and realCapability initialization in one place.
…y plugin When hierarchy is disabled, jobless queues (like root) were not added to queueOpts, causing their metrics to report zero allocated instead of using Queue.Status.Allocated. This fix: - Adds jobless queues to queueOpts with allocated from Status.Allocated - Removes dead code in metrics recording (else branch now unreachable) - Refactors queueAttr initialization into helper method to avoid duplication - Adds test to verify jobless queue allocated consistency Related: volcano-sh#5048 (Bug 1) Signed-off-by: Aliexe-code <terminalexe1@gmail.com>
6d220dd to
d1f90fd
Compare
/kind bug
What this PR does / why we need it:
Fixes a bug where jobless queues (like root) had their allocated metrics incorrectly
reported as zero when hierarchy was disabled, instead of using the actual
Queue.Status.Allocatedvalue.Which issue(s) this PR fixes:
Related to #5048 (Bug 1)
Special notes for your reviewer:
None
Does this PR introduce a user-facing change?