fix: queue allocated metric consistency for jobless queues in capacit… #5050

Open
Aliexe-code wants to merge 1 commit into volcano-sh:master from Aliexe-code:fix/capacity-plugin-metric-consistency
@Aliexe-code

/kind bug

What this PR does / why we need it:
Fixes a bug where jobless queues (like root) had their allocated metrics incorrectly
reported as zero when hierarchy was disabled, instead of using the actual
Queue.Status.Allocated value.

Which issue(s) this PR fixes:
Related to #5048 (Bug 1)

Special notes for your reviewer:
None

Does this PR introduce a user-facing change?

Fixed queue allocated metric consistency for jobless queues in capacity plugin when
hierarchy is disabled

Copilot AI review requested due to automatic review settings February 20, 2026 15:17
@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 20, 2026
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign k82cn for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot
Contributor

Welcome @Aliexe-code! It looks like this is your first PR to volcano-sh/volcano 🎉

@gemini-code-assist

Summary of Changes

Hello @Aliexe-code, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the capacity scheduler plugin that led to inconsistent allocated resource metrics for jobless queues, such as the root queue, when queue hierarchy was not enabled. Previously, these queues would incorrectly show zero allocated resources. The changes ensure that the "Queue.Status.Allocated" value is correctly used for these queues, providing accurate resource reporting and improving the reliability of resource management within the scheduler.

Highlights

  • Metric Consistency: Corrected a bug in the capacity plugin where jobless queues (like the root queue) incorrectly reported zero allocated resources when queue hierarchy was disabled.
  • Resource Allocation Accuracy: Ensured that jobless queues now accurately reflect their "Queue.Status.Allocated" resources in metrics, improving resource visibility.
  • Test Coverage: Introduced a new test case, "TestBuildQueueAttrsMetricConsistency", to validate the fix and ensure proper allocated metric reporting for jobless queues.

Changelog
  • pkg/scheduler/plugins/capacity/capacity.go
    • Introduced a new loop to identify and populate "queueAttr" for jobless queues (those not already processed) using their "Queue.Status.Allocated" value, ensuring they are included in metric calculations.
    • Refactored the metric recording section to iterate directly over the "cp.queueOpts" map, simplifying the logic as all relevant queues are now guaranteed to be present in this map.
  • pkg/scheduler/plugins/capacity/capacity_test.go
    • Added "TestBuildQueueAttrsMetricConsistency", a new unit test that specifically verifies the fix by simulating a scenario with a jobless root queue and asserting that its allocated resources are correctly reported from "Queue.Status.Allocated" when hierarchy is disabled.
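The fallback described in the changelog can be sketched in isolation. This is a minimal illustration, not the plugin's code: Resource, Queue, and addJoblessQueues are hypothetical stand-ins, and StatusAllocated plays the role of Queue.Status.Allocated.

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// Queue is a hypothetical, trimmed-down queue record. StatusAllocated
// plays the role of Queue.Status.Allocated and may be nil.
type Queue struct {
	Name            string
	StatusAllocated *Resource
}

// queueAttr mirrors only the fields needed for this illustration.
type queueAttr struct {
	name      string
	allocated Resource
}

// addJoblessQueues adds an entry for every queue that has none yet,
// taking allocated from the queue status when it is set.
// It returns the number of entries it added.
func addJoblessQueues(queues map[string]Queue, opts map[string]*queueAttr) int {
	added := 0
	for id, q := range queues {
		if _, exists := opts[id]; exists {
			continue // already built from the queue's jobs
		}
		attr := &queueAttr{name: q.Name}
		if q.StatusAllocated != nil {
			attr.allocated = *q.StatusAllocated
		}
		opts[id] = attr
		added++
	}
	return added
}

func main() {
	queues := map[string]Queue{
		"root":    {Name: "root", StatusAllocated: &Resource{MilliCPU: 4000, Memory: 8192}},
		"default": {Name: "default"},
	}
	// "default" was already populated from its jobs; "root" has no jobs.
	opts := map[string]*queueAttr{
		"default": {name: "default", allocated: Resource{MilliCPU: 4000, Memory: 8192}},
	}
	added := addJoblessQueues(queues, opts)
	fmt.Println("added:", added, "root allocated CPU:", opts["root"].allocated.MilliCPU)
}
```

Existing entries are left untouched, so values computed from running jobs take precedence over the status fallback.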
Activity
  • No specific activity (comments, reviews, etc.) was recorded for this pull request.

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 20, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request correctly addresses the issue where jobless queues were missing from metrics when hierarchy is disabled. By ensuring all queues are populated in queueOpts using their status as a fallback, it provides consistent reporting. The simplification of the metrics recording logic is also a welcome improvement. I have identified one maintainability issue regarding code duplication.

Comment on lines 489 to 528
	for queueID, queueInfo := range ssn.Queues {
		if _, exists := cp.queueOpts[queueID]; !exists {
			attr := &queueAttr{
				queueID:   queueInfo.UID,
				name:      queueInfo.Name,
				deserved:  api.NewResource(queueInfo.Queue.Spec.Deserved),
				allocated: api.EmptyResource(),
				request:   api.EmptyResource(),
				elastic:   api.EmptyResource(),
				inqueue:   api.EmptyResource(),
				guarantee: api.EmptyResource(),
			}
			if len(queueInfo.Queue.Spec.Capability) != 0 {
				attr.capability = api.NewResource(queueInfo.Queue.Spec.Capability)
				if attr.capability.MilliCPU <= 0 {
					attr.capability.MilliCPU = math.MaxFloat64
				}
				if attr.capability.Memory <= 0 {
					attr.capability.Memory = math.MaxFloat64
				}
			}
			if len(queueInfo.Queue.Spec.Guarantee.Resource) != 0 {
				attr.guarantee = api.NewResource(queueInfo.Queue.Spec.Guarantee.Resource)
			}
			realCapability := api.ExceededPart(cp.totalResource, cp.totalGuarantee).Add(attr.guarantee)
			if attr.capability == nil {
				attr.capability = api.EmptyResource()
				attr.realCapability = realCapability
			} else {
				realCapability.MinDimensionResource(attr.capability, api.Infinity)
				attr.realCapability = realCapability
			}
			// Use Queue.Status.Allocated for jobless queues to ensure metric consistency
			if queueInfo.Queue.Status.Allocated != nil {
				attr.allocated = api.NewResource(queueInfo.Queue.Status.Allocated)
			}
			cp.queueOpts[queueID] = attr
			klog.V(4).Infof("Added jobless Queue <%s> attributes with allocated from status: <%v>", queueInfo.Name, attr.allocated)
		}
	}

medium

The logic for initializing queueAttr (calculating deserved, capability, guarantee, and realCapability) is almost identical to the logic used in the previous loop (lines 418-448). This duplication makes the code harder to maintain and prone to drift if the initialization logic changes in the future. Consider refactoring this common initialization logic into a helper method that can be used in both places.
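One possible shape for such a helper is sketched below. It is an assumption-laden illustration rather than the refactoring that landed in the PR: newQueueAttr, QueueSpec, and the two-field Resource are hypothetical stand-ins for the plugin's types, and the realCapability arithmetic is a simplified reading of what the snippet above appears to do (spare capacity beyond all guarantees, plus the queue's own guarantee, capped per dimension by capability).

```go
package main

import (
	"fmt"
	"math"
)

// Resource is a simplified two-dimension stand-in for api.Resource.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// QueueSpec is a hypothetical subset of the queue spec; nil means "not set".
type QueueSpec struct {
	Deserved   Resource
	Capability *Resource
	Guarantee  *Resource
}

type queueAttr struct {
	name           string
	deserved       Resource
	capability     Resource
	guarantee      Resource
	realCapability Resource
	allocated      Resource
}

// newQueueAttr (hypothetical name) holds the initialization that both
// loops would otherwise repeat.
func newQueueAttr(name string, spec QueueSpec, total, totalGuarantee Resource) *queueAttr {
	attr := &queueAttr{name: name, deserved: spec.Deserved}
	if spec.Guarantee != nil {
		attr.guarantee = *spec.Guarantee
	}
	// Spare capacity beyond all guarantees, plus this queue's own guarantee.
	realCap := Resource{
		MilliCPU: math.Max(total.MilliCPU-totalGuarantee.MilliCPU, 0) + attr.guarantee.MilliCPU,
		Memory:   math.Max(total.Memory-totalGuarantee.Memory, 0) + attr.guarantee.Memory,
	}
	if spec.Capability != nil {
		attr.capability = *spec.Capability
		// A non-positive dimension is treated as unlimited.
		if attr.capability.MilliCPU <= 0 {
			attr.capability.MilliCPU = math.MaxFloat64
		}
		if attr.capability.Memory <= 0 {
			attr.capability.Memory = math.MaxFloat64
		}
		// Cap each dimension by the configured capability.
		realCap.MilliCPU = math.Min(realCap.MilliCPU, attr.capability.MilliCPU)
		realCap.Memory = math.Min(realCap.Memory, attr.capability.Memory)
	}
	attr.realCapability = realCap
	return attr
}

func main() {
	total := Resource{MilliCPU: 10000, Memory: 100}
	totalGuarantee := Resource{MilliCPU: 2000, Memory: 20}
	spec := QueueSpec{
		Guarantee:  &Resource{MilliCPU: 1000, Memory: 10},
		Capability: &Resource{MilliCPU: 5000}, // memory left unlimited
	}
	attr := newQueueAttr("root", spec, total, totalGuarantee)
	fmt.Println(attr.realCapability.MilliCPU, attr.realCapability.Memory)
}
```

With a helper of this kind, the job-based loop and the jobless-queue loop would differ only in how they fill allocated, which is the part that genuinely varies between them.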

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where jobless queues (such as the root queue) had their allocated metrics incorrectly reported as zero when hierarchy was disabled, instead of using the actual Queue.Status.Allocated value. The fix ensures that all queues in the session are added to the capacity plugin's internal queueOpts map, with jobless queues using their Status.Allocated values for consistency with the Queue CRD status.

Changes:

  • Added logic to populate queueOpts for jobless queues using Queue.Status.Allocated when hierarchy is disabled
  • Simplified metrics recording to iterate over queueOpts instead of handling jobless queues separately
  • Added test case to verify jobless queue handling
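The simplified metrics path in the second bullet can be pictured as a single loop over one map. The sketch below is hypothetical: recordAllocated stands in for whatever metrics call the plugin makes and is not the Volcano metrics API, and the sorted iteration is only there to make the example deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// queueAttr mirrors only the fields needed for this illustration.
type queueAttr struct {
	name      string
	allocated Resource
}

// recordAllocated is a hypothetical metrics hook, not a real Volcano function.
type recordAllocated func(queue string, milliCPU, memory float64)

// recordQueueMetrics reports every queue from a single map. Because jobless
// queues are already present in opts, no separate fallback branch is needed.
func recordQueueMetrics(opts map[string]*queueAttr, record recordAllocated) {
	ids := make([]string, 0, len(opts))
	for id := range opts {
		ids = append(ids, id)
	}
	sort.Strings(ids) // stable order for the example only
	for _, id := range ids {
		attr := opts[id]
		record(attr.name, attr.allocated.MilliCPU, attr.allocated.Memory)
	}
}

func main() {
	opts := map[string]*queueAttr{
		"default": {name: "default", allocated: Resource{MilliCPU: 4000, Memory: 8192}},
		"root":    {name: "root", allocated: Resource{MilliCPU: 4000, Memory: 8192}},
	}
	recordQueueMetrics(opts, func(queue string, milliCPU, memory float64) {
		fmt.Printf("queue=%s cpu=%v memory=%v\n", queue, milliCPU, memory)
	})
}
```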

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File: pkg/scheduler/plugins/capacity/capacity.go
Description: Added logic to populate queueOpts for jobless queues using Status.Allocated (lines 487-528); simplified metrics recording to handle all queues uniformly (lines 541-550)

File: pkg/scheduler/plugins/capacity/capacity_test.go
Description: Added TestBuildQueueAttrsMetricConsistency to verify jobless queue allocated values are read from Status.Allocated

		}
		if actualAllocated.Memory != expectedAllocated.Memory {
			t.Errorf("Root queue allocated Memory: got %v, want %v", actualAllocated.Memory, expectedAllocated.Memory)
		}
Copilot AI Feb 20, 2026

The test verifies that rootQueue.Queue.Status.Allocated contains the expected values, but this only confirms the test setup, not the fix itself. The actual fix ensures that jobless queues are added to queueOpts with their allocated values from Status.Allocated, which would then be used for metrics reporting. However, the test doesn't verify that the capacity plugin's queueOpts was populated correctly for the jobless queue, or that metrics would be reported with the correct values. Consider accessing the plugin instance to verify queueOpts contains an entry for the root queue with the correct allocated value, or trigger and verify metric updates.

Suggested change
		}
	}
	// Additionally verify that the capacity plugin populated queueOpts for the jobless root queue
	pluginIface, ok := ssn.Plugins[PluginName]
	if !ok {
		t.Fatal("capacity plugin should be registered in the session")
	}
	capacityPlugin, ok := pluginIface.(*CapacityPlugin)
	if !ok {
		t.Fatalf("plugin registered under %q is not *CapacityPlugin", PluginName)
	}
	rootQueueOpt, ok := capacityPlugin.queueOpts[rootQueueID]
	if !ok {
		t.Fatalf("queueOpts should contain an entry for root queue %q", rootQueueID)
	}
	if rootQueueOpt.allocated == nil {
		t.Fatal("queueOpts entry for root queue should have allocated resources set")
	}
	// queueOpts for the jobless root queue must reflect Status.Allocated
	pluginAllocated := rootQueueOpt.allocated
	if pluginAllocated.MilliCPU != expectedAllocated.MilliCPU {
		t.Errorf("queueOpts allocated CPU for root queue: got %v, want %v", pluginAllocated.MilliCPU, expectedAllocated.MilliCPU)
	}
	if pluginAllocated.Memory != expectedAllocated.Memory {
		t.Errorf("queueOpts allocated Memory for root queue: got %v, want %v", pluginAllocated.Memory, expectedAllocated.Memory)
	}

Comment on lines 491 to 520
			attr := &queueAttr{
				queueID:   queueInfo.UID,
				name:      queueInfo.Name,
				deserved:  api.NewResource(queueInfo.Queue.Spec.Deserved),
				allocated: api.EmptyResource(),
				request:   api.EmptyResource(),
				elastic:   api.EmptyResource(),
				inqueue:   api.EmptyResource(),
				guarantee: api.EmptyResource(),
			}
			if len(queueInfo.Queue.Spec.Capability) != 0 {
				attr.capability = api.NewResource(queueInfo.Queue.Spec.Capability)
				if attr.capability.MilliCPU <= 0 {
					attr.capability.MilliCPU = math.MaxFloat64
				}
				if attr.capability.Memory <= 0 {
					attr.capability.Memory = math.MaxFloat64
				}
			}
			if len(queueInfo.Queue.Spec.Guarantee.Resource) != 0 {
				attr.guarantee = api.NewResource(queueInfo.Queue.Spec.Guarantee.Resource)
			}
			realCapability := api.ExceededPart(cp.totalResource, cp.totalGuarantee).Add(attr.guarantee)
			if attr.capability == nil {
				attr.capability = api.EmptyResource()
				attr.realCapability = realCapability
			} else {
				realCapability.MinDimensionResource(attr.capability, api.Infinity)
				attr.realCapability = realCapability
			}
Copilot AI Feb 20, 2026

The code for initializing queue attributes here (lines 491-520) is nearly identical to the code for job-based queues (lines 418-448). Consider extracting this logic into a helper function to reduce code duplication and improve maintainability. The function could accept a queueInfo parameter and return a queueAttr, handling the capability, guarantee, and realCapability initialization in one place.

…y plugin

When hierarchy is disabled, jobless queues (like root) were not added
to queueOpts, causing their metrics to report zero allocated instead
of using Queue.Status.Allocated.

This fix:
- Adds jobless queues to queueOpts with allocated from Status.Allocated
- Removes dead code in metrics recording (else branch now unreachable)
- Refactors queueAttr initialization into helper method to avoid duplication
- Adds test to verify jobless queue allocated consistency

Related: volcano-sh#5048 (Bug 1)
Signed-off-by: Aliexe-code <terminalexe1@gmail.com>
@Aliexe-code Aliexe-code force-pushed the fix/capacity-plugin-metric-consistency branch from 6d220dd to d1f90fd Compare February 20, 2026 15:28
Labels

kind/bug: Categorizes issue or PR as related to a bug.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

2 participants