capacity: compute queue overused metric from final share instead of preemptiveFn by aadhil2k4 · Pull Request #5051 · volcano-sh/volcano

aadhil2k4 · 2026-02-21T06:50:50Z

What type of PR is this?

bug fix

What this PR does / why we need it:

The queue overused metric (UpdateQueueOverused) was previously updated inside AddPreemptiveFn, which runs per-task-per-queue during preemption. This caused the metric to be overwritten multiple times within a single scheduling cycle and reflect task-level preemption eligibility rather than actual queue overuse.

Additionally, the previous logic derived the metric from !isPreemptive, which semantically represented "cannot reclaim" instead of whether the queue exceeded its deserved resources.
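To illustrate the overwrite problem described above, here is a minimal, self-contained sketch. It does not use the real Volcano types: `gauge`, `perTaskUpdate`, and `perSessionUpdate` are hypothetical stand-ins, and the gauge simply models a metric where the last write wins.

```go
package main

import "fmt"

// gauge is a hypothetical stand-in for a per-queue metric: the last write wins.
type gauge map[string]bool

func (g gauge) set(queue string, v bool) { g[queue] = v }

// perTaskUpdate mimics the old behaviour: every preemption check for a task
// rewrites the queue's gauge with the negation of that task's result.
// It returns the number of writes performed.
func perTaskUpdate(g gauge, queue string, taskCanReclaim []bool) int {
	writes := 0
	for _, ok := range taskCanReclaim {
		g.set(queue, !ok)
		writes++
	}
	return writes
}

// perSessionUpdate mimics the new behaviour: a single write per queue,
// derived from the queue's final share.
func perSessionUpdate(g gauge, queue string, share float64) int {
	g.set(queue, share > 1)
	return 1
}

func main() {
	g := gauge{}

	// Four tasks are evaluated; the last one cannot reclaim, so the gauge
	// ends up true regardless of how much the queue actually uses.
	n := perTaskUpdate(g, "q1", []bool{true, false, true, false})
	fmt.Println(n, g["q1"]) // prints "4 true"

	// The queue is at half of its deserved resources, so it is not overused.
	n = perSessionUpdate(g, "q1", 0.5)
	fmt.Println(n, g["q1"]) // prints "1 false"
}
```

In the per-task variant the reported value depends on whichever task happened to be evaluated last, which is the behaviour this PR removes.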

Which issue(s) this PR fixes:

Fixes issue: 5048 - bug2

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Moves UpdateQueueOverused from AddPreemptiveFn to OnSessionClose. Computes queue overused based on the final queue share instead of the task-level preemption result. No scheduling behavior changed.

Signed-off-by: Aadhil Ahamed <aadhil2k4@gmail.com>

gemini-code-assist · 2026-02-21T06:51:03Z

Summary of Changes

Hello @aadhil2k4, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the capacity scheduler plugin where the queue overuse metric was being incorrectly calculated and updated. The metric's update mechanism has been refactored to ensure it accurately reflects a queue's resource consumption relative to its deserved share, providing a more reliable indicator of system state.

Highlights

- Metric Update Relocation: The UpdateQueueOverused metric call has been moved from the AddPreemptiveFn function to the OnSessionClose function. This ensures the metric is updated once per queue at the end of a scheduling session, rather than multiple times per task during preemption.
- Overused Metric Logic Change: The logic for determining if a queue is 'overused' has been updated. Previously, it was derived from !isPreemptive, which indicated whether a task could reclaim resources. Now, it is calculated as attr.share > 1, directly reflecting whether the queue's final share exceeds its deserved resources.
- Bug Fix: This change addresses a bug where the queue overused metric was being overwritten multiple times within a single scheduling cycle, leading to inaccurate reporting of queue overuse.

Changelog

pkg/scheduler/plugins/capacity/capacity.go
- Removed the overused calculation and metrics.UpdateQueueOverused call from the AddPreemptiveFn function.
- Added a loop within the OnSessionClose function to iterate through all queues and update their overused metric.
- Modified the overused metric calculation to use attr.share > 1 for a more accurate representation of queue overuse.
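As a rough illustration of what `attr.share > 1` means, the sketch below assumes that a queue's share is the largest allocated/deserved ratio across its deserved resource dimensions. That definition, the `resources` type, and the `share` and `overused` helpers are assumptions made for this example rather than the plugin's actual code.

```go
package main

import "fmt"

// resources maps a resource name (for example "cpu") to a quantity.
// This is a hypothetical simplification of the scheduler's resource type.
type resources map[string]float64

// share returns the largest allocated/deserved ratio over the deserved
// dimensions. Assumption: when nothing is deserved but something is
// allocated, the ratio for that dimension is treated as 1.
func share(allocated, deserved resources) float64 {
	largest := 0.0
	for name, d := range deserved {
		a := allocated[name]
		var r float64
		switch {
		case d > 0:
			r = a / d
		case a > 0:
			r = 1
		}
		if r > largest {
			largest = r
		}
	}
	return largest
}

// overused applies the predicate from this PR: strictly above the deserved share.
func overused(s float64) bool { return s > 1 }

func main() {
	deserved := resources{"cpu": 4, "memory": 8}

	over := share(resources{"cpu": 6, "memory": 4}, deserved)
	fmt.Printf("%.2f %v\n", over, overused(over)) // prints "1.50 true"

	exact := share(resources{"cpu": 4, "memory": 8}, deserved)
	fmt.Printf("%.2f %v\n", exact, overused(exact)) // prints "1.00 false"
}
```

Because the comparison is strict, a queue that uses exactly its deserved resources is not reported as overused.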

Activity

The pull request was created to address issue #5048 ("Multiple good-first-issue bugs in capacity plugin, backfill action, cache session lifecycle, and e2e test context dumping"), specifically bug2, related to incorrect queue overused metric calculation.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

aadhil2k4 · 2026-02-21T06:53:03Z

cc: @hajnalmt Please review this PR and let me know whether the approach is correct.

Copilot

Pull request overview

This PR fixes how the queue overused metric is computed in the capacity plugin. Previously, the metric was updated inside AddPreemptiveFn, which is called per-task-per-queue during preemption evaluation, causing the metric to be overwritten multiple times within a single scheduling cycle. The metric also used the wrong semantic (!isPreemptive representing "cannot reclaim" rather than "overused").

The fix moves the metric update to OnSessionClose, where it's computed once per queue at the end of each scheduling session, and changes the logic to use attr.share > 1, which correctly represents whether a queue is using more than its deserved resources.

Changes:

- Removed metric update from AddPreemptiveFn to prevent multiple overwrites per scheduling cycle
- Added metric update in OnSessionClose to compute final overused state based on attr.share > 1
- Changed semantic from task-level "cannot reclaim" to queue-level "exceeds deserved resources"

Copilot · 2026-02-21T06:54:22Z

pkg/scheduler/plugins/capacity/capacity.go

+	for _, attr := range cp.queueOpts {
+		overused := attr.share > 1
+		metrics.UpdateQueueOverused(attr.name, overused)
+	}


When hierarchical queues are enabled, ancestor queues have their allocated resources updated in AllocateFunc/DeallocateFunc (lines 368 and 385) but their share is never updated via updateShare(). This means ancestor queues will have outdated share values when the overused metric is computed here. The metric will not accurately reflect whether ancestor queues are overused.

To fix this, after updating ancestor allocated resources in the AllocateFunc and DeallocateFunc event handlers, you should also call cp.updateShare(ancestorAttr) for each ancestor queue.
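A rough sketch of what this suggestion might look like, under simplifying assumptions: a single resource dimension, a `parent` field linking each queue to its ancestor, and hypothetical `queueAttr`, `updateShare`, and `allocate` definitions that are not the plugin's real ones.

```go
package main

import "fmt"

// queueAttr is a hypothetical, single-resource simplification of a queue entry.
type queueAttr struct {
	name      string
	parent    string // empty for the root queue
	allocated float64
	deserved  float64
	share     float64
}

// updateShare recomputes share from the current allocated and deserved values.
// Assumption: with nothing deserved, any allocation counts as a share of 1.
func updateShare(q *queueAttr) {
	switch {
	case q.deserved > 0:
		q.share = q.allocated / q.deserved
	case q.allocated > 0:
		q.share = 1
	default:
		q.share = 0
	}
}

// allocate adds amount to the leaf queue and to every ancestor, refreshing
// the share at each level so that no ancestor keeps a stale value.
func allocate(queues map[string]*queueAttr, leaf string, amount float64) {
	for name := leaf; name != ""; {
		q, ok := queues[name]
		if !ok {
			return
		}
		q.allocated += amount
		updateShare(q)
		name = q.parent
	}
}

func main() {
	queues := map[string]*queueAttr{
		"root": {name: "root", deserved: 10},
		"team": {name: "team", parent: "root", deserved: 4},
		"leaf": {name: "leaf", parent: "team", deserved: 2},
	}
	allocate(queues, "leaf", 3)
	for _, n := range []string{"leaf", "team", "root"} {
		q := queues[n]
		fmt.Printf("%s share=%.2f overused=%v\n", n, q.share, q.share > 1)
	}
}
```

In this example only the leaf queue ends up above a share of 1; the point of the sketch is that the ancestor shares are refreshed in the same pass instead of keeping their previous values.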

This is valid, but I think it's a separate issue, and it's not a very important bug, since during scheduling we are only really interested in the leaf queue shares, not the ancestor ones.

gemini-code-assist

Code Review

This pull request correctly fixes a bug in how the queue_overused metric is calculated for the capacity plugin. Previously, the metric was updated inside AddPreemptiveFn on a per-task basis, leading to incorrect and overwritten values. The logic also incorrectly inferred overuse from preemption eligibility. By moving the metric update to OnSessionClose and calculating overuse based on the final queue share (attr.share > 1), the change ensures the metric accurately reflects whether a queue has exceeded its deserved resources at the end of a scheduling cycle. The implementation is clean and addresses the issue effectively.

hajnalmt

/lgtm
/priority low
/area scheduling

Thanks!

hajnalmt · 2026-02-21T08:13:39Z

pkg/scheduler/plugins/capacity/capacity.go

+	for _, attr := range cp.queueOpts {
+		overused := attr.share > 1
+		metrics.UpdateQueueOverused(attr.name, overused)
+	}


This is valid, but I think it's a separate issue, and it's not a very important bug, since during scheduling we are only really interested in the leaf queue shares, not the ancestor ones.

volcano-sh-bot · 2026-02-21T08:20:23Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hajnalmt
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

pkg/scheduler/plugins/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI review requested due to automatic review settings February 21, 2026 06:50

volcano-sh-bot requested review from archlitchi and hajnalmt February 21, 2026 06:50

volcano-sh-bot added the size/XS label (Denotes a PR that changes 0-9 lines, ignoring generated files.) Feb 21, 2026

Copilot started reviewing on behalf of aadhil2k4 February 21, 2026 06:51

Copilot AI reviewed Feb 21, 2026

gemini-code-assist bot reviewed Feb 21, 2026

hajnalmt approved these changes Feb 21, 2026

volcano-sh-bot assigned hajnalmt Feb 21, 2026

volcano-sh-bot added the priority/low, area/scheduling, and lgtm (Indicates that a PR is ready to be merged.) labels Feb 21, 2026

aadhil2k4 wants to merge 1 commit into volcano-sh:master from aadhil2k4:capacityPlugin

3 participants
