Add overwriteRootCapability argument to capacity plugin #4662
Conversation
Welcome @mrendi29!
Summary of Changes: Hello @mrendi29, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the capacity scheduling plugin by providing a mechanism to override the root queue's resource limits. This allows administrators to define virtual resource capacities that can differ from the physical cluster resources, which is particularly useful in environments utilizing cluster autoscaling where virtual resources might exceed real resources.
Code Review
This pull request introduces a new overwriteRootCapability argument to the capacity plugin, allowing administrators to set virtual resource limits for the root queue. The implementation is sound, but there's an opportunity to improve code robustness by cloning resource objects to prevent potential pointer aliasing issues. Additionally, the PR is missing unit tests for the new functionality. It's important to add tests to verify the behavior of the overwriteRootCapability flag and prevent future regressions.
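To illustrate the aliasing concern, here is a minimal, runnable Go sketch (the `Resource` type below is a stand-in, not the plugin's actual type): if the root queue's limits point at the same object as the cluster total instead of a clone, later mutations of the total silently change the limits too.

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct{ MilliCPU float64 }

// Clone returns an independent copy of the resource.
func (r *Resource) Clone() *Resource { c := *r; return &c }

func main() {
	total := &Resource{MilliCPU: 4000}

	aliased := total        // both names point at the same struct
	cloned := total.Clone() // independent copy

	total.MilliCPU += 2000 // e.g. the cluster grows by a node

	fmt.Println(aliased.MilliCPU) // 6000: silently changed along with the total
	fmt.Println(cloned.MilliCPU)  // 4000: unaffected
}
```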
This is not true. You can set it to any value. We only set it to the cluster capacity when it's empty. I am far from happy with the current queue design, and all these hacks around the root queue, but I feel that this is still a step away from the right path.
Oh sorry, I understand it now. We want to overwrite the realCapability with the root queue's capability; maybe renaming it to overwriteRootQueueRealCapability would better reflect the intention and logic behind the usage of the argument.
@hajnalmt thanks for the review. I'll update the variable to better reflect the intention.
/cc
@mrendi29 Please sign your commit.
@mrendi29 Have you already tested it in production clusters and does it work properly? |
Not yet, unfortunately; I had to fight some fires at work. Planning to do so this week. I will also address the comments on the PR and include some tests.
It looks like this would solve other issues too.
I'll push my remaining changes in a bit and give you access to the repo so we can pair on this together. Thanks for your help!
@hajnalmt I've added you on the repo; you should have access now. I'll work on adding a test case as well. If you can get to it before I do, feel free to make the change.
@JesseStutler @hajnalmt can you take a second look at this PR? I believe I have addressed all comments. Let me know what you think. FYI, the test case is AI-generated after a couple of prompts :)
```go
	}
}

func TestOverwriteRootQueueRealCapability(t *testing.T) {
```
I think just adding a simple case to Test_capacityPlugin_OnSessionOpenWithHierarchy is enough; there's no need to add a new test func with a lot of new cases :)
Fixed.
```go
			rootQueueAttr.capability, cp.totalResource)
	} else {
		// Default behavior: use actual cluster resources
		rootQueueAttr.realCapability = cp.totalResource.Clone()
```
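The excerpt above shows only the fallback branch. Here is a hedged, self-contained sketch of the overall decision described by this PR; the type, names, and exact condition are simplified stand-ins, not the plugin source:

```go
package sketch

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct{ MilliCPU, Memory float64 }

func (r *Resource) Clone() *Resource { c := *r; return &c }
func (r *Resource) IsEmpty() bool    { return r == nil || (r.MilliCPU == 0 && r.Memory == 0) }

// rootRealCapability mirrors the branch above: when the new flag is set and the
// admin configured a capability on the root queue, use that value (it may exceed
// the physical cluster total, e.g. with cluster autoscaling); otherwise fall
// back to the actual cluster resources.
func rootRealCapability(overwrite bool, configured, clusterTotal *Resource) *Resource {
	if overwrite && !configured.IsEmpty() {
		return configured.Clone()
	}
	// Default behavior: use actual cluster resources.
	return clusterTotal.Clone()
}
```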
Wouldn't it be more appropriate to set the default value to infinity here?
The root queue is automatically created by Volcano, and users should not be aware of it. Therefore, when only the capacity plugin is enabled, users can submit an unlimited number of jobs, and the root queue does not perform any quota checks or restrictions. If users want to use the queue quota capability, they need to actively create queues for quota management.
The same logic applies to the default values for the Deserved and Guarantee of the root queue.
Our team is running into the same issue described here: #4680, which also relates to root capacity checks. I agree that it would make sense for the root capacity to be set to infinity by default, rather than requiring users to specify a value.
I do have a question, though. Are there any concerns with allowing infinite capacity on the root, especially given that the original design of Volcano seemed to care about overall cluster capacity? For example, there’s a capacity plugin that rejects jobs when the cluster is full and an overcommit plugin to allow extra jobs. It feels like the earlier designs didn’t favor an infinite root capacity. Would this change have any unintended impact on those mechanisms?
@vzhou-p Looks like different users have different requirements. Setting it to an infinite value is a good choice, but currently child queues' capability will inherit from the parent queue if not set:
(screenshot of the relevant code omitted)
So that means child queues can also submit infinite jobs, and I'm not sure whether that is what users want, because different users may have different requirements.
I understand that the default value for the root queue may have different expectations depending on different scenarios:
1. Default to infinite: After enabling the capacity plugin, users do not need to create additional queues, and job scheduling behaves the same as when the capacity plugin is not enabled.
2. Default to the actual resource capacity of the cluster: After enabling the capacity plugin, the queue's quota management capability is automatically activated. Jobs exceeding the quota limit will not be scheduled.
3. Users can specify a certain percentage increase based on the actual resource capacity of the cluster, for example, 120% of the actual resources. This allows for some overcommitment of quotas based on user configuration.
4. Users can specify the actual resource capacity and actively manage it along with the queue quotas.
Can these four methods be exposed through arguments in the capability configuration? If no specific setting is provided, the infinite mode should be used by default. In scenarios where users do not create additional queues, the behavior should remain consistent with that when the capacity plugin is not enabled.
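Purely as a sketch of how these four behaviours might be surfaced as plugin arguments (the mode and factor names below are invented for illustration; only overwriteRootQueueRealCapability comes from this PR):

```yaml
# Hypothetical arguments; only overwriteRootQueueRealCapability exists in this PR.
- name: capacity
  arguments:
    rootQueueCapabilityMode: infinite         # 1. default: no quota checks on the root queue
    # rootQueueCapabilityMode: cluster        # 2. quota equals the actual cluster capacity
    # rootQueueOvercommitFactor: 1.2          # 3. allow e.g. 120% of the actual resources
    # overwriteRootQueueRealCapability: true  # 4. admin manages the root queue capability explicitly
```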
Setting the root queue as infinity sounds like a really interesting option. That means that I as a platform admin would not need to interact with the root queue directly and can just create another queue on top of root.
(I recall that there was another PR somewhere that wanted to interact with the root queue capacity through the Helm chart; this could eliminate that.)
At the same time, having child queues submit infinite jobs seems really risky.
I wonder, though, if it makes sense for this functionality to be part of overwriteRootQueueRealCapability in L610 instead of default Volcano behavior.
Depending on what the PMCs think I can then make the necessary adjustments.
> At the same time, having child queues submit infinite jobs seems really risky.
I think the design of the Volcano queue is to let Volcano users act as admins, allowing them to specify the queue's capability value, not just submit jobs to the default queue. Therefore, as a Volcano user who may use Volcano to build a platform, I might write a custom webhook requiring users who want to create a queue to specify a capability value. So I think setting the root and default queues to infinite is aligned with the behavior when the capacity plugin is not enabled, but whether we should still allow child queues to inherit from their ancestors needs further discussion.
Thanks all for the thoughtful discussion.
I recently raised a feature request to support maxApplications on queues. I think there’s an opportunity here to address the concerns around child queues inheriting infinite capacity by considering resource quota and maxApplications together.
By supporting both resource quota (for overall resource usage limits) and maxApplications (for controlling the number of jobs), we can give users more flexibility and additional control in two dimensions. This could help mitigate the risk of child queues having unlimited capacity, while still keeping the root queue default as “infinite” for ease of use. Users could opt-in to more restrictive behaviors as needed.
Perhaps we can look at combining this work or aligning the design to solve both requirements at once. Would love to hear thoughts on whether this approach could address the current concerns. If there’s consensus, I’m happy to help move this forward.
@vzhou-p So is maxApplications part of the queue's capability, or is it a separate field? Will it be a required value?
I think maxApplications should be a separate field from the queue's capability (resource quota), rather than being part of the capability structure itself. This keeps the two concepts distinct.
- Capability (resource quota): Controls how much resources (CPU, memory, etc.) can be consumed
- maxApplications: Controls how many jobs/applications can run or be queued
I believe it should be optional, with a sensible default (likely infinite or unlimited). This aligns with the earlier discussion about keeping the root queue infinite by default. Users opt into restrictions as needed rather than being forced to configure everything upfront.
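To make the separation concrete, here is a hypothetical Go sketch of a queue spec that keeps the two fields apart; the field names and types are illustrative, not Volcano's actual Queue API:

```go
package sketch

// QueueSpec is a hypothetical sketch, not Volcano's real Queue spec.
type QueueSpec struct {
	// Capability is the resource quota: how much CPU, memory, etc. the queue may consume.
	// Empty means unlimited, matching the "infinite by default" idea discussed above.
	Capability map[string]string `json:"capability,omitempty"`

	// MaxApplications limits how many jobs/applications may run or be queued.
	// Optional; nil means unlimited, so users opt into restrictions as needed.
	MaxApplications *int32 `json:"maxApplications,omitempty"`
}
```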
I tried testing this in my k8s cluster today. I created a new image only for the scheduler and used that in my cluster; I did not change the controller and admission images. I tried submitting a Spark job but got an error from the Spark operator logs (log output omitted). It seems that the queue I created is not reporting status, whereas root does (status output omitted). Will do some more digging.
If I submit the job to the root queue directly, the driver is pending as well. (Scheduler conf omitted.)
Could you provide more logs? For example, whether there are logs showing that the driver pod was rejected from scheduling, or that the root queue exceeded its limit, or something similar.
It's weird; maybe we could check the controller's logs. I think there may be something wrong with the controller syncing the state of the queue (see volcano/pkg/controllers/queue/state/open.go, lines 43 to 48, at 83f550e).
Signed-off-by: Endi Caushi <[email protected]>
Introducing a new optional argument in the capacity plugin, overwriteRootQueueRealCapability. Currently, the root queue's resource limits are always set to the actual cluster resources, regardless of any user-configured capability values. This prevents administrators from setting virtual resource limits that differ from the physical cluster capacity. The new option helps in cases where a user has the cluster autoscaler enabled, where the "virtual" resources can be higher than the "real" resources. Signed-off-by: Endi Caushi <[email protected]>
Following the practice in other plugins, the new overwriteRootQueueRealCapability argument in the capacity plugin comes from a const key. This makes it easier to follow the configurable arguments of the plugin. Signed-off-by: Endi Caushi <[email protected]>
Minor changes to the comments of the plugin. Added a positive/negative unit test for the argument. Added some information about the argument to the capacity plugin's hierarchy documentation. Signed-off-by: Hajnal Mate <[email protected]>
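For readers following along, the "const key" practice mentioned in the commits above could look roughly like this; the constant name is taken from the argument itself, but the parsing helper shown is illustrative rather than the plugin's actual code:

```go
package sketch

// overwriteRootQueueRealCapabilityKey is the argument key; other plugins follow
// the same const-key convention for their configurable arguments.
const overwriteRootQueueRealCapabilityKey = "overwriteRootQueueRealCapability"

// overwriteEnabled reads the flag from the plugin's raw arguments map. The real
// plugin may use the scheduler framework's argument helpers instead.
func overwriteEnabled(arguments map[string]interface{}) bool {
	v, ok := arguments[overwriteRootQueueRealCapabilityKey].(bool)
	return ok && v
}
```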
/lgtm
Hello @mrendi29,
Sorry, for some reason I thought this was already merged.
I rebased your branch and modified the logging a little bit (thank you for the access). I also replaced the AI-generated tests with a simple negative/positive test case that follows the structure of the existing test suite, and added the argument with some context to the appropriate design doc.
I am not sure about the failure you got, but it shouldn't be due to these changes, and this argument can be useful in other use cases.
Please take a look @JesseStutler
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: hajnalmt. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing /approve in a comment.
@hajnalmt TSYM for lending a hand on this. I was OOO for the last two weeks and have been super busy at work wrapping up year-end objectives, which is why I have not been able to dedicate more time to this over the last month. Your change LGTM as well. I'll test it in production once I get a chance.
/ok-to-test
@mrendi29 Anytime! 👍 Sorry that this got out of my focus.
What type of PR is this?
Feature
What this PR does / why we need it:
Problem:
Currently, the root queue's resource limits are always set to the actual cluster resources, regardless of any user-configured capability values. This prevents administrators from setting virtual resource limits that differ from the physical cluster capacity. The new option helps in cases where a user has the cluster autoscaler enabled and the "virtual" resources can be higher than the "real" resources.
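For illustration, enabling the new argument in the scheduler configuration could look roughly like the snippet below; the tier layout and the other plugins shown are placeholders, and only the capacity plugin's arguments block is the point here:

```yaml
# Sketch of a volcano-scheduler.conf; surrounding actions and plugins are illustrative.
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: capacity
    arguments:
      overwriteRootQueueRealCapability: true  # honour the root queue's configured capability
  - name: predicates
  - name: nodeorder
```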
Which issue(s) this PR fixes:
Fixes #3910
Special notes for your reviewer: I have not yet tested this in our production clusters; I plan to test it in the coming week.
Does this PR introduce a user-facing change?
None