-
Notifications
You must be signed in to change notification settings - Fork 500
Fix EnsureWorkloadSlices to finish old slice when new is admitted as replacement #8456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix EnsureWorkloadSlices to finish old slice when new is admitted as replacement #8456
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sohankunkerkar The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in the workload slice cleanup logic during elastic job scale-up operations. When the scheduler admits a new workload slice and sets the WorkloadSliceReplacementFor annotation, the old slice should be finished. Previously, the code only finished the old slice if it lost quota reservation or was evicted, missing the replacement annotation scenario.
Key Changes
- Added a third condition to finish old workload slices when the new slice has the replacement annotation pointing to it
- Added comprehensive inline documentation explaining the three conditions for finishing old slices
- Added test coverage for the replacement annotation scenario
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/workloadslicing/workloadslicing.go | Enhanced EnsureWorkloadSlices case 2 logic to check for replacement annotation and finish old slices accordingly |
| pkg/workloadslicing/workloadslicing_test.go | Added new test case "TwoWorkloads_OldWithReservedQuota_NewWithReplacementAnnotation" to verify the fix |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…replacement When the scheduler admits a new workload slice but fails to finish the old slice, the job controller should clean it up. This fixes a race condition where old workload slices weren't being cleaned up during scale-up events, causing the RayJob InTreeAutoscaling e2e test to see 4 workloads instead of 3. Signed-off-by: Sohan Kunkerkar <[email protected]>
bcc7088 to
3d31046
Compare
When the scheduler admits a new workload slice but fails to finish the old slice, the job controller should clean it up. This fixes a race condition where old workload slices weren't being cleaned up during scale-up events,
causing the RayJob InTreeAutoscaling e2e test to see 4 workloads instead of 3.
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #8447
Special notes for your reviewer:
Does this PR introduce a user-facing change?