Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption admission check controller. KEP update. #1178

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion keps/993-two-phase-admission/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Preemption Admission Check Controller](#preemption-admission-check-controller)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit Tests](#unit-tests)
Expand Down Expand Up @@ -206,7 +207,7 @@ pass AND there are some AdmissionChecks configured, AND the
workload is not in on-hold retry state from some check, it will:

1. Fill the Admission field in workload, with the desired flavor assignment.
2. Not do any preemptions yet (unless BookCapacity is set to true).
2. Not do any preemptions yet (unless the check uses `Anytime` preemption policy).
3. Set "QuotaReserved" to true.

Kueue will only pass as many pods into "QuotaReserved" as there would
Expand Down Expand Up @@ -253,6 +254,27 @@ The controller implementing a particular check should:
* After approving the workload, keep an eye on the check if it starts failing,
fail the check and cause the workload to move back to the suspended state.

### Preemption Admission Check Controller

Because now, the time when the eviction of the preemption candidate should vary based on the preemptor state
the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check
that is manged by a new built-in admission check controller.

The **Admission Check Controller** will:

- Watch for a change in state of the workloads pending preemption.
- Watch for workloads that are finishing execution and therefore releasing quota.
- Watch for AdmissionCheck changes, since the preemption policy can change.

Since evaluating the current preemption state and the needs of a workload requires checking the sate of all resource pool the events are rate limited the kueue's cache is used.

At every run the controller will get the list of workloads pending preemption.
Since for some of these workloads is not necessary to issue eviction at that given point (eg. Having a pending check that uses AfterCheckPassedOrOnDemand policy) their quota reservation will be ignored.
Fore every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Fore every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`.
For every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`.

Preemption can run before other checks have finished their transition to Ready. By the time every check has transitioned to Ready, we still need to check if we need more preemptions.
So, in a sense, there is no concept of "preemption ready", we need to continue checking until the workload is fully admitted. And once it fits, and the checks are all "ready", we can directly transition to admitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quota reserved by a workload is considered in use by the scheduler, if another workload gets is reservation after the fist one gets to "preemption ready" , that second workload will need to trigger it's own set of evictions and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(fore -> for done)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quota reserved by a workload is considered in use by the scheduler, if another workload gets is reservation after the fist one gets to "preemption ready" , that second workload will need to trigger it's own set of evictions and so on.

Yes, that I understand.

Let me state my thinking again. There are scenarios where preemption needs to run before other admission checks have completed, because other AdmissionChecks can request preemptions.
In that case, we will issue preemptions, but even when completed, we might still need preemptions later on, when all the AdmissionChecks actually complete.

That's why I say that there might not be a concept of "preemptions are ready".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preemption is finished when the workload can fit without the need of any additional evictions. Pass this point its quota is locked and can start execution, or wait more. The quote remains locked until this workload finishes or gets evicted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes me think that the existing QuotaReserved condition will have different semantics. And the idea of QuotaReserved could actually be used to signal that preemption is "ready".

But maybe that's a bit of over-engineering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... we could a yet another Quota specific condition like QuotaLocked, to only be set when all the evictions are done, then change the Admission logic from QuoataReserved && AdmissionChecks==Ready to QuoataLocked && AdmissionChecks==Ready.

The preemption controller can then manage workloads that have QuoataReserved but not QuoataLocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would QuotaReserved be equivalent to "preemption ready" then?

Copy link
Contributor Author

@trasc trasc Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other way around, will have less impact on the code, The scheduler creates the admission struct and sets the QuotaReserved, the preemption controller will set the new condition when no evictions are needed. The workload controller will set the Admitted condition when the new condition is true and all Admission checks are ready. (sure we can add a fast path in the scheduler so if at reservation time the wl fits without the need of evictions set both conditions)

If eviction of other workload is still needed, an updated list candidates is created and eviction is issued for all of them.
If the updated list of candidates is empty, meaning that the preemption can no longer succeed, the preemption admission check is set as `Retry`, the workload will louse it's quota reservation and requeued.


### Test Plan

[ x ] I/we understand the owners of the involved components may require updates to
Expand Down