Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption admission check controller. KEP update. #1178

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 7 additions & 5 deletions keps/993-two-phase-admission/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -256,9 +256,9 @@ The controller implementing a particular check should:

### Preemption Admission Check Controller

In this proposal, the time to evict the preemption candidates varies based on the preemptor state
the scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check
that is manged by a new built-in admission check controller.
In this proposal, the time to evict the preemption candidates varies based on the preemptor state.
The scheduler will not issue the eviction during it's process instead it will set a `Pending` admission check
that is manged by a single instance of a new built-in admission check controller.

The **Preemption Admission Check Controller** will:

Expand All @@ -271,7 +271,7 @@ The preemption controller uses the kueue cache, since it needs to check the stat
At every run the controller will get the list of workloads pending preemption.

The workloads pending preemption are divided into:
- `preemtingLetaer` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`.
- `preemtingLeter` - Workloads having at least one check that uses AfterCheckPassedOrOnDemand policy with the state `pending`.
- `preemtingNow` - Workloads that expect to be able to issue evictions or potentially change their preemption state in the current cycle.

Then:
Expand All @@ -286,7 +286,9 @@ Then:
- Issue the eviction to the candidates.
- Add it to the snapshot
- If the updated list is empty, meaning the preemption cannot be done.
- Set its admission check to `Ready`
- Set its admission check to `Retry`, the quota reservation will be lost and the workload placed in the queue waiting for a new QuotaReservation.

**NOTE** The list of candidates is picked out from the list of workloads holding a QuotaReservation, regardless if they are fully Admitted or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is QuotaReservation enough? Consider jobs waiting for ProvReq to be fulfilled could be waiting for hours. There is no point in preempting them yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ProvRequest Admission check can be configured with AfterCheckPassedOrOnDemand policy, the evictions will be done only when the provisioning is finished (and only if still needed).

In the future , if we can know that the provisioning request has no chance to succeed for long period of times, the ProvRecAC can set its state to Retry this will fully release the quota.After that time the state can be set to Pending to put wl back in the queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is more about the candidates for eviction. Obviously, fully Admitted workloads are candidates for eviction.
Other than that, I think it should be only the ones that have checks ready or are requesting preemption themselves.

Copy link
Contributor Author

@trasc trasc Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be a source of priority inversion. Say we have wl1 (p=1, reserving, one check pending), wl2 (p=2, fully admited), wl3(p=3, needs to evict wl1 or wl2), We should not evict wl2 just because wl1 has some pending admission check.


### Test Plan

Expand Down