docs(scheduler): add semi-preemptible API usage examples
Document that semi-preemptible reuses the existing preemptibility API
(a third enum/label value, no new field): show it set on the PodGroup
spec, via the kai.scheduler/preemptibility workload label, and on a
multi-level tree using minSubGroup for subgroup-level core/elastic.
Signed-off-by: SiorMeir <msior@nvidia.com>
File: `docs/developer/designs/semi-preemptible/README.md` (56 additions, 0 deletions)
@@ -10,6 +10,62 @@ We want to add a new 3rd mode, named **semi-preemptible**, where the podgroup is
A workload with `minReplicas` such as Inference and Elastic Distributed Training can request to be non-preemptible up to its `minReplicas`, with any pods above that count being preemptible. This allows running a critical workload with some assured resources and some on-demand, availability-based resources.

## Usage

Semi-preemptible reuses the **existing preemptibility API** introduced in [priority/preemptibility separation](../priority-preemptibility-separation/README.md): it is simply a third value, `semi-preemptible`, alongside `preemptible` and `non-preemptible`. There is no new field or label. The minimum that stays non-preemptible is the PodGroup's existing minimum shape (`minMember` at leaf PodSets, `minSubGroup` at intermediate nodes); nothing extra needs to be declared.

It can be set either directly on the PodGroup spec, or via the `kai.scheduler/preemptibility` label on a workload (the PodGrouper passes it through to the generated PodGroup).

**On the PodGroup spec**, for a single elastic group (3 core pods, bursts beyond):
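A minimal sketch of the spec-level form, assuming the same `apiVersion`, `kind`, and field names as the subgroup-level example in this document. The name `elastic-group` is hypothetical, and `minMember: 3` mirrors the "3 core pods" mentioned above:

```yaml
apiVersion: scheduling.kai.nvidia.com/v2alpha2  # assumed: same version as the subgroup-level example
kind: PodGroup
metadata:
  name: elastic-group  # hypothetical name
spec:
  preemptibility: "semi-preemptible"
  minMember: 3  # the first 3 pods are core (non-preemptible); pods beyond this are preemptible
  # ... rest of podgroup spec
```

Under this sketch, no extra field declares the core size: the existing `minMember` value doubles as the non-preemptible minimum.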

**On a workload (label)**, where the PodGrouper propagates it to the PodGroup:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic-inference
spec:
  template:
    metadata:
      labels:
        kai.scheduler/preemptibility: "semi-preemptible"
    spec:
      # ... pod spec
```

**Subgroup-level (multi-level tree)**, where `minSubGroup` makes whole subgroups core vs. elastic:
```yaml
apiVersion: scheduling.kai.nvidia.com/v2alpha2
kind: PodGroup
metadata:
  name: segmented-training
spec:
  preemptibility: "semi-preemptible"
  minSubGroup: 2  # 2 core subgroups; additional subgroups are elastic
  subGroups:
    - name: segment-0
      minMember: 4
    - name: segment-1
      minMember: 4
    - name: segment-2  # elastic: reclaimed as a whole before any core subgroup
      minMember: 4
  priorityClassName: "train"
  # ... rest of podgroup spec
```

When `preemptibility` is omitted, behavior is unchanged (priority-based default). Semi-preemptible is never applied implicitly; it is always opt-in.

## Quota Requirements

The "core" pods (up to `minMember` per leaf PodSet) must be in-quota when allocated. Any "extra" pods can be allocated over-quota. All pods must respect the Limit setting for the job's queue.