Commit 2d4f9a8

more accurate description of scheduling customizations (#140)
1 parent f240b2e commit 2d4f9a8

12 files changed, +77 -57 lines changed

setup.RHOAI-v2.13/CLUSTER-SETUP.md

+12-9
````diff
@@ -1,6 +1,6 @@
 # Cluster Setup

-The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
+The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
 cluster roles, and priority classes.

 ## Priorities
@@ -10,23 +10,26 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 oc apply -f setup.RHOAI-v2.13/mlbatch-priorities.yaml
 ```

-## Scheduler Plugins
+## Scheduler Configuration

-MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
-multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+MLBatch configures Kubernetes scheduling to accomplish two objectives:
++ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
++ Packing Pods whose GPU request is less than the number of GPUs on a Node to
+maximize the number of Nodes available for Pods that request all the GPUs on a Node.
+
+This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
+the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.

-### Coscheduler

-Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
 helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
 scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
 --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
 ```
-Patch Coscheduler pod priorities:
+Patch scheduler-plugins pod priorities:
 ```sh
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/coscheduler-priority-patch.yaml scheduler-plugins-controller
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/scheduler-priority-patch.yaml scheduler-plugins-controller
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/scheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```


````
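The `--set-json pluginConfig=...` value in the helm command above is hard to review on a single line. A minimal sketch follows, assuming a POSIX shell with `tr`. It keeps the same configuration pretty-printed in a heredoc and compacts it before it reaches Helm. `PLUGIN_CONFIG` is a hypothetical variable name. The first entry tells NodeResourcesFit to score Nodes by their requested-to-capacity ratio in the `nvidia.com/gpu` dimension only. The second entry sets how long Coscheduling lets a Pod wait at the Permit stage for the rest of its gang (300 seconds).

```sh
# Hypothetical variable: the same pluginConfig as the one-line --set-json
# value, laid out for readability. tr deletes every space and newline, which
# is safe here because no JSON string value contains a space.
PLUGIN_CONFIG=$(tr -d ' \n' <<'EOF'
[
  {
    "args": {
      "scoringStrategy": {
        "resources": [{"name": "nvidia.com/gpu", "weight": 1}],
        "requestedToCapacityRatio": {
          "shape": [
            {"utilization": 0, "score": 0},
            {"utilization": 100, "score": 10}
          ]
        },
        "type": "RequestedToCapacityRatio"
      }
    },
    "name": "NodeResourcesFit"
  },
  {
    "args": {"permitWaitingTimeSeconds": 300},
    "name": "Coscheduling"
  }
]
EOF
)

# Print the compacted JSON that would be passed to helm.
echo "$PLUGIN_CONFIG"
```

With this variable set, `--set-json "pluginConfig=${PLUGIN_CONFIG}"` should be equivalent to the quoted one-liner, because the shell passes the same `pluginConfig=[...]` argument to Helm either way.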

setup.RHOAI-v2.16/CLUSTER-SETUP.md

+12-9
````diff
@@ -1,6 +1,6 @@
 # Cluster Setup

-The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
+The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
 cluster roles, and priority classes.

 ## Priorities
@@ -10,23 +10,26 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 oc apply -f setup.RHOAI-v2.16/mlbatch-priorities.yaml
 ```

-## Scheduler Plugins
+## Scheduler Configuration

-MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
-multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+MLBatch configures Kubernetes scheduling to accomplish two objectives:
++ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
++ Packing Pods whose GPU request is less than the number of GPUs on a Node to
+maximize the number of Nodes available for Pods that request all the GPUs on a Node.
+
+This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
+the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.

-### Coscheduler

-Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
 helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
 scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
 --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
 ```
-Patch Coscheduler pod priorities:
+Patch scheduler-plugins pod priorities:
 ```sh
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/coscheduler-priority-patch.yaml scheduler-plugins-controller
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/scheduler-priority-patch.yaml scheduler-plugins-controller
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/scheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```


````
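The `requestedToCapacityRatio` shape in the helm command above maps utilization 0 to score 0 and utilization 100 to score 10. This is what makes NodeResourcesFit prefer fuller Nodes in the GPU dimension. The sketch below is a simplified model, not the scheduler's code. It assumes straight-line interpolation between those two shape points for the single `nvidia.com/gpu` resource. It ignores the plugin's rescaling to the 0-100 node-score range and any other score plugins. `gpu_score` is a hypothetical helper.

```sh
# gpu_score ALLOCATED REQUEST CAPACITY
# Hypothetical helper: compute the GPU utilization (percent) a Node would reach
# if a Pod requesting REQUEST GPUs were placed on it, then map that through
# the linear shape (0% -> 0, 100% -> 10). Integer arithmetic truncates.
gpu_score() {
  allocated=$1 request=$2 capacity=$3
  utilization=$(( (allocated + request) * 100 / capacity ))
  echo $(( utilization * 10 / 100 ))
}

# Score a 1-GPU Pod against three 8-GPU Nodes with different GPU usage.
for allocated in 0 4 6; do
  echo "allocated=$allocated score=$(gpu_score "$allocated" 1 8)"
done
```

Consider a 1-GPU Pod and 8-GPU Nodes. A Node with 6 GPUs already allocated scores 8, while an empty Node scores 1. Small requests therefore fill partially used Nodes first, which leaves whole empty Nodes available for Pods that request all 8 GPUs.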

setup.RHOAI-v2.17/CLUSTER-SETUP.md

+12-9
````diff
@@ -1,6 +1,6 @@
 # Cluster Setup

-The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
+The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
 cluster roles, and priority classes.

 ## Priorities
@@ -10,23 +10,26 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 oc apply -f setup.RHOAI-v2.17/mlbatch-priorities.yaml
 ```

-## Scheduler Plugins
+## Scheduler Configuration

-MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
-multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+MLBatch configures Kubernetes scheduling to accomplish two objectives:
++ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
++ Packing Pods whose GPU request is less than the number of GPUs on a Node to
+maximize the number of Nodes available for Pods that request all the GPUs on a Node.
+
+This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
+the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.

-### Coscheduler

-Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
 helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
 scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
 --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
 ```
-Patch Coscheduler pod priorities:
+Patch scheduler-plugins pod priorities:
 ```sh
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-controller
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/scheduler-priority-patch.yaml scheduler-plugins-controller
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/scheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```


````
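Gang (all or nothing) scheduling with the Coscheduling plugin is driven by PodGroup objects. For real workloads, the `coscheduling` overlays of the Training Operator and AppWrapper presumably set this up. As a hedged, hand-written illustration, the sh function below prints a PodGroup plus the Pods that belong to it. It assumes the scheduler-plugins v1alpha1 API (`scheduling.x-k8s.io/v1alpha1` with the `scheduling.x-k8s.io/pod-group` label). It also assumes the chart's default secondary scheduler name, `scheduler-plugins-scheduler`. The function name, object names, and pause image are hypothetical choices made for this example.

```sh
# Hypothetical helper: print a PodGroup and MIN_MEMBER Pods that the
# Coscheduling plugin should bind either all together or not at all.
# Usage: gang_manifest NAME MIN_MEMBER | kubectl apply -f -
gang_manifest() {
  name=$1 min_member=$2
  # The PodGroup declares how many member Pods must be schedulable at once.
  cat <<EOF
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: ${name}
spec:
  minMember: ${min_member}
EOF
  # Each Pod joins the group via the pod-group label and is routed to the
  # secondary scheduler installed by the helm chart.
  i=0
  while [ "$i" -lt "$min_member" ]; do
    cat <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: ${name}-${i}
  labels:
    scheduling.x-k8s.io/pod-group: ${name}
spec:
  schedulerName: scheduler-plugins-scheduler
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
    i=$((i + 1))
  done
}

gang_manifest demo 2
```

To create the objects, pipe the output to `kubectl apply -f -`, or to `oc apply -f -` on OpenShift. With `minMember: 2`, neither Pod should be bound until the scheduler can place both of them.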

setup.k8s/CLUSTER-SETUP.md

+18-14
````diff
@@ -16,24 +16,28 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 kubectl apply -f setup.k8s/mlbatch-priorities.yaml
 ```

-## Scheduler Plugins
+## Scheduler Configuration
+
+MLBatch configures Kubernetes scheduling to accomplish two objectives:
++ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
++ Packing Pods whose GPU request is less than the number of GPUs on a Node to
+maximize the number of Nodes available for Pods that request all the GPUs on a Node.
+
+The currently recommend way to do this is by installing the Coscheduling out-of-tree scheduler
+plugin and configuring the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
+Alternatively, you can skip the helm install and patch commands shown below and instead install
+the experimental Sakkara scheduler plugin (described next).

-MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
-multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
-Two options are described below: Coscheduler and Sakkara. You should pick and install one of them
-as a secondary scheduler for your cluster.
-### Coscheduler

-Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
 helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
 scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
 --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
 ```
-Patch Coscheduler pod priorities:
+Patch scheduler-plugins pod priorities:
 ```sh
-kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-controller
-kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
+kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/scheduler-priority-patch.yaml scheduler-plugins-controller
+kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/scheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```

 ### Sakkara
@@ -56,9 +60,9 @@ kubectl create namespace mlbatch-system

 Install the Kubeflow Training Operator

-If you are using Coscheduler do:
+If you are using Coscheduling do:
 ```sh
-kubectl apply --server-side -k setup.k8s/training-operator/coscheduler
+kubectl apply --server-side -k setup.k8s/training-operator/coscheduling
 ```
 If you are using Sakkara do:
 ```sh
@@ -76,9 +80,9 @@ kubectl apply --server-side -k setup.k8s/kueue
 ```

 Install the AppWrapper Operator
-If you are using Coscheduler do:
+If you are using Coscheduling do:
 ```sh
-kubectl apply --server-side -k setup.k8s/appwrapper/coscheduler
+kubectl apply --server-side -k setup.k8s/appwrapper/coscheduling
 ```
 If you are using Sakkara do:
 ```sh
````
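This diff renames `coscheduler-priority-patch.yaml` to `scheduler-priority-patch.yaml` and renames the `coscheduler` kustomize directories to `coscheduling`. The sh sketch below assumes you want to confirm that no file under a directory still points at the old paths. `find_stale_refs` is a hypothetical helper, not part of the repository. It greps recursively and fails if it finds any match.

```sh
# Hypothetical helper: print file:line for every remaining use of the
# pre-rename paths under DIR and return 1 if any were found.
find_stale_refs() {
  dir=$1
  # -r recurse, -n show line numbers, -E extended regex.
  # First alternative: the old patch file name.
  # Second alternative: a path component "/coscheduler" at the end of a path
  # or before a non-letter, so ".../coscheduling" does not match.
  if grep -rnE 'coscheduler-priority-patch\.yaml|/coscheduler([^a-z-]|$)' "$dir"; then
    echo "stale references found in $dir" >&2
    return 1
  fi
  return 0
}
```

Run it from the repository root, for example with `find_stale_refs setup.k8s`. The pattern only matches path-like uses, meaning `coscheduler` preceded by a `/`. Headings and prose that mention "Coscheduler" are therefore not flagged.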

setup.tmpl/CLUSTER-SETUP.md.tmpl

+23-16
````diff
@@ -1,7 +1,7 @@
 # Cluster Setup

 {{ if .RHOAI -}}
-The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
+The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
 cluster roles, and priority classes.

 {{- else -}}
@@ -23,26 +23,33 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 {{ .KUBECTL }} apply -f setup.{{ .VERSION }}/mlbatch-priorities.yaml
 ```

-## Scheduler Plugins
+## Scheduler Configuration

-MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
-multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
-{{ if not .RHOAI -}}
-Two options are described below: Coscheduler and Sakkara. You should pick and install one of them
-as a secondary scheduler for your cluster.
+MLBatch configures Kubernetes scheduling to accomplish two objectives:
++ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
++ Packing Pods whose GPU request is less than the number of GPUs on a Node to
+maximize the number of Nodes available for Pods that request all the GPUs on a Node.
+
+{{ if .RHOAI -}}
+This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
+the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
+{{- else -}}
+The currently recommend way to do this is by installing the Coscheduling out-of-tree scheduler
+plugin and configuring the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
+Alternatively, you can skip the helm install and patch commands shown below and instead install
+the experimental Sakkara scheduler plugin (described next).
 {{- end }}
-### Coscheduler

-Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
+
 ```sh
 helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
 scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
 --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
 ```
-Patch Coscheduler pod priorities:
+Patch scheduler-plugins pod priorities:
 ```sh
-{{ .KUBECTL }} patch deployment -n scheduler-plugins --type=json --patch-file setup.{{ .VERSION }}/coscheduler-priority-patch.yaml scheduler-plugins-controller
-{{ .KUBECTL }} patch deployment -n scheduler-plugins --type=json --patch-file setup.{{ .VERSION }}/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
+{{ .KUBECTL }} patch deployment -n scheduler-plugins --type=json --patch-file setup.{{ .VERSION }}/scheduler-priority-patch.yaml scheduler-plugins-controller
+{{ .KUBECTL }} patch deployment -n scheduler-plugins --type=json --patch-file setup.{{ .VERSION }}/scheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```

 {{ if not .RHOAI -}}
@@ -137,9 +144,9 @@ Create the mlbatch-system namespace

 Install the Kubeflow Training Operator

-If you are using Coscheduler do:
+If you are using Coscheduling do:
 ```sh
-{{ .KUBECTL }} apply --server-side -k setup.{{ .VERSION }}/training-operator/coscheduler
+{{ .KUBECTL }} apply --server-side -k setup.{{ .VERSION }}/training-operator/coscheduling
 ```
 If you are using Sakkara do:
 ```sh
@@ -157,9 +164,9 @@ Install Kueue
 ```

 Install the AppWrapper Operator
-If you are using Coscheduler do:
+If you are using Coscheduling do:
 ```sh
-{{ .KUBECTL }} apply --server-side -k setup.{{ .VERSION }}/appwrapper/coscheduler
+{{ .KUBECTL }} apply --server-side -k setup.{{ .VERSION }}/appwrapper/coscheduling
 ```
 If you are using Sakkara do:
 ```sh
````
