Commit ed4973b

refactor: move kai-scheduler and DRA driver to base overlay for CNCF AI conformance (#139)
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
1 parent 1701080 commit ed4973b

5 files changed: +22 -29 lines changed

recipes/overlays/base.yaml

Lines changed: 17 additions & 1 deletion
@@ -69,4 +69,20 @@ spec:
       version: 1.19.2
       valuesFile: components/k8s-ephemeral-storage-metrics/values.yaml
       dependencyRefs:
-        - kube-prometheus-stack
+        - kube-prometheus-stack
+
+    - name: nvidia-dra-driver-gpu
+      type: Helm
+      source: https://helm.ngc.nvidia.com/nvidia
+      version: "25.12.0"
+      valuesFile: components/nvidia-dra-driver-gpu/values.yaml
+      dependencyRefs:
+        - gpu-operator
+
+    - name: kai-scheduler
+      type: Helm
+      source: oci://ghcr.io/nvidia/kai-scheduler
+      version: v0.12.14
+      valuesFile: components/kai-scheduler/values.yaml
+      dependencyRefs:
+        - gpu-operator

recipes/overlays/h100-eks-ubuntu-inference-dynamo.yaml

Lines changed: 0 additions & 13 deletions
@@ -75,11 +75,6 @@ spec:
 
     - name: nvidia-dra-driver-gpu
       type: Helm
-      source: https://helm.ngc.nvidia.com/nvidia
-      version: "25.12.0"
-      valuesFile: components/nvidia-dra-driver-gpu/values.yaml
-      dependencyRefs:
-        - gpu-operator
       overrides:
         gpuResourcesEnabledOverride: true
         # EKS has no control-plane nodes — remove the default nodeAffinity
@@ -89,14 +84,6 @@ spec:
           tolerations:
             - operator: Exists
 
-    - name: kai-scheduler
-      type: Helm
-      source: oci://ghcr.io/nvidia/kai-scheduler
-      version: v0.12.14
-      valuesFile: components/kai-scheduler/values.yaml
-      dependencyRefs:
-        - gpu-operator
-
     - name: dynamo-crds
       type: Helm
       source: https://helm.ngc.nvidia.com/nvidia/ai-dynamo

recipes/overlays/h100-kind-inference-dynamo.yaml

Lines changed: 0 additions & 8 deletions
@@ -34,14 +34,6 @@ spec:
       value: ">= 1.34"
 
   componentRefs:
-    - name: kai-scheduler
-      type: Helm
-      source: oci://ghcr.io/nvidia/kai-scheduler
-      version: v0.12.14
-      valuesFile: components/kai-scheduler/values.yaml
-      dependencyRefs:
-        - gpu-operator
-
     - name: dynamo-crds
       type: Helm
       source: https://helm.ngc.nvidia.com/nvidia/ai-dynamo

recipes/overlays/kind.yaml

Lines changed: 1 addition & 7 deletions
@@ -179,15 +179,9 @@ spec:
             cpu: 500m
             memory: 1Gi
 
-    # nvidia-dra-driver-gpu: DRA driver for GPU resource management
-    # For kind with GPU passthrough, use "/" as driver root (host drivers)
+    # nvidia-dra-driver-gpu: override driver root for kind GPU passthrough
     - name: nvidia-dra-driver-gpu
       type: Helm
-      source: https://helm.ngc.nvidia.com/nvidia
-      version: "25.12.0"
-      valuesFile: components/nvidia-dra-driver-gpu/values.yaml
-      dependencyRefs:
-        - gpu-operator
       overrides:
         # Use "/" for host-installed drivers (kind GPU passthrough)
         nvidiaDriverRoot: "/"
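
Reviewer note: after this change the kind overlay entry for nvidia-dra-driver-gpu keeps only its override, while the source, version, valuesFile, and dependencyRefs now live in recipes/overlays/base.yaml. A minimal sketch of the effective entry for kind follows, assuming overlay componentRefs are merged onto base entries by name. That merge rule is an assumption; this commit does not show the resolver. All field values are copied from the hunks in this commit.

```yaml
# Assumed effective componentRef once kind.yaml is layered on base.yaml.
# The merge-by-name behaviour is an assumption, not confirmed by this diff.
- name: nvidia-dra-driver-gpu
  type: Helm
  source: https://helm.ngc.nvidia.com/nvidia                 # from base.yaml
  version: "25.12.0"                                         # from base.yaml
  valuesFile: components/nvidia-dra-driver-gpu/values.yaml   # from base.yaml
  dependencyRefs:                                            # from base.yaml
    - gpu-operator
  overrides:                                                 # from kind.yaml
    # Use "/" for host-installed drivers (kind GPU passthrough)
    nvidiaDriverRoot: "/"
```

If the resolver instead replaces whole entries rather than merging fields, the kind overlay would lose its source and version, so the merge behaviour is worth confirming against the recipe tests.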

tests/chainsaw/cli/cuj1-training/assert-recipe.yaml

Lines changed: 4 additions & 0 deletions
@@ -40,8 +40,10 @@ componentRefs: ## alphabetically sorted
   - name: cert-manager
   - name: gpu-operator
   - name: k8s-ephemeral-storage-metrics
+  - name: kai-scheduler
   - name: kube-prometheus-stack
   - name: kubeflow-trainer
+  - name: nvidia-dra-driver-gpu
   - name: nvsentinel
   - name: prometheus-adapter
   - name: skyhook-customizations
@@ -51,9 +53,11 @@ deploymentOrder:
   - aws-efa
   - cert-manager
   - gpu-operator
+  - kai-scheduler
   - kube-prometheus-stack
   - k8s-ephemeral-storage-metrics
   - kubeflow-trainer
+  - nvidia-dra-driver-gpu
   - nvsentinel
   - prometheus-adapter
   - skyhook-operator
