Commit f93e618

fix: add kube-prometheus-stack as gpu-operator dependency (#170)
Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
Co-authored-by: Mark Chmarny <mchmarny@users.noreply.github.com>
1 parent 490aa0f commit f93e618

5 files changed: +7 -4 lines changed

examples/recipes/eks-gb200-ubuntu-training-with-validation.yaml

Lines changed: 1 addition & 0 deletions
@@ -152,6 +152,7 @@ componentRefs:
 
 deploymentOrder:
 - cert-manager
+- kube-prometheus-stack
 - gpu-operator
 - nvidia-dra-driver-gpu
 - nvsentinel

examples/recipes/eks-training.yaml

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ componentRefs:
 valuesFile: components/skyhook-operator/values.yaml
 deploymentOrder:
 - cert-manager
+- kube-prometheus-stack
 - gpu-operator
 - nvsentinel
 - skyhook-operator

recipes/overlays/base.yaml

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ spec:
 namespace: gpu-operator
 dependencyRefs:
 - cert-manager
+- kube-prometheus-stack
 
 - name: nvsentinel
 type: Helm

tests/chainsaw/ai-conformance/offline/assert-recipe.yaml

Lines changed: 2 additions & 2 deletions
@@ -57,13 +57,13 @@ deploymentOrder:
 - aws-efa
 - cert-manager
 - dynamo-crds
-- gpu-operator
-- kai-scheduler
 - kgateway-crds
 - kgateway
 - kube-prometheus-stack
 - dynamo-platform
+- gpu-operator
 - k8s-ephemeral-storage-metrics
+- kai-scheduler
 - nvidia-dra-driver-gpu
 - nvsentinel
 - prometheus-adapter

tests/chainsaw/cli/cuj1-training/assert-recipe.yaml

Lines changed: 2 additions & 2 deletions
@@ -52,10 +52,10 @@ deploymentOrder:
 - aws-ebs-csi-driver
 - aws-efa
 - cert-manager
-- gpu-operator
-- kai-scheduler
 - kube-prometheus-stack
+- gpu-operator
 - k8s-ephemeral-storage-metrics
+- kai-scheduler
 - kubeflow-trainer
 - nvidia-dra-driver-gpu
 - nvsentinel
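
Note on the reordered assertions: the expected deploymentOrder lists change because gpu-operator now lists kube-prometheus-stack in its dependencyRefs, so it can no longer be placed ahead of it. The sketch below illustrates that effect with a plain topological sort (Kahn's algorithm) that breaks ties alphabetically. It is an assumption about how such an order could be derived, not the project's actual resolver; the function name deployment_order and the three-component graph are hypothetical and cover only the edges visible in recipes/overlays/base.yaml.

```python
import heapq


def deployment_order(deps):
    """Order components so each one follows its dependencies.

    Hypothetical helper: Kahn's algorithm with an alphabetical tie-break.
    Every name used in a dependency list must also be a key of `deps`.
    """
    # Count unmet dependencies per component and record reverse edges.
    pending = {name: len(refs) for name, refs in deps.items()}
    dependents = {name: [] for name in deps}
    for name, refs in deps.items():
        for ref in refs:
            dependents[ref].append(name)

    # Components with no unmet dependencies are ready; a heap keeps the
    # choice among ready components deterministic (alphabetical).
    ready = [name for name, count in pending.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        current = heapq.heappop(ready)
        order.append(current)
        for child in dependents[current]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


# Before this commit: gpu-operator depends only on cert-manager.
before = {
    "cert-manager": [],
    "kube-prometheus-stack": [],
    "gpu-operator": ["cert-manager"],
}
# After this commit: kube-prometheus-stack is added to dependencyRefs.
after = {**before, "gpu-operator": ["cert-manager", "kube-prometheus-stack"]}

print(deployment_order(before))
print(deployment_order(after))
```

With only cert-manager as a dependency, gpu-operator sorts ahead of kube-prometheus-stack; once the new edge is added it moves behind it, which mirrors the shift seen in the two assert-recipe.yaml files.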
