Open
Description
When I install the Helm chart with this values file, I get two DaemonSets that try to control the same pods, because their selector.matchLabels are identical.
I suspect that this is a bug.
I use the latest version of the Helm chart.
Values
# -----------------------------------------------------------------
# --- THIS IS THE CORRECT CONFIGURATION FOR MPS ---
#
# This block creates the configuration file.
config:
  map:
    # You can name this entry anything, e.g., "mps-config"
    mps-config: |-
      version: v1
      sharing:
        mps:
          resources:
          - name: nvidia.com/gpu
            replicas: 2 # Starting with 2 as a safe value
  # This tells the plugin to use the config block you just defined.
  default: "mps-config"
# This block gives the MPS daemon the host permissions it needs.
mps:
  enableHostPID: true
# -----------------------------------------------------------------
gdf:
  enabled: true
affinity: null
runtimeClassName: nvidia
nodeSelector:
  nvidia.com/gpu: "true"
tolerations:
  # Chart's default
  - key: CriticalAddonsOnly
    operator: Exists
  # Chart's default
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  # The missing toleration for your control-plane GPU node
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
The pods that these values start:
kubectl get pods -n nvidia --show-labels -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE                NOMINATED NODE   READINESS GATES   LABELS
nvidia-device-plugin-mps-control-daemon-fjcms   2/2     Running   0          29m   10.244.5.6   k8s-cluster-w-01    <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=fb6d478fd,pod-template-generation=3
nvidia-device-plugin-mps-control-daemon-kzq9b   2/2     Running   0          29m   10.244.4.5   k8s-cluster-cp-03   <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=fb6d478fd,pod-template-generation=3
nvidia-device-plugin-ns4lj                      2/2     Running   0          29m   10.244.5.5   k8s-cluster-w-01    <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=574f484c6d,pod-template-generation=3
nvidia-device-plugin-t5q5f                      2/2     Running   0          29m   10.244.4.4   k8s-cluster-cp-03   <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=574f484c6d,pod-template-generation=3
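To make the selector collision easier to confirm, here is a minimal Python sketch that groups DaemonSets by their spec.selector.matchLabels and reports any group with more than one member. The sample data is hypothetical: the two DaemonSet names are only inferred from the pod name prefixes above, and the selector labels are assumed to be the two app.kubernetes.io labels that both sets of pods share. The helper name find_identical_selectors is also made up for illustration, and it only detects exactly identical selectors, not partial overlaps.

```python
from collections import defaultdict


def find_identical_selectors(daemonsets):
    """Return {selector: [names]} for selectors used by more than one DaemonSet.

    `daemonsets` is a list of DaemonSet objects as plain dicts, in the shape
    found under "items" in a Kubernetes list response.
    """
    groups = defaultdict(list)
    for ds in daemonsets:
        labels = ds["spec"]["selector"].get("matchLabels", {})
        # Sort the label pairs so the key does not depend on dict ordering.
        key = tuple(sorted(labels.items()))
        groups[key].append(ds["metadata"]["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical sample: names and selectors are assumptions based on the
# pod listing in this issue, not output captured from the cluster.
shared_labels = {
    "app.kubernetes.io/instance": "nvidia-device-plugin",
    "app.kubernetes.io/name": "nvidia-device-plugin",
}
sample = {
    "items": [
        {
            "metadata": {"name": "nvidia-device-plugin"},
            "spec": {"selector": {"matchLabels": dict(shared_labels)}},
        },
        {
            "metadata": {"name": "nvidia-device-plugin-mps-control-daemon"},
            "spec": {"selector": {"matchLabels": dict(shared_labels)}},
        },
    ]
}

conflicts = find_identical_selectors(sample["items"])
for selector, names in conflicts.items():
    print(dict(selector), "->", names)
```

On a real cluster, the same function could be fed the parsed output of kubectl get daemonsets -n nvidia -o json (via json.load) instead of the sample dictionary.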