helm starts two daemon sets with the same matchLabel #1480

@peterzandbergen

Description

When I install the chart with this values file, I get two DaemonSets whose `selector.matchLabels` are identical, so both controllers try to adopt the same pods.

I suspect this is a bug.

I am using the latest version of the Helm chart.

Values

# -----------------------------------------------------------------
# --- THIS IS THE CORRECT CONFIGURATION FOR MPS ---
#
# This block creates the configuration file.
config:
  map:
    # You can name this entry anything, e.g., "mps-config"
    mps-config: |-
      version: v1
      sharing:
        mps:
          resources:
          - name: nvidia.com/gpu
            replicas: 2 # Starting with 2 as a safe value
  
  # This tells the plugin to use the config block you just defined.
  default: "mps-config"

  # This block gives the MPS daemon the host permissions it needs.
  mps:
    enableHostPID: true
    # -----------------------------------------------------------------    
    
gdf:
  enabled: true

affinity: null

runtimeClassName: nvidia

nodeSelector:
  nvidia.com/gpu: "true"

tolerations:
  # Chart's default
  - key: CriticalAddonsOnly
    operator: Exists
  # Chart's default
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  # The missing toleration for your control-plane GPU node
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule

The pods created from these values:

kubectl get pods -n nvidia --show-labels -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE                NOMINATED NODE   READINESS GATES   LABELS
nvidia-device-plugin-mps-control-daemon-fjcms   2/2     Running   0          29m   10.244.5.6   k8s-cluster-w-01    <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=fb6d478fd,pod-template-generation=3
nvidia-device-plugin-mps-control-daemon-kzq9b   2/2     Running   0          29m   10.244.4.5   k8s-cluster-cp-03   <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=fb6d478fd,pod-template-generation=3
nvidia-device-plugin-ns4lj                      2/2     Running   0          29m   10.244.5.5   k8s-cluster-w-01    <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=574f484c6d,pod-template-generation=3
nvidia-device-plugin-t5q5f                      2/2     Running   0          29m   10.244.4.4   k8s-cluster-cp-03   <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=574f484c6d,pod-template-generation=3
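The conflict can be illustrated with a small sketch (the labels mirror the `kubectl` output above; this is not the chart's actual template logic): a `matchLabels` selector matches a pod when every key/value pair is present, so two DaemonSets with identical selectors both claim every such pod.

```python
# Sketch: two DaemonSet-style selectors with identical matchLabels
# both "adopt" the same pods (labels mirror the kubectl output above).
pod_labels = {
    "nvidia-device-plugin-ns4lj": {
        "app.kubernetes.io/instance": "nvidia-device-plugin",
        "app.kubernetes.io/name": "nvidia-device-plugin",
    },
    "nvidia-device-plugin-mps-control-daemon-fjcms": {
        "app.kubernetes.io/instance": "nvidia-device-plugin",
        "app.kubernetes.io/name": "nvidia-device-plugin",
    },
}

# Identical for both DaemonSets in the reported state.
selector = {
    "app.kubernetes.io/instance": "nvidia-device-plugin",
    "app.kubernetes.io/name": "nvidia-device-plugin",
}

def matches(selector, labels):
    """A matchLabels selector matches when every key/value pair is present in the pod's labels."""
    return all(labels.get(k) == v for k, v in selector.items())

claimed = [name for name, labels in pod_labels.items() if matches(selector, labels)]
print(claimed)  # both pods are claimed, by each of the two DaemonSets
```

This is why the chart normally needs a distinguishing label (e.g. a component label) in each DaemonSet's selector.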
