POD NodeSelector is not always consistent with their MPIJob node selector #3400

@GonzaloSaez

What happened:

We are launching MPIJobs through a Kueue LocalQueue (specifically cpu-local-queue from the YAML found at the end of the issue). The ResourceFlavor associated with the ClusterQueue uses nodeLabels to target a specific GKE node pool. We do not set a NodeSelector on the MPIJob ourselves. When the job is launched, Kueue sets the correct NodeSelector on the MPIJob. However, the NodeSelector on the resulting pods can be empty (in the reproduction below, the launcher pod has none while the workers do). Note that we do not set the suspend field in the MPIJob; we let Kueue do it for us.

What you expected to happen:

The MPIJob pods should have the same NodeSelector as the MPIJob. The Kueue documentation (https://kueue.sigs.k8s.io/docs/concepts/resource_flavor/) states:

Kueue adds the ResourceFlavor labels to the .nodeSelector of the underlying Workload Pod templates. This occurs if the Workload didn’t specify the ResourceFlavor labels already as part of its nodeSelector.
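
To make the expectation concrete, here is a sketch of the fragment we would expect to see in every pod template of the admitted MPIJob, and therefore on every pod created from it. Only the label itself comes from the cpu ResourceFlavor listed under Environment; everything else is elided, and this is an illustration rather than captured output.

```yaml
# Expected after admission (illustrative sketch, not captured cluster output).
# The label is the nodeLabels entry of the "cpu" ResourceFlavor.
spec:
  mpiReplicaSpecs:
    Launcher:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-nodepool: e2x4
    Worker:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-nodepool: e2x4
```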

Environment:

GKE 1.30, Kueue 0.8.1, waitForPodsReady=true. These are the Kueue resources:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: cpu
spec:
  nodeLabels:
    cloud.google.com/gke-nodepool: e2x4

---

apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: cpu-prov-config
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io

---

apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: check-capacity-cpu-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: cpu-prov-config

---

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cpu-cluster-queue"
spec:
  namespaceSelector: {}
  preemption:
    withinClusterQueue: LowerPriority
  resourceGroups:
    - coveredResources: ["cpu", "memory"]
      flavors:
        - name: "cpu"
          resources:
            - name: "cpu"
              nominalQuota: "12"
            - name: "memory"
              nominalQuota: 52000Gi
  admissionChecks:
    - check-capacity-cpu-prov

---

apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: cpu-local-queue
  namespace: mynamespace
spec:
  clusterQueue: cpu-cluster-queue

This can be reproduced with the MPI Operator example below. The launcher pod does not have a NodeSelector set, but the worker pods do.

apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: pi
  namespace: mynamespace
  labels:
    kueue.x-k8s.io/queue-name: cpu-local-queue
spec:
  slotsPerWorker: 1
  runPolicy:
    cleanPodPolicy: None
  sshAuthMountPath: /home/mpiuser/.ssh
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - image: mpioperator/mpi-pi:openmpi
              name: mpi-launcher
              securityContext:
                runAsUser: 1000
              command:
                - mpirun
              args:
                - -n
                - "2"
                - /home/mpiuser/pi
              resources:
                limits:
                  cpu: 1
                  memory: 1Gi
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - image: mpioperator/mpi-pi:openmpi
              name: mpi-worker
              securityContext:
                runAsUser: 1000
              command:
                - /usr/sbin/sshd
              args:
                - -De
                - -f
                - /home/mpiuser/.sshd_config
              resources:
                requests:
                  cpu: "1300m"
                  memory: 3Gi
                limits:
                  cpu: "1300m"
                  memory: 3Gi
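
A possible stopgap, not verified in this report, would be to pin the launcher explicitly rather than rely on the injected labels. The fragment below is a hypothetical sketch: it assumes the same node pool label as the cpu ResourceFlavor and shows only the Launcher fields that change. According to the documentation quoted earlier, Kueue adds the flavor labels only when the workload has not already specified them, so setting the label by hand should not conflict with admission.

```yaml
# Hypothetical stopgap (untested): set the launcher's nodeSelector by hand
# instead of relying on Kueue to propagate the ResourceFlavor labels.
spec:
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          nodeSelector:
            # Same label as the nodeLabels of the "cpu" ResourceFlavor
            cloud.google.com/gke-nodepool: e2x4
          containers:
            - image: mpioperator/mpi-pi:openmpi
              name: mpi-launcher
              # remaining container fields unchanged from the example above
```

This duplicates the node pool label in the job manifest, so it would need to be kept in sync with the ResourceFlavor by hand.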

Labels: kind/bug
