
pods are placed under same NUMA node-0 even with TAS scoringStrategy:"LeastAllocated" #956

@hikkart

Description


Area

  • Scheduler
  • Controller
  • Helm Chart
  • Documents

Other components

No response

What happened?

Baremetal worker-node-A is part of the k8s cluster and has two NUMA nodes, numa-0 and numa-1.
Updated the kubelet topology policy so that the node reports SingleNUMANodePodLevel.
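The kubelet setup described above implies a configuration fragment along these lines (a sketch; cpuManagerPolicy: static is an assumption here, since exclusive CPU pinning requires it, and the exact file location depends on the RKE2 setup):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: single-numa-node   # matches the NRT attribute below
topologyManagerScope: pod                 # matches topologyManagerScope: pod
cpuManagerPolicy: static                  # assumed: needed for exclusive CPU allocation
```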

  1. Installed NFD with NRT enabled using Helm: https://github.com/kubernetes-sigs/node-feature-discovery/tree/v0.18.3/deployment/helm/node-feature-discovery/

The NRT object correctly shows that each NUMA node has 36 allocatable CPUs.

#kubectl get noderesourcetopology worker-node-A -o yaml
apiVersion: topology.node.k8s.io/v1alpha2
attributes:
  - name: topologyManagerPolicy
    value: single-numa-node
  - name: topologyManagerScope
    value: pod
  - name: nodeTopologyPodsFingerprint
    value: pfp@v0011a4babe00fa3033e
kind: NodeResourceTopology
metadata:
  name: worker-node-A
  ownerReferences:
    - apiVersion: v1
      kind: Namespace
      name: node-feature-discovery
topologyPolicies:
  - SingleNUMANodePodLevel
zones:
  - costs:
      - name: node-0
        value: 10
      - name: node-1
        value: 21
    name: node-0
    resources:
      - allocatable: "36"
        available: "36"
        capacity: "40"
        name: cpu
      - allocatable: "94060482560"
        available: "90914754560"
        capacity: "99966062592"
        name: memory
    type: Node
  - costs:
      - name: node-0
        value: 21
      - name: node-1
        value: 10
    name: node-1
    resources:
      - allocatable: "36"
        available: "36"
        capacity: "40"
        name: cpu
      - allocatable: "95497109504"
        available: "95497109504"
        capacity: "101402689536"
        name: memory
    type: Node
  2. Installed TAS using Helm from https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/manifests/install/charts/as-a-second-scheduler/ for the branch v0.33.5.
    kubectl get pods -n node-feature-discovery shows all pods are running properly:
pod/nfd-node-feature-discovery-gc-6cf4b459c5-2qbw7
pod/nfd-node-feature-discovery-master-7995d4bfdf-2q88c
pod/nfd-node-feature-discovery-topology-updater-x86xc
pod/nfd-node-feature-discovery-worker-9229z
pod/topology-aware-controller-6f457f4d55-bptxv
pod/topology-aware-scheduler-54c6c8bdfb-tsvxf

kubectl get configmap scheduler-config -n node-feature-discovery -o yaml shows content similar to https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/manifests/noderesourcetopology/scheduler-configmap.yaml
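For reference, the relevant fragment of that ConfigMap would look roughly like this (a sketch based on the linked scheduler-configmap.yaml; plugin wiring shown for context, exact values assumed):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: topo-aware-scheduler
    plugins:
      filter:
        enabled:
          - name: NodeResourceTopologyMatch
      score:
        enabled:
          - name: NodeResourceTopologyMatch
    pluginConfig:
      - name: NodeResourceTopologyMatch
        args:
          scoringStrategy:
            type: "LeastAllocated"
```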

  3. Created pod0.yml with 'schedulerName: topo-aware-scheduler' requesting 5 CPUs -> it was placed on numa node-0:
    name: node-0
    resources:
      - allocatable: "36"
        available: "31"
        capacity: "40"
        name: cpu
  4. Created pod1.yml with 'schedulerName: topo-aware-scheduler', also requesting 5 CPUs -> pod1 was STILL placed on numa node-0:
    name: node-0
    resources:
      - allocatable: "36"
        available: "26"
        capacity: "40"
        name: cpu
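The pods were created from a manifest along these lines (a sketch; pod name, container name, and image are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  schedulerName: topo-aware-scheduler
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "5"
          memory: 100Mi
        limits:
          cpu: "5"        # requests == limits (Guaranteed QoS) with integer CPUs,
          memory: 100Mi   # as required for exclusive pinning under single-numa-node
```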

Expected it to be placed on numa-1 due to:

  scoringStrategy:
    type: "LeastAllocated"

Why is it not placed on numa-1? I believe NUMA isolation is supported within a single worker-node-A.
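For context, LeastAllocated should prefer the zone with more free capacity remaining after placement. A minimal single-resource sketch of that idea (not the plugin's exact formula; the function name and MAX_SCORE constant are illustrative), using the CPU numbers from the NRT output above after pod0 landed on node-0:

```python
MAX_SCORE = 100

def least_allocated_score(requested: int, available: int, allocatable: int) -> int:
    """Score a NUMA zone: higher when more capacity stays free after placement."""
    if requested > available:
        return 0  # zone cannot fit the request at all
    return (available - requested) * MAX_SCORE // allocatable

# node-0 already hosts pod0 (5 of 36 CPUs used), node-1 is empty.
print(least_allocated_score(5, 31, 36))  # node-0 -> 72
print(least_allocated_score(5, 36, 36))  # node-1 -> 86 (should win)
```

Under this scoring, pod1 should land on node-1, which is what the bug report expects.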

Thanks for your support.

What did you expect to happen?

pod0 should be placed on numa node-0 and pod1 on numa node-1, balancing the load across NUMA nodes.

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

No response

Kubernetes version

Details

v1.33.4+rke2r1

Scheduler Plugins version

Details

v0.33.5

Metadata

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
