What happened:
A Pod is scheduled onto a GPU card that does not match the GPU scheduler policy on a multi-NUMA GPU node. The Pod is configured with the "spread" policy, but it is scheduled onto a GPU card that already has high usage.
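For context, a minimal sketch of the kind of Pod used to trigger this, written as a `kubectl apply` heredoc. The `hami.io/gpu-scheduler-policy` annotation and the `nvidia.com/gpu` / `nvidia.com/gpumem` resource names are assumptions based on common HAMi usage and may differ from the actual manifest:

```shell
# Sketch of the reproduction Pod. The annotation and resource names below are
# assumptions; adjust them to whatever the real manifest uses.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-spread-test
  annotations:
    # Assumed per-GPU scheduling policy annotation.
    hami.io/gpu-scheduler-policy: "spread"
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example image
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1        # assumed HAMi vGPU resource name
        nvidia.com/gpumem: 4096  # assumed per-card memory request, in MiB
EOF
```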
What you expected to happen:
When the GPU scheduler policy is configured as "spread", the Pod should be scheduled onto a GPU card with low usage.
How to reproduce it (as minimally and precisely as possible):
This problem only occurs on multi-NUMA GPU nodes.
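To make the multi-NUMA layout and the per-card usage at scheduling time visible, the following standard `nvidia-smi` and `lscpu` invocations can be run on the affected node:

```shell
# Show the GPU <-> CPU/NUMA affinity matrix of the node.
nvidia-smi topo -m

# Confirm the node actually exposes more than one NUMA node.
lscpu | grep -i numa

# Snapshot per-card memory and compute usage when the Pod gets scheduled.
nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv
```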
Anything else we need to know?:
- The output of `nvidia-smi -a` on your host
- Your docker or containerd configuration file (e.g: `/etc/docker/daemon.json`)
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g: `sudo journalctl -r -u kubelet`)
- Any relevant kernel output lines from `dmesg`
Environment:
- HAMi version: v2.5.0
- nvidia driver or other AI device driver version:
- Docker version from `docker version`
- Docker command, image and tag used
- Kernel version from `uname -a`
- Others: