Description
What happened:
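With the gpuSchedulerPolicy=binpack policy, GPUs are assigned incorrectly: the gpu-uuid in the pod annotation differs from the UUID that nvidia-smi reports inside the container.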
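The mismatch described above can be checked with a small script. This is a minimal sketch, not part of HAMi: it assumes you have copied the pod annotation value that carries the gpu-uuid and the output of `nvidia-smi -L` from inside the container into two strings. The function names `gpu_uuids` and `mismatch` are hypothetical, and the script only looks for UUID-shaped tokens rather than relying on any particular annotation layout.

```python
import re

# Matches NVIDIA GPU UUIDs such as GPU-0a1b2c3d-1111-2222-3333-444455556666.
UUID_RE = re.compile(r"GPU-[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def gpu_uuids(text):
    """Return the set of GPU UUIDs found anywhere in `text`."""
    return set(UUID_RE.findall(text))


def mismatch(annotation_value, nvidia_smi_output):
    """Compare the UUIDs recorded by the scheduler with those seen in the container.

    Returns a pair of sets: (only in the annotation, only in the container).
    Both sets are empty when the two sides agree.
    """
    scheduled = gpu_uuids(annotation_value)
    visible = gpu_uuids(nvidia_smi_output)
    return scheduled - visible, visible - scheduled


if __name__ == "__main__":
    # Placeholder values for illustration only; paste real ones here.
    annotation = "GPU-11111111-2222-3333-4444-555555555555"
    smi = "GPU 0: NVIDIA H800 (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)"
    only_annotation, only_container = mismatch(annotation, smi)
    print("only in annotation:", sorted(only_annotation))
    print("only in container: ", sorted(only_container))
```

Any non-empty set in the result means the pod is running on a different card than the one the scheduler recorded.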
What you expected to happen:
Each pod should be assigned to the correct GPU, i.e. the one recorded in its annotation.
How to reproduce it (as minimally and precisely as possible):
scheduler.defaultSchedulerPolicy.gpuSchedulerPolicy = binpack
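Schedule 7-10 workloads ("capabilities") at the same time.
The GPUs are H800 cards with only 80 GB of memory each, so in theory no single card should ever be allocated more than 80 GB.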
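The 80 GB limit above can be verified by summing the memory requested on each card. The following is a rough sketch under stated assumptions: the `(gpu_uuid, requested_mib)` pairs are collected by hand from the pod annotations, the helper name `overcommitted` is hypothetical, and the default capacity of 80 * 1024 MiB is an approximation of what an H800 exposes.

```python
from collections import defaultdict


def overcommitted(allocations, capacity_mib=80 * 1024):
    """Return {gpu_uuid: total_mib} for every GPU whose summed requests exceed capacity.

    `allocations` is an iterable of (gpu_uuid, requested_mib) pairs,
    one pair per GPU share assigned to a pod.
    """
    totals = defaultdict(int)
    for uuid, mib in allocations:
        totals[uuid] += mib
    return {uuid: total for uuid, total in totals.items() if total > capacity_mib}


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    sample = [("GPU-A", 40960), ("GPU-A", 40960), ("GPU-B", 50000), ("GPU-B", 40000)]
    print(overcommitted(sample))
```

An empty result means no card was asked for more memory than it has; any entry in the result points at a card where binpack placed more than fits.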
Environment:
- HAMi version: 2.5.0
- nvidia driver or other AI device driver version:
- Docker version from `docker version`: containerd 1.6.33
- Kernel version from `uname -a`: centos 7.9