
GPU misallocation: the gpu-uuid in the pod annotation differs from the one reported by nvidia-smi inside the container #968

Open
@zhegemingzimeibanquan

Description


What happened:
With the gpuSchedulerPolicy=binpack policy, GPUs are misallocated: the gpu-uuid in the pod annotation differs from the UUID that nvidia-smi reports inside the container.

[screenshot]
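One way to confirm the mismatch is to list the UUIDs visible inside the container with `nvidia-smi -L` and diff them against the UUIDs the scheduler recorded. Below is a minimal Python sketch, assuming the annotated UUIDs have already been copied out of the pod annotation by hand (the exact HAMi annotation key and value format are not shown in this issue, so that step is left to the caller). The helper names are hypothetical, and MIG device lines are deliberately not matched.

```python
import re

# Matches the UUID field of full GPUs in `nvidia-smi -L` output, e.g.
# "GPU 0: NVIDIA H800 (UUID: GPU-1a2b...)". MIG lines ("MIG-...") are ignored.
UUID_RE = re.compile(r"\(UUID:\s*(GPU-[0-9a-fA-F-]+)\)")


def visible_uuids(nvidia_smi_l_output: str) -> set[str]:
    """Return the GPU UUIDs listed by `nvidia-smi -L` inside the container."""
    return set(UUID_RE.findall(nvidia_smi_l_output))


def compare(annotated: set[str], visible: set[str]) -> dict[str, set[str]]:
    """Diff the scheduler's view (annotation) against the container's view."""
    return {
        # Assigned by the scheduler but not visible in the container.
        "missing_in_container": annotated - visible,
        # Visible in the container but never assigned by the scheduler.
        "unexpected_in_container": visible - annotated,
    }


if __name__ == "__main__":
    smi = "GPU 0: NVIDIA H800 (UUID: GPU-aaaa-1111)\n"
    print(compare({"GPU-bbbb-2222"}, visible_uuids(smi)))
    # {'missing_in_container': {'GPU-bbbb-2222'}, 'unexpected_in_container': {'GPU-aaaa-1111'}}
```

Both sets being empty means the annotation and the container agree; any non-empty entry is the misallocation described above.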

What you expected to happen:
Each pod is allocated to the correct GPU.

How to reproduce it (as minimally and precisely as possible):
scheduler.defaultSchedulerPolicy.gpuSchedulerPolicy = binpack
Schedule 7-10 "capability" workloads at the same time.

[screenshot]
The GPUs are H800s with only 80 GB of memory each, so in theory no card should ever be allocated more than 80 GB in total.
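Under binpack several pods share a card, so the figure to check is the per-card total rather than any single pod's request. Below is a minimal sketch of that arithmetic, assuming a hypothetical list of `(pod, gpu_uuid, mem_mib)` tuples gathered by hand from the pods' annotations and a nominal 80 GB (81920 MiB) capacity per H800. The function name and input shape are illustrative, not part of HAMi.

```python
from collections import defaultdict

# Assumption: nominal memory per H800 card, in MiB (80 GB).
CARD_CAPACITY_MIB = 80 * 1024


def overcommitted(allocations: list[tuple[str, str, int]],
                  capacity_mib: int = CARD_CAPACITY_MIB) -> dict[str, int]:
    """Sum memory per GPU UUID across pods; return UUIDs whose total exceeds capacity.

    `allocations` is a hypothetical list of (pod_name, gpu_uuid, mem_mib) tuples.
    """
    totals: dict[str, int] = defaultdict(int)
    for _pod, uuid, mem_mib in allocations:
        totals[uuid] += mem_mib
    return {uuid: total for uuid, total in totals.items() if total > capacity_mib}


if __name__ == "__main__":
    allocs = [
        ("pod-a", "GPU-1", 40000),
        ("pod-b", "GPU-1", 45000),  # 85000 MiB on GPU-1: more than one card holds
        ("pod-c", "GPU-2", 40000),
    ]
    print(overcommitted(allocs))  # {'GPU-1': 85000}
```

The usable memory reported by the driver may be slightly below the nominal 80 GB, so passing the value from `nvidia-smi --query-gpu=memory.total --format=csv` as `capacity_mib` gives a stricter check.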

Environment:

  • HAMi version: 2.5.0
  • nvidia driver or other AI device driver version:
  • Docker version from docker version: containerd 1.6.33
  • Kernel version from uname -a: CentOS 7.9

Metadata


Labels

kind/bug (Something isn't working)
