Device plugin fails to start with iluvatar devices #986

Open
@MaxCaresYww

What happened: After installing HAMi v2.5.0 in an arm64 environment with Iluvatar GPU devices, the device plugin container fails to start.

What you expected to happen: HAMi is installed successfully.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

  • The output of nvidia-smi -a on your host
    • Not applicable: the devices are not NVIDIA GPUs.
  • Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
  • The hami-device-plugin container logs
  • The hami-scheduler container logs:
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
  • Any relevant kernel output lines from dmesg

Environment:

  • HAMi version: v2.5.0

  • nvidia driver or other AI device driver version: corex-4.1.3

  • Docker version from docker version

    • docker version
      Client: Docker Engine - Community
        Version: 26.1.3
        API version: 1.45
        Go version: go1.21.10
        Git commit: b72abbb
        Built: Thu May 16 08:34:00 2024
        OS/Arch: linux/arm64
        Context: default

      Server: Docker Engine - Community
        Engine:
          Version: 26.1.3
          API version: 1.45 (minimum version 1.24)
          Go version: go1.21.10
          Git commit: 8e96db1
          Built: Thu May 16 08:33:12 2024
          OS/Arch: linux/arm64
          Experimental: false
        containerd:
          Version: 1.6.32
          GitCommit: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
        runc:
          Version: 1.1.12
          GitCommit: v1.1.12-0-g51d5e94
        docker-init:
          Version: 0.19.0
          GitCommit: de40ad0

  • Docker command, image and tag used

    • projecthami/hami:v2.5.0
    • liangjw/kube-webhook-certgen:v1.1.1
  • Kernel version from uname -a

    • Linux master0 4.19.90-89.11.v2401.ky10.aarch64 Devmem #1 SMP Thu Apr 25 18:20:10 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
  • Others:

    • Arm platform; Iluvatar gpu-manager is deployed.

Labels: kind/bug
