Description
What happened: After installing HAMi v2.5.0 in an arm64 environment with Iluvatar GPU devices, the device plugin container fails to start.
What you expected to happen: HAMi installs successfully and the device plugin container starts.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
- The output of nvidia-smi -a on your host: Not applicable; these are not NVIDIA GPUs.
- Your docker or containerd configuration file (e.g. /etc/docker/daemon.json):
- The hami-device-plugin container logs:
- The hami-scheduler container logs:
- The kubelet logs on the node (e.g. sudo journalctl -r -u kubelet):
- Any relevant kernel output lines from dmesg:
Environment:
- HAMi version: v2.5.0
- nvidia driver or other AI device driver version: corex-4.1.3
- Docker version from docker version:
  Client: Docker Engine - Community
   Version: 26.1.3
   API version: 1.45
   Go version: go1.21.10
   Git commit: b72abbb
   Built: Thu May 16 08:34:00 2024
   OS/Arch: linux/arm64
   Context: default

  Server: Docker Engine - Community
   Engine:
    Version: 26.1.3
    API version: 1.45 (minimum version 1.24)
    Go version: go1.21.10
    Git commit: 8e96db1
    Built: Thu May 16 08:33:12 2024
    OS/Arch: linux/arm64
    Experimental: false
   containerd:
    Version: 1.6.32
    GitCommit: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
   runc:
    Version: 1.1.12
    GitCommit: v1.1.12-0-g51d5e94
   docker-init:
    Version: 0.19.0
    GitCommit: de40ad0
- Docker command, image and tag used:
  - projecthami/hami:v2.5.0
  - liangjw/kube-webhook-certgen:v1.1.1
- Kernel version from uname -a:
  - Linux master0 4.19.90-89.11.v2401.ky10.aarch64 Devmem #1 SMP Thu Apr 25 18:20:10 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
- Others:
  - Arm platform. Iluvatar gpu-manager is deployed.