
Stuck at HAMI-core Initializing..... forever - pod schedule success #1010

Open
@hasenbam

Description


What happened:
The HAMi device plugin detects my A30.
I start a pod with 1 GPU resource, and the scheduler correctly assigns the GPU to the pod.
The pod starts and runs, but as soon as any GPU workload or nvidia-smi is triggered, the pod hangs forever at:

[HAMI-core Msg(672:140535013861184:libvgpu.c:837)]: Initializing.....

This is how I specified the GPU in my pod manifest:

resources:
  limits:
    nvidia.com/gpu: 1
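For context, the snippet above sits inside a standard pod spec. A minimal sketch of the full manifest (the container image and command are illustrative assumptions, not taken from this report; only the pod name and the resource limit come from the repro steps):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-gpu-test                 # pod name from the repro steps
spec:
  containers:
    - name: cuda
      # Illustrative CUDA base image; any CUDA 12.8-compatible image should behave the same.
      image: nvcr.io/nvidia/cuda:12.8.0-base-ubuntu22.04
      command: ["sleep", "infinity"]  # keep the pod running so we can exec into it
      resources:
        limits:
          nvidia.com/gpu: 1           # the GPU resource limit from this report
```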

What you expected to happen:
I expect nvidia-smi and compute workloads to work inside the container that was assigned a GPU by HAMi.

How to reproduce it (as minimally and precisely as possible):

  • A30 GPU, Compute Mode: Default, MIG Disabled
  • containerd 2.0.4 (default runtime nvidia in config.toml)
  • Kubernetes 1.32.3
  • nvidia driver 570.86.15
  • nvidia-container-toolkit 1.17.5
  • cuda-toolkit 12.8
  1. Start a pod with resources nvidia.com/gpu: 1.
  2. Exec into the pod with kubectl exec -it pod/cuda-gpu-test -- /bin/bash
  3. Run nvidia-smi
  4. Stuck at [HAMI-core Msg(672:140535013861184:libvgpu.c:837)]: Initializing.....

Anything else we need to know?:

Environment:

  • HAMi version: v2.5.0
  • nvidia driver or other AI device driver version: 570.86.15
  • Docker version from docker version: N/A, using containerd 2.0.4
  • Kernel version from uname -a:
    Linux k8s-worker 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  • Others: see "How to reproduce" above

Metadata

Assignees: none
Labels: kind/bug (Something isn't working)
