
Verifying AMD GPU availability on your cluster

Before you install the AMD GPU Operator, you can verify that an AMD GPU device is present on a node in your {openshift-platform} cluster. Use the oc CLI to check resource availability and the lspci command to confirm the hardware.

Prerequisites
  • You have administrative access to the {openshift-platform} cluster.

  • You have a running {openshift-platform} cluster with at least one node that has an AMD GPU.

  • You have access to the OpenShift CLI (oc) and terminal access to the node.

Procedure
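  1. Use the OpenShift CLI to verify whether GPU resources are allocatable. Because the amd.com/gpu resource is advertised by the AMD GPU device plugin, this check succeeds only on nodes where the device plugin is already running: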

    1. List all nodes in the cluster:

      oc get nodes
    2. Note the name of the node where you expect the AMD GPU to be present.

    3. Describe the node to check its resource allocation:

      oc describe node <node_name>
    4. In the output, locate the Capacity and Allocatable sections and confirm that amd.com/gpu is listed. For example:

      Capacity:
        amd.com/gpu:  1
      Allocatable:
        amd.com/gpu:  1
  2. Check for the AMD GPU device using the lspci command:

    1. Start a debug session on the node and change the root directory to the host file system:

      oc debug node/<node_name>
      chroot /host
    2. Run the lspci command and filter the output for the AMD GPU models supported in your deployment. For example:

      lspci | grep -E "MI210|MI250|MI300"
    3. Verify that the output includes one of the AMD GPU models. For example:

      03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD] Instinct MI210
  3. Optional: Use the rocminfo command if the ROCm stack is installed on the node:

    rocminfo
    1. Confirm that the ROCm tool outputs details about the AMD GPU, such as compute units, memory, and driver status.
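The first two procedure steps can also be run non-interactively from a workstation. The following is a minimal bash sketch, assuming you are already logged in with oc as a cluster administrator; `list_gpu_allocatable` and `check_node_lspci` are hypothetical helper names, not commands provided by OpenShift or the AMD GPU Operator. It uses the `oc debug node/<node_name> -- chroot /host <command>` form instead of an interactive shell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that wrap the oc commands from the procedure.

# Print "<node> <allocatable amd.com/gpu>" for every node.
# The backslash escapes the dot so jsonpath treats "amd.com/gpu" as one key.
# Nodes that do not advertise the resource print an empty value, mapped to 0.
list_gpu_allocatable() {
  oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.amd\.com/gpu}{"\n"}{end}' |
    awk '{ print $1, ($2 == "" ? 0 : $2) }'
}

# Run lspci on the host file system of one node through a debug pod and
# keep only lines that mention the example Instinct models.
check_node_lspci() {
  local node="$1"
  oc debug "node/${node}" -- chroot /host lspci | grep -E "MI210|MI250|MI300"
}
```

For example, `check_node_lspci <node_name>` prints the matching lspci lines from that node; an empty result means that none of the listed models was found. Adjust the grep pattern to the models in your deployment.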

Verification
  • The oc describe node <node_name> command lists amd.com/gpu under Capacity and Allocatable.

  • The lspci command output identifies an AMD GPU as a PCI device matching one of the specified models (for example, MI210, MI250, MI300).

  • Optional: The rocminfo tool provides detailed GPU information, confirming driver and hardware configuration.
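The first verification check can be automated by parsing the output of oc describe node. The following bash function is a hypothetical sketch, not part of any OpenShift tooling. It assumes the default describe layout, where section headers such as Capacity: start at column 0 and their entries are indented, and it succeeds only if amd.com/gpu has a non-zero value in both the Capacity and Allocatable sections.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "oc describe node <node_name>" output on stdin.
# Exit status 0 means amd.com/gpu is non-zero under both Capacity and Allocatable.
has_amd_gpu_capacity_and_allocatable() {
  awk '
    # Lines that start at column 0 are section headers, for example "Capacity:".
    /^[A-Za-z]/ { section = $1 }
    # Indented resource entries, for example "  amd.com/gpu:  1".
    $1 == "amd.com/gpu:" && $2 > 0 {
      if (section == "Capacity:") cap = 1
      if (section == "Allocatable:") alloc = 1
    }
    END { exit !(cap && alloc) }
  '
}
```

For example: `oc describe node <node_name> | has_amd_gpu_capacity_and_allocatable && echo "amd.com/gpu is allocatable"`.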

Additional resources