Verifying AMD GPU availability on your cluster

Before you proceed with the AMD GPU Operator installation process, you can verify the presence of an AMD GPU device on a node within your {openshift-platform} cluster. You can use commands such as lspci or oc to confirm hardware and resource availability.

Prerequisites

You have administrative access to the {openshift-platform} cluster.
You have a running {openshift-platform} cluster with a node equipped with an AMD GPU.
You have access to the OpenShift CLI (oc) and terminal access to the node.

Procedure

Use the OpenShift CLI to verify if GPU resources are allocatable:
1. List all nodes in the cluster to identify the node with an AMD GPU:
  oc get nodes
2. Note the name of the node where you expect the AMD GPU to be present.
3. Describe the node to check its resource allocation:
  oc describe node <node_name>
4. In the output, locate the Capacity and Allocatable sections and confirm that amd.com/gpu is listed. For example:
  Capacity: amd.com/gpu: 1 Allocatable: amd.com/gpu: 1
Check for the AMD GPU device using the lspci command:
1. Log in to the node:
  oc debug node/<node_name> chroot /host
2. Run the lspci command and search for the supported AMD device in your deployment. For example:
  lspci | grep -E "MI210|MI250|MI300"
3. Verify that the output includes one of the AMD GPU models. For example:
  03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD] Instinct MI210
Optional: Use the rocminfo command if the ROCm stack is installed on the node:
```
rocminfo
```
1. Confirm that the ROCm tool outputs details about the AMD GPU, such as compute units, memory, and driver status.

Verification

The oc describe node <node_name> command lists amd.com/gpu under Capacity and Allocatable.
The lspci command output identifies an AMD GPU as a PCI device matching one of the specified models (for example, MI210, MI250, MI300).
Optional: The rocminfo tool provides detailed GPU information, confirming driver and hardware configuration.

Additional resources

AMD GPU Operator GitHub Repository

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

verifying-amd-gpu-availability-on-your-cluster.adoc

verifying-amd-gpu-availability-on-your-cluster.adoc

Verifying AMD GPU availability on your cluster

Files

verifying-amd-gpu-availability-on-your-cluster.adoc

Latest commit

History

verifying-amd-gpu-availability-on-your-cluster.adoc

File metadata and controls

Verifying AMD GPU availability on your cluster