OCI GPU Quick Start: NVIDIA H100

This document provides hardware specifications, supported OS images, onboarding verification, sample benchmarks, and best practices for OCI deployments using the NVIDIA H100 GPU shape.

BM.GPU.H100.8 is a high-bandwidth NVIDIA H100 bare-metal shape intended for large-scale AI and HPC workloads, with eight 80 GB GPUs, dual Intel Xeon Platinum 8480+ processors, and RoCE-capable scale-out networking.

At a Glance

Shape: BM.GPU.H100.8
GPU configuration: 8 x NVIDIA H100 80 GB
Recommended OS baseline: Oracle Linux 8 or 9, Ubuntu Linux 22.04, or Ubuntu Linux 24.04
Recommended software baseline: DOCA OFED 3.2.1, NVIDIA Driver 580/590 (Open), CUDA 13.0/13.1
Primary verification command: nvidia-smi
Operational profile: scale-out AI and HPC with NCCL topology considerations

Hardware Specifications
Recommended Operating Systems
Performance Benchmarks
OKE GPU Getting Started
Troubleshooting
Further Reading & Support

Hardware Specifications

Shape Name	GPU Model	GPUs/Node	GPU Memory (GB/GPU)	GPU Memory Total	CPU	CPU Cores	System Memory	Local Storage	Host NIC	RDMA (RoCE) NICs
BM.GPU.H100.8	H100	8	80	640 GB	2 x Intel Xeon Platinum 8480+ @ 2.0 GHz	112 Cores	2 TB DDR5	16 x 3.5 TB NVMe (~54 TB usable)	100 Gb/s	8 x 2 x 200 Gb/s = 3.2 Tb/s

See the OCI Compute Shapes Docs for up-to-date details.
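
The aggregate figures in the table follow from the per-device values. The sketch below redoes that arithmetic in shell. It reads "8 x 2 x 200 Gb/s" as eight NICs with two 200 Gb/s ports each, which is an interpretation of the table rather than something stated in it, and the variable names are illustrative only.

```shell
# Values copied from the hardware table above.
gpus=8
gpu_mem_gb=80
rdma_nics=8        # assumption: the leading "8" counts NICs
ports_per_nic=2    # assumption: the "2" counts ports per NIC
port_gbps=200

total_gpu_mem_gb=$(( gpus * gpu_mem_gb ))
total_rdma_gbps=$(( rdma_nics * ports_per_nic * port_gbps ))

echo "Total GPU memory: ${total_gpu_mem_gb} GB"            # 640 GB
echo "Aggregate RDMA bandwidth: ${total_rdma_gbps} Gb/s"   # 3200 Gb/s = 3.2 Tb/s
```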

Recommended Operating Systems

Oracle Linux 8
Oracle Linux 9
Ubuntu Linux 22.04
Ubuntu Linux 24.04

Recommended Software Version

DOCA OFED 3.2.1
NVIDIA Driver 580 or 590 (Open)
CUDA 13.0 or 13.1
Oracle Cloud Agent 1.57.0
See the Provided Images table below for the currently validated OCI image combinations.

Custom OS Image Creation with Packer

To build your own images with Packer, clone the OCI HPC Images GitHub repo and run the commands documented there.

Provided Images

OS Version	Image Packer Build Details	OCI Platform Image Link	Driver Versions
OCI GPU AI Image with Ubuntu Linux 22.04	`Canonical-Ubuntu-22.04-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0`	PAR Link	NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, OCA 1.57.0
OCI GPU AI Image with Ubuntu Linux 22.04	`Canonical-Ubuntu-22.04-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1`	PAR Link	NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, OCA 1.57.0
OCI GPU AI Image with Ubuntu Linux 24.04	`Canonical-Ubuntu-24.04-6.8-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0`	PAR Link	NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, Kernel 6.8, OCA 1.57.0
OCI GPU AI Image with Ubuntu Linux 24.04	`Canonical-Ubuntu-24.04-6.8-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1`	PAR Link	NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, Kernel 6.8, OCA 1.57.0
OCI GPU AI Image with Ubuntu Linux 24.04	`Canonical-Ubuntu-24.04-6.14-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1`	PAR Link	NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, Kernel 6.14, OCA 1.57.0
OCI GPU AI Image with Oracle Linux 8	`Oracle-Linux-8.10-RHCK-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0`	PAR Link	NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, OCA 1.57.0
OCI GPU AI Image with Oracle Linux 8	`Oracle-Linux-8.10-RHCK-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1`	PAR Link	NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, OCA 1.57.0
OCI GPU AI Image with Oracle Linux 9	`Oracle-Linux-9.7-RHCK-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0`	PAR Link	NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, OCA 1.57.0
OCI GPU AI Image with Oracle Linux 9	`Oracle-Linux-9.7-RHCK-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1`	PAR Link	NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, OCA 1.57.0
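
The image names in the table encode the DOCA OFED version, driver branch, and CUDA version in a consistent pattern. As a minimal sketch, assuming the naming pattern stays as shown in the table, a small helper can pull those fields out of a name, for example to confirm that a chosen image matches the intended software baseline. The function name `parse_image_name` is hypothetical.

```shell
# parse_image_name NAME: print the driver branch, CUDA version, and DOCA OFED
# version embedded in an image name of the form used in the table above.
parse_image_name() {
  name=$1
  driver=$(printf '%s\n' "$name" | sed -n 's/.*-GPU-\([0-9][0-9]*\)-.*/\1/p')
  cuda=$(printf '%s\n' "$name" | sed -n 's/.*-CUDA-\([0-9.]*\)$/\1/p')
  ofed=$(printf '%s\n' "$name" | sed -n 's/.*-DOCA-OFED-\([0-9.]*\)-.*/\1/p')
  printf 'driver=%s cuda=%s doca_ofed=%s\n' "$driver" "$cuda" "$ofed"
}

parse_image_name "Canonical-Ubuntu-22.04-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0"
# prints: driver=580 cuda=13.0 doca_ofed=3.2.1
```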

Hello World Verification

Run nvidia-smi to verify that all eight GPUs are visible and healthy:

nvidia-smi

You should see all eight H100 GPUs listed with a healthy driver stack and no obvious ECC errors.
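
For scripted onboarding, the visual check can be turned into a pass/fail test. The sketch below counts the lines produced by `nvidia-smi --query-gpu=name --format=csv,noheader` (one line per GPU) and compares the count to the expected eight. The helper names `count_gpus` and `check_gpu_count` are hypothetical, and the check covers visibility only, not ECC or driver health.

```shell
# count_gpus: read one GPU name per line on stdin and print the number of
# non-empty lines.
count_gpus() {
  grep -c . || true
}

# check_gpu_count EXPECTED: succeed only if stdin lists exactly EXPECTED GPUs.
check_gpu_count() {
  expected=$1
  found=$(count_gpus)
  if [ "$found" -eq "$expected" ]; then
    echo "OK: $found GPUs visible"
  else
    echo "FAIL: expected $expected GPUs, found $found" >&2
    return 1
  fi
}

# On a BM.GPU.H100.8 node:
#   nvidia-smi --query-gpu=name --format=csv,noheader | check_gpu_count 8
```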

Performance Benchmarks

NVIDIA publishes NCCL as the primary collective communication library for multi-GPU AI and HPC workloads. The source material for H100 focuses on representative single-node and multi-node NCCL workflows plus the supporting topology and network guidance needed for scale-out runs.

All Reduce - Single Node
Multi-node Guidance
Model Inference Performance

All Reduce - Single Node

./build/all_reduce_perf -b 8 -e 8G -f 2 -g 8
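
In the nccl-tests command above, `-b` and `-e` set the minimum and maximum message size, `-f` is the multiplication factor between steps, and `-g` is the number of GPUs. The sketch below enumerates the resulting size sweep and computes the factor that nccl-tests applies to convert all-reduce algorithm bandwidth to bus bandwidth, 2(n-1)/n. It assumes `8G` means 8 x 1024^3 bytes; the helper names are hypothetical.

```shell
# sweep_sizes MIN MAX FACTOR: print each message size, in bytes, visited by a
# geometric sweep from MIN up to and including MAX.
sweep_sizes() {
  size=$1
  while [ "$size" -le "$2" ]; do
    echo "$size"
    size=$(( size * $3 ))
  done
}

# allreduce_busbw_factor N: print 2*(N-1)/N, the all-reduce ratio of bus
# bandwidth to algorithm bandwidth for N ranks.
allreduce_busbw_factor() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", 2 * (n - 1) / n }'
}

max_bytes=$(( 8 * 1024 * 1024 * 1024 ))   # assumption: 8G = 8 * 1024^3
sweep_sizes 8 "$max_bytes" 2 | wc -l      # 31 message sizes
allreduce_busbw_factor 8                  # 1.75
```

With eight GPUs, a reported algorithm bandwidth is therefore multiplied by 1.75 to obtain the bus bandwidth column.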

Multi-node Guidance

For H100 multi-node NCCL jobs, the source material calls out a required topology file. If it is not already present in the image, use one of the following:

Bare metal: H100 topology file
OKE: H100 OKE topology file
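
NCCL reads an explicit topology description from the path in the `NCCL_TOPO_FILE` environment variable. As a minimal sketch, assuming the topology file has already been downloaded to the node, the helper below exports that variable only when the file is present, so a job fails early rather than silently running without it. The function name `use_topo_file` and the example path are hypothetical.

```shell
# use_topo_file PATH: export NCCL_TOPO_FILE if PATH is a readable regular
# file; otherwise report the problem and return non-zero.
use_topo_file() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    export NCCL_TOPO_FILE="$1"
    echo "NCCL_TOPO_FILE=$NCCL_TOPO_FILE"
  else
    echo "topology file not found: $1" >&2
    return 1
  fi
}

# Hypothetical location; substitute wherever the topology file was saved:
#   use_topo_file /path/to/H100-topology.xml
```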

For guidance on running additional NCCL collective benchmarks on this GPU family, see the NCCL user guide.

Model Inference Performance

The H100 source material is oriented toward system validation and NCCL guidance rather than standalone inference results, so no normalized model-performance table is included here.

OKE GPU Getting Started

Information on getting up and running on OKE can be found here.

Useful H100-specific OKE starting points in oci-hpc-oke:

Troubleshooting

This guide includes a broad health-check set covering GPU visibility, NUMA topology, RDMA connectivity, DCGM diagnostics, PCIe bandwidth, and NVLink validation.

GPU Visibility
NUMA Layout
RDMA Interface State
DCGM Diagnostics
PCIe and NVLink Validation

GPU Visibility

nvidia-smi

NUMA Layout

numactl --hardware

For H100, the source material expects a layout comparable to a 112-core dual-socket system, or fewer visible cores when hyperthreading is disabled.
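
To compare a node against the expected layout without reading the full listing, the `node N cpus:` lines of `numactl --hardware` can be summarized. The sketch below is a rough helper, assuming the usual output format in which each NUMA node has one `node N cpus:` line followed by its CPU IDs; the function name `numa_summary` is hypothetical.

```shell
# numa_summary: read `numactl --hardware` output on stdin and print the
# number of NUMA nodes and the total number of CPUs listed for them.
numa_summary() {
  awk '
    /^node [0-9]+ cpus:/ { nodes++; cpus += NF - 3 }   # fields after "node N cpus:"
    END { printf "nodes=%d cpus=%d\n", nodes, cpus }
  '
}

# On the node:
#   numactl --hardware | numa_summary
```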

RDMA Interface State

rdma link

The source material identifies the front-end network as ens1200 on mlx5_2, with RoCE RDMA interfaces expected to report ACTIVE and LINK_UP.
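
With many RDMA interfaces per node, it is easier to list only the links that are not healthy. The sketch below filters `rdma link` output for entries that are not both ACTIVE and LINK_UP, assuming the usual line format `link <dev>/<port> state <STATE> physical_state <PHYS> ...`. The function name `rdma_links_down` is hypothetical.

```shell
# rdma_links_down: read `rdma link` output on stdin and print every link
# whose state is not ACTIVE or whose physical_state is not LINK_UP.
# Returns non-zero if at least one such link was found.
rdma_links_down() {
  awk '
    $1 == "link" {
      state = ""; phys = ""
      for (i = 2; i < NF; i++) {
        if ($i == "state") state = $(i + 1)
        if ($i == "physical_state") phys = $(i + 1)
      }
      if (state != "ACTIVE" || phys != "LINK_UP") { print $2, state, phys; bad = 1 }
    }
    END { exit bad }
  '
}

# On the node:
#   rdma link | rdma_links_down && echo "all RDMA links up"
```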

DCGM Diagnostics

dcgmi diag -r 1
dcgmi diag -r 2
dcgmi diag -r 3

The source material describes:

-r 1 as a quick metadata and deployment check
-r 2 as a medium-depth integration and hardware check
-r 3 as a fuller stress-oriented validation pass
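
Because each level is deeper and slower than the one before it, it can be convenient to run them in order and stop at the first failure. The sketch below does that for any command passed to it. It assumes `dcgmi diag` exits non-zero when a run fails, which is worth confirming against your DCGM version; the function name `run_diag_levels` is hypothetical.

```shell
# run_diag_levels CMD...: run "CMD -r N" for N = 1, 2, 3 in order and stop
# at the first level whose command exits non-zero.
run_diag_levels() {
  for level in 1 2 3; do
    echo "== diag level $level =="
    "$@" -r "$level" || { echo "level $level failed" >&2; return 1; }
  done
  echo "all levels passed"
}

# On a GPU node:
#   run_diag_levels dcgmi diag
```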

PCIe and NVLink Validation

Additional validation tools referenced in the source material:

bandwidthTest from NVIDIA cuda-samples for PCIe bandwidth
nvbandwidth for NVLink bandwidth validation
