This document provides hardware specifications, supported OS images, onboarding verification, sample benchmarks, and best practices for OCI deployments using the NVIDIA H100 GPU shape.
BM.GPU.H100.8 is a high-bandwidth NVIDIA H100 bare-metal shape intended for large-scale AI and HPC workloads, with eight 80 GB GPUs, dual Intel Xeon Platinum 8480+ processors, and RoCE-capable scale-out networking.
- Shape: BM.GPU.H100.8
- GPU configuration: 8 x NVIDIA H100 80 GB
- Recommended OS baseline: Oracle Linux 8 or 9, Ubuntu Linux 22.04, or Ubuntu Linux 24.04
- Recommended software baseline: DOCA OFED 3.2.1, NVIDIA Driver 580/590 (Open), CUDA 13.0/13.1
- Primary verification command: nvidia-smi
- Operational profile: scale-out AI and HPC with NCCL topology considerations
- Hardware Specifications
- Recommended Operating Systems
- Performance Benchmarks
- OKE GPU Getting Started
- Troubleshooting
- Further Reading & Support
| Shape Name | GPU Model | GPUs/Node | GPU Memory (GB/GPU) | GPU Memory Total | CPU | CPU Cores | System Memory | Local Storage | Host NIC | RDMA (RoCE) NICs |
|---|---|---|---|---|---|---|---|---|---|---|
| BM.GPU.H100.8 | H100 | 8 | 80 | 640 GB | 2 x Intel Xeon Platinum 8480+ @ 2.0 GHz | 112 Cores | 2 TB DDR5 | 16 x 3.5 TB NVMe (~54 TB usable) | 100 Gb/s | 8 x 2 x 200 Gb/s = 3.2 Tb/s |
See the OCI Compute Shapes Docs for up-to-date details.
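The aggregate figures in the table follow from the per-device values. The short sketch below reproduces that arithmetic; the variable names are illustrative, and reading "8 x 2 x 200 Gb/s" as eight groups of two 200 Gb/s links is an assumption about how the table entry is composed.

```shell
#!/usr/bin/env bash
# Per-node values copied from the hardware table.
gpus=8
gpu_mem_gb=80
rdma_groups=8        # assumption: first factor in "8 x 2 x 200 Gb/s"
links_per_group=2    # assumption: second factor in "8 x 2 x 200 Gb/s"
link_gbps=200

total_gpu_mem_gb=$((gpus * gpu_mem_gb))
total_rdma_gbps=$((rdma_groups * links_per_group * link_gbps))

echo "Total GPU memory: ${total_gpu_mem_gb} GB"
echo "Aggregate RDMA bandwidth: ${total_rdma_gbps} Gb/s"
```

The results, 640 GB and 3200 Gb/s (3.2 Tb/s), match the "GPU Memory Total" and RDMA columns.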
- Oracle Linux 8
- Oracle Linux 9
- Ubuntu Linux 22.04
- Ubuntu Linux 24.04
- DOCA OFED 3.2.1
- NVIDIA Driver 580 or 590 (Open)
- CUDA 13.0 or 13.1
- Oracle Cloud Agent 1.57.0
- Use the Provided Images table below for the current validated OCI image combinations
To build your own images with Packer, clone the OCI HPC Images GitHub repo and run the commands documented there.
| OS Version | Image Packer Build Details | OCI Platform Image Link | Driver Versions | Build & Dependency Status |
|---|---|---|---|---|
| OCI GPU AI Image with Ubuntu Linux 22.04 | Canonical-Ubuntu-22.04-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0 | PAR Link | NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, OCA 1.57.0 | |
| OCI GPU AI Image with Ubuntu Linux 22.04 | Canonical-Ubuntu-22.04-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1 | PAR Link | NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, OCA 1.57.0 | |
| OCI GPU AI Image with Ubuntu Linux 24.04 | Canonical-Ubuntu-24.04-6.8-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0 | PAR Link | NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, Kernel 6.8, OCA 1.57.0 | |
| OCI GPU AI Image with Ubuntu Linux 24.04 | Canonical-Ubuntu-24.04-6.8-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1 | PAR Link | NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, Kernel 6.8, OCA 1.57.0 | |
| OCI GPU AI Image with Ubuntu Linux 24.04 | Canonical-Ubuntu-24.04-6.14-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1 | PAR Link | NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, Kernel 6.14, OCA 1.57.0 | |
| OCI GPU AI Image with Oracle Linux 8 | Oracle-Linux-8.10-RHCK-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0 | PAR Link | NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, OCA 1.57.0 | |
| OCI GPU AI Image with Oracle Linux 8 | Oracle-Linux-8.10-RHCK-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1 | PAR Link | NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, OCA 1.57.0 | |
| OCI GPU AI Image with Oracle Linux 9 | Oracle-Linux-9.7-RHCK-DOCA-OFED-3.2.1-GPU-580-OPEN-CUDA-13.0 | PAR Link | NVIDIA OPEN 580, DOCA OFED 3.2.1, CUDA 13.0, OCA 1.57.0 | |
| OCI GPU AI Image with Oracle Linux 9 | Oracle-Linux-9.7-RHCK-DOCA-OFED-3.2.1-GPU-590-OPEN-CUDA-13.1 | PAR Link | NVIDIA OPEN 590, DOCA OFED 3.2.1, CUDA 13.1, OCA 1.57.0 | |
Run nvidia-smi to verify that all eight GPUs are visible and healthy:

    nvidia-smi

You should see all eight H100 GPUs listed with a healthy driver stack and no obvious ECC errors.
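For unattended onboarding, the GPU-count part of this check can be scripted. The sketch below is one possible approach, not an OCI-provided tool: it counts the per-GPU lines printed by `nvidia-smi -L`, and the helper names `count_gpus` and `check_gpu_count` are hypothetical. It only confirms visibility; it does not inspect ECC counters or driver health.

```shell
#!/usr/bin/env bash
# count_gpus: read `nvidia-smi -L` style output on stdin and print how many
# lines describe a GPU (such lines start with "GPU <index>:").
count_gpus() {
  grep -c -E '^GPU [0-9]+:' || true   # grep -c exits 1 on zero matches
}

# check_gpu_count EXPECTED: compare the count on stdin with EXPECTED and
# return non-zero on a mismatch.
check_gpu_count() {
  local expected="$1" found
  found="$(count_gpus)"
  if [ "$found" -eq "$expected" ]; then
    echo "OK: found ${found} GPUs"
  else
    echo "FAIL: expected ${expected} GPUs, found ${found}" >&2
    return 1
  fi
}

# Usage on a BM.GPU.H100.8 node:
#   nvidia-smi -L | check_gpu_count 8
```

Reading from stdin keeps the helpers usable against saved output from another node as well as a live command.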
NVIDIA publishes NCCL as the primary collective communication library for multi-GPU AI and HPC workloads. The source material for H100 focuses on representative single-node and multi-node NCCL workflows plus the supporting topology and network guidance needed for scale-out runs.
    ./build/all_reduce_perf -b 8 -e 8G -f 2 -g 8

For H100 multi-node NCCL jobs, the source material calls out a required topology file. If it is not already present in the image, use one of the following:
- Bare metal: H100 topology file
- OKE: H100 OKE topology file
For guidance on running additional NCCL collective benchmarks on this GPU family, see the NCCL user guide.
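In the `all_reduce_perf` command above, `-b` and `-e` set the minimum and maximum message sizes, `-f` is the multiplication factor between steps, and `-g` is the number of GPUs per thread. The sketch below enumerates the sizes such a sweep visits, which helps when estimating how many result rows to expect. The helper name `sweep_sizes` is hypothetical, and treating `8G` as 8 x 2^30 bytes is an assumption about the suffix.

```shell
#!/usr/bin/env bash
# sweep_sizes MIN MAX FACTOR: print each message size in bytes that a
# multiplicative sweep from MIN up to MAX (inclusive) visits, one per line.
sweep_sizes() {
  local size="$1" max="$2" factor="$3"
  if [ "$factor" -lt 2 ]; then
    echo "factor must be at least 2" >&2
    return 1
  fi
  while [ "$size" -le "$max" ]; do
    echo "$size"
    size=$((size * factor))
  done
}

min_bytes=8
max_bytes=$((8 * 1024 * 1024 * 1024))   # assumption: "8G" means 8 * 2^30 bytes

# Number of distinct message sizes for -b 8 -e 8G -f 2
sweep_sizes "$min_bytes" "$max_bytes" 2 | wc -l
```

Under these assumptions the sweep doubles from 2^3 to 2^33 bytes, which is 31 message sizes.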
The H100 source page in this workflow is oriented more toward system validation and NCCL guidance than a clean standalone inference table, so no normalized model-performance table is included here.
Information on getting up and running on OKE can be found here.
Useful H100-specific OKE starting points in oci-hpc-oke:
This guide includes a broad health-check set covering GPU visibility, NUMA topology, RDMA connectivity, DCGM diagnostics, PCIe bandwidth, and NVLink validation.
    nvidia-smi

    numactl --hardware

For H100, the source material expects a layout comparable to a 112-core dual-socket system, or fewer visible cores when hyperthreading is disabled.
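To compare a node against that expectation without reading the output by eye, the node and CPU counts can be extracted from `numactl --hardware`. This is a minimal sketch that assumes the usual `available: N nodes` and `node N cpus: ...` line layout; the helper names are hypothetical. Note that `numactl` lists logical CPUs, so the count can differ from the physical core count when hyperthreading is enabled.

```shell
#!/usr/bin/env bash
# numa_node_count: print the node count from the "available: N nodes" line
# of `numactl --hardware` output read on stdin.
numa_node_count() {
  awk '/^available:/ { print $2 }'
}

# numa_cpu_count: sum the CPU ids listed on every "node N cpus:" line.
# Each such line has three leading fields ("node", "N", "cpus:").
numa_cpu_count() {
  awk '/^node [0-9]+ cpus:/ { total += NF - 3 } END { print total + 0 }'
}

# Usage on a node:
#   numactl --hardware | numa_node_count
#   numactl --hardware | numa_cpu_count
```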
    rdma link

The source material identifies the front-end network as ens1200 on mlx5_2, with RoCE RDMA interfaces expected to report ACTIVE and LINK_UP.
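With many RDMA links per node, it can be easier to print only the links that are not in the expected state. The following sketch filters `rdma link` output for entries whose `state` is not ACTIVE or whose `physical_state` is not LINK_UP. It assumes each link is reported on one line containing `state <value>` and `physical_state <value>` pairs; the helper name `down_rdma_links` is hypothetical.

```shell
#!/usr/bin/env bash
# down_rdma_links: read `rdma link` output on stdin and print every link
# line that is not both "state ACTIVE" and "physical_state LINK_UP".
down_rdma_links() {
  awk '/ state / {
         state = ""; phys = ""
         # Walk the key/value pairs on the line and pick out the two states.
         for (i = 1; i < NF; i++) {
           if ($i == "state") state = $(i + 1)
           if ($i == "physical_state") phys = $(i + 1)
         }
         if (state != "ACTIVE" || phys != "LINK_UP") print
       }'
}

# Usage on a node (no output means every link looks healthy):
#   rdma link | down_rdma_links
```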
    dcgmi diag -r 1
    dcgmi diag -r 2
    dcgmi diag -r 3

The source material describes:

- `r1` as a quick metadata and deployment check
- `r2` as a medium-depth integration and hardware check
- `r3` as a fuller stress-oriented validation pass
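Because the levels grow in depth and run time, one reasonable pattern is to run them in order and stop at the first failure. The sketch below does that, assuming `dcgmi diag` exits non-zero when a diagnostic fails (worth confirming against your DCGM version). The `DCGMI` override variable and the function name are hypothetical conveniences for dry runs, not part of DCGM.

```shell
#!/usr/bin/env bash
# DCGMI can be overridden to dry-run the loop on a machine without DCGM
# (assumption of this sketch; defaults to the real dcgmi binary).
DCGMI="${DCGMI:-dcgmi}"

# run_dcgm_levels: run diagnostic levels 1, 2, 3 in order and stop at the
# first level whose command exits non-zero.
run_dcgm_levels() {
  local level
  for level in 1 2 3; do
    echo "Running DCGM diagnostic level ${level}"
    if ! "$DCGMI" diag -r "$level"; then
      echo "Level ${level} failed; skipping deeper levels" >&2
      return 1
    fi
  done
  echo "All DCGM diagnostic levels passed"
}

# Usage on a node:
#   run_dcgm_levels
```

Stopping early avoids spending time on the stress-oriented level when the quick deployment check has already found a problem.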
Additional validation tools referenced in the source material:
- `bandwidthTest` from NVIDIA cuda-samples for PCIe bandwidth
- `nvbandwidth` for NVLink bandwidth validation
Additional references: