This repository provides materials for participants in this tutorial. To take part, you must be a registered attendee of the IEEE CLUSTER 2025 conference and have completed the registration form.
To comply with security policies, participants must attend in person. You will be required to sign in at the session.
This tutorial has been allocated a finite amount of compute time, shared among all participants. Participants must not run long-running or large-scale jobs either before or during the session.
Welcome & Logistics. Introducing Isambard-AI. Richard Gilham (Bristol Centre for Supercomputing)
Practical: Finding your way around Isambard-AI. Richard Gilham (Bristol Centre for Supercomputing)
HPE Cray EX4000 platform + NVIDIA 4-way GH200 Superchip: Hardware and Software Deep Dive. Tim Dykes (HPE)
NVIDIA GH200 Programming Models Deep Dive. Filippo Spiga (NVIDIA)
Practical: Guided hands-on session / “Bring-Your-Own-Code” [single node / single GPU]. Filippo Spiga (NVIDIA)
Tips & tricks for multi-node scaling AI workloads on EX4000 and HPE Slingshot. Jess Jones (HPE)
Practical: Guided hands-on session / “Bring-Your-Own-Code” [multi node / multi GPU]. Jess Jones (HPE)
Profiling Large Language Model Trainings on the Grace Hopper Superchip using Nsight Systems. Karin Sevegnani (NVIDIA)
Guided hands-on session / “Bring-Your-Own-Code” [profiling and performance analysis]. Karin Sevegnani (NVIDIA)
We welcome your feedback on the session. Please take a few moments to fill in the feedback form.