Install on Slurm (bare metal)

Install Topograph on a Slurm head node so it can generate topology configuration (topology.conf or per-partition topology.yaml) for the Slurm controller to consume.

Prerequisites

  • Slurm cluster with a head node you can install system packages on
  • Go and make to build the package from source (see go.mod for the exact Go version), or a pre-built Debian/RPM package if your organization distributes one
  • A supported provider for your environment (see the provider documentation for per-provider setup)

Install

Clone the repo and build a native package for your distribution:

git clone https://github.com/NVIDIA/topograph.git
cd topograph

make deb        # Debian / Ubuntu — produces .deb under dist/
# or
make rpm        # RHEL / Rocky / SUSE — produces .rpm under dist/

Install the resulting package:

sudo dpkg -i dist/topograph_*.deb        # Debian / Ubuntu
# or
sudo rpm -ivh dist/topograph-*.rpm       # RHEL / Rocky / SUSE
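If you script this step across hosts running different distributions, a small helper can pick the matching make target. The sketch below assumes that the ID and ID_LIKE fields of /etc/os-release identify the distribution family. It maps Debian and Ubuntu to deb and RHEL, Fedora, CentOS, Rocky and SUSE variants to rpm. The function name pkg_format is hypothetical and is not shipped with Topograph.

```shell
# Hypothetical helper (not part of Topograph): print "deb" or "rpm" based on
# the distribution family declared in an os-release file.
pkg_format() {
    os_release="${1:-/etc/os-release}"
    # Source the file inside a command substitution (a subshell) so its
    # variables do not leak into the calling shell.
    ids=$( . "$os_release" && printf '%s %s' "${ID:-}" "${ID_LIKE:-}" )
    # Check ID first, then each space-separated ID_LIKE entry.
    for id in $ids; do
        case "$id" in
            debian|ubuntu) echo deb; return 0 ;;
            rhel|fedora|centos|rocky|suse|sles|opensuse*) echo rpm; return 0 ;;
        esac
    done
    echo "unsupported distribution: $ids" >&2
    return 1
}
```

For example, fmt=$(pkg_format) && make "$fmt" builds the package type that matches the host. Then install it with the dpkg or rpm command above.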

The package installs the service but does not start it. Before starting it, edit /etc/topograph/topograph-config.yaml and set at least the following:

http:
  port: 49021
provider: <provider>                     # aws, gcp, oci, nebius, nscale, netq, infiniband-bm, ...
engine: slurm
requestAggregationDelay: 15s
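Before you start the service, a quick pre-flight check can catch a config that still has the placeholder provider. The sketch below is a line-based grep, not a YAML parser. It only confirms that the top-level keys listed above are present, that provider is no longer the literal <provider> placeholder, and that engine is slurm. The function name check_topograph_config is hypothetical.

```shell
# Hypothetical pre-flight check (not part of Topograph). Line-based grep only:
# it does not validate YAML syntax or the values of nested keys such as http.port.
check_topograph_config() {
    cfg="${1:-/etc/topograph/topograph-config.yaml}"
    status=0
    # Required top-level keys from the minimal config above.
    for key in http provider requestAggregationDelay; do
        if ! grep -q "^${key}:" "$cfg"; then
            echo "missing top-level key: $key" >&2
            status=1
        fi
    done
    # The provider placeholder must be replaced with a real provider name.
    if grep -q '^provider:[[:space:]]*<provider>' "$cfg"; then
        echo "provider is still the <provider> placeholder" >&2
        status=1
    fi
    # This guide assumes the Slurm engine.
    if ! grep -q '^engine:[[:space:]]*slurm' "$cfg"; then
        echo "engine is not set to slurm" >&2
        status=1
    fi
    return $status
}
```

For example, check_topograph_config && sudo systemctl enable --now topograph.service only starts the service when the check passes.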

Then enable and start the service:

sudo systemctl enable --now topograph.service

Verify

Check that the service is running and the API is reachable:

curl http://localhost:49021/healthz

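An HTTP 200 response means the API server is up. By default curl prints only the response body. Add -i to see the status line, or -f to make curl exit non-zero on an HTTP error status.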
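Right after systemctl enable --now, the server may still be starting, so a single curl can fail. The sketch below polls the health endpoint a fixed number of times, one second apart, and succeeds as soon as curl reports a non-error status. It assumes the default port 49021 from the config above. The function name wait_for_topograph is hypothetical.

```shell
# Hypothetical helper (not part of Topograph): poll the health endpoint up to
# ATTEMPTS times, one second apart, until curl reports a non-error status.
wait_for_topograph() {
    url="${1:-http://localhost:49021/healthz}"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: exit non-zero on HTTP status >= 400; -s: no progress meter;
        # --max-time: cap each attempt so a hung connection cannot stall the loop.
        if curl -fs -o /dev/null --max-time 2 "$url"; then
            echo "topograph is healthy"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "topograph did not become healthy after $attempts attempts" >&2
    return 1
}
```

For example, sudo systemctl enable --now topograph.service && wait_for_topograph starts the service and waits up to about 30 attempts for the API.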

Where to go next