Topograph provides two variants of the InfiniBand provider. Both discover the IB fabric switch tree using `ibnetdiscover`, which is useful for any cluster (CPU-only, mixed, or GPU-accelerated) where topology-aware scheduling across an InfiniBand fabric improves workload performance. NVLink domain discovery is an additional capability that applies only to nodes with NVLink-connected NVIDIA GPUs.
Why automate IB discovery? Hand-maintaining IB topology — a static topology.conf or a set of hand-applied node labels — is feasible at ~32 nodes with a stable network and a careful operator. It does not scale. At 1,000 nodes with InfiniBand fabric churn, NVLink partitions shifting with tenant allocation, and a constant background rate of link degradation and node cycling, manual maintenance becomes the dominant source of scheduling misplacement. Topograph keeps topology data current as the cluster changes, removing that burden.
The choice of which to use depends on the specifics of the deployment environment:
- Use `infiniband-bm` for bare-metal clusters (e.g. Slurm)
- Use `infiniband-k8s` for Kubernetes clusters
If NetQ is deployed in your environment, consider using the NetQ provider instead — it discovers topology via the NetQ management API rather than directly from the fabric, which avoids node access requirements and is the standard approach for Spectrum-X environments.
For Multi-Node NVLink (MNNVL) Kubernetes clusters (e.g. GB200 NVL72), use the DRA provider instead — it reads nvidia.com/gpu.clique labels set by the GPU Operator's DRA driver and is the Kubernetes-native integration path for MNNVL topology.
| | `infiniband-bm` | `infiniband-k8s` |
|---|---|---|
| Auth | None | In-cluster service account |
| Node access | pdsh (SSH-based) | Kubernetes pod exec |
| NVLink clique source | `nvidia-smi` via pdsh | Node annotations (set by node-data-broker), or a configured Kubernetes node label |
| Target environment | Bare-metal / Slurm | Kubernetes |
Both variants are presently single-region only (multi-region requests return a 400 Bad Request error). No CSP credentials are required.
Both variants produce the same topology representation, which is then consumed by whichever engine you configure:
- Slurm engine (`engine: slurm`): writes a `topology.conf` file describing the switch tree, used by the Slurm topology plugin for topology-aware scheduling
- Kubernetes engine (`engine: k8s`): applies `network.topology.nvidia.com/` labels to nodes reflecting their position in the switch hierarchy and (where applicable) their NVLink domain
- Slinky engine (`engine: slinky`): writes topology data to a Kubernetes ConfigMap for Slurm-on-Kubernetes deployments
See the engine documentation (docs/engines/) for details on each output format.
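To make the Slurm engine's output more concrete, the following is a minimal sketch of how a switch tree could be rendered as `topology.conf`-style lines. It is not Topograph's implementation: the function name, the two input dictionaries, and the switch and node names are all hypothetical, and real output may differ (for example by compressing node lists into hostlist ranges).

```python
def render_topology_conf(leaf_nodes, parent_switches):
    """Render Slurm topology.conf-style lines from a simple switch tree.

    leaf_nodes:      {leaf switch name: [node names]}         (hypothetical input)
    parent_switches: {parent switch name: [child switches]}   (hypothetical input)
    """
    lines = []
    # Parent switches list the switches directly below them.
    for switch in sorted(parent_switches):
        children = ",".join(sorted(parent_switches[switch]))
        lines.append(f"SwitchName={switch} Switches={children}")
    # Leaf switches list the compute nodes attached to them.
    for switch in sorted(leaf_nodes):
        nodes = ",".join(sorted(leaf_nodes[switch]))
        lines.append(f"SwitchName={switch} Nodes={nodes}")
    return "\n".join(lines)


conf = render_topology_conf(
    leaf_nodes={"leaf1": ["node1", "node2"], "leaf2": ["node3", "node4"]},
    parent_switches={"spine1": ["leaf1", "leaf2"]},
)
print(conf)
```

The sketch only illustrates the `SwitchName=... Switches=...` and `SwitchName=... Nodes=...` line shapes that the Slurm topology plugin reads; the engine documentation remains the reference for what Topograph actually writes.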
- `pdsh` must be installed on the node running Topograph and able to reach at least one node per IB fabric segment. Topograph discovers the full fabric from a single entry point per segment, so not every node needs to be reachable via `pdsh`.
- `ibnetdiscover` must be available on cluster nodes (invoked via `pdsh` with `sudo`). It is part of the standard `infiniband-diags` package (`dnf install infiniband-diags` / `apt install infiniband-diags`) and is expected to already be present on any properly configured IB system.
- NVIDIA GPU driver required on nodes with NVLink-connected GPUs; it is used to collect NVLink clique IDs via `nvidia-smi`. Nodes without NVLink are included in the IB switch tree but excluded from block topology.
- Runs `sudo ibnetdiscover` via `pdsh` on one node per IB fabric segment to map the full switch tree
- On NVIDIA GPU nodes: runs `nvidia-smi -q | grep "ClusterUUID\|CliqueId" | sort -u` via `pdsh` across all nodes to collect NVLink clique IDs. The resulting `accelerator` label value is `ClusterUUID.CliqueId`, the same format as `nvidia.com/gpu.clique` set by the GPU Operator device plugin on MNNVL systems.
- Combines the switch tree and any NVLink clique data into the topology graph
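The clique-ID step above can be illustrated with a small sketch that turns the filtered `nvidia-smi` lines into a `ClusterUUID.CliqueId` value. This is an illustration under assumptions, not Topograph's code: the function name is hypothetical, and the sample text is made up to resemble `Key : Value` lines from `nvidia-smi -q`.

```python
import re


def clique_id_from_nvidia_smi(output):
    """Return 'ClusterUUID.CliqueId' from nvidia-smi -q text, or None if absent."""
    fields = {}
    for line in output.splitlines():
        # Match "ClusterUUID : <value>" or "CliqueId : <value>" lines only.
        match = re.match(r"\s*(ClusterUUID|CliqueId)\s*:\s*(\S+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    if "ClusterUUID" in fields and "CliqueId" in fields:
        return f"{fields['ClusterUUID']}.{fields['CliqueId']}"
    # Nodes without NVLink report neither field.
    return None


# Hypothetical sample, shaped like the grep-filtered output for one node.
sample = """\
        CliqueId                          : 4
        ClusterUUID                       : 1a2b3c4d-0000-0000-0000-000000000001
"""
print(clique_id_from_nvidia_smi(sample))
```

A node whose output contains neither field yields `None`, which corresponds to the case where the node stays in the IB switch tree but is left out of block topology.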
No credentials or parameters are required. Set provider: infiniband-bm in your Topograph config:
```yaml
http:
  port: 49021
  ssl: false
provider: infiniband-bm
engine: slurm
```

After triggering topology generation, query the result endpoint:
```bash
id=$(curl -s -X POST -H "Content-Type: application/json" -d @payload.json http://localhost:49021/v1/generate)
curl -s "http://localhost:49021/v1/topology?uid=$id"
```

For the Slurm engine, verify that the generated `topology.conf` reflects the expected switch hierarchy. See the Slurm engine documentation for details.
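The generate-then-query flow shown with `curl` can also be scripted. The sketch below uses only the Python standard library and the two endpoints from the commands above. The polling loop is an assumption: it treats any reply other than HTTP 200 as "not ready yet" and retries, which is a simplification that would also hide real errors. The function names, retry count, and delay are hypothetical.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:49021"  # host and port from the example config


def topology_url(base_url, request_id):
    """Build the result URL for a previously submitted request."""
    return f"{base_url}/v1/topology?uid={urllib.parse.quote(request_id, safe='')}"


def generate_topology(payload, base_url=BASE_URL, attempts=10, delay=2.0):
    """POST a topology request, then poll the result endpoint for the outcome."""
    request = urllib.request.Request(
        f"{base_url}/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        request_id = response.read().decode("utf-8").strip()

    for _ in range(attempts):
        try:
            with urllib.request.urlopen(topology_url(base_url, request_id)) as response:
                if response.status == 200:
                    return response.read().decode("utf-8")
        except urllib.error.HTTPError:
            # Assumption: an error status means "try again later".
            # A production client should inspect the status code instead.
            pass
        time.sleep(delay)
    raise TimeoutError(f"no result for request {request_id}")
```

Calling `generate_topology(payload)` with the same JSON document used as `payload.json` returns the response body of the result endpoint as text.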
- Topograph deployed via Helm: the node-data-broker DaemonSet (a Topograph subchart, enabled by default) collects NVLink clique IDs from each node and stores them as Kubernetes node annotations (`topograph.nvidia.com/cluster-id`). If `useGpuCliqueLabel` is enabled, Topograph reads `nvidia.com/gpu.clique` directly instead and the node-data-broker skips NVLink clique collection.
- NVIDIA GPU Operator: standard on NVIDIA GPU Kubernetes clusters; manages the device plugin DaemonSet used to read NVLink clique IDs. Required only for NVLink domain discovery; on clusters without NVLink-connected GPUs this does not apply and the provider will still discover the IB switch tree.
- Runs `ibnetdiscover` by exec-ing into a node-data-broker pod on each node to map the switch tree
- On NVIDIA GPU nodes: reads NVLink clique IDs from the `topograph.nvidia.com/cluster-id` node annotations set by the node-data-broker. If `useGpuCliqueLabel` is enabled, it reads `nvidia.com/gpu.clique` directly instead. The accelerator domain value is `ClusterUUID.CliqueId`, the same format as `nvidia.com/gpu.clique` set by the GPU Operator device plugin on MNNVL systems. When the k8s engine sees `nvidia.com/gpu.clique` already present on a node, it does not write a duplicate Topograph accelerator label for that node.
- Combines the switch tree and any NVLink clique data into the topology graph
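The choice between the two clique-ID sources described above can be sketched as a small lookup. This is a simplified illustration, not the provider's code: the function name is hypothetical, and the node is assumed to be a dictionary shaped like the output of `kubectl get node -o json`.

```python
CLUSTER_ID_ANNOTATION = "topograph.nvidia.com/cluster-id"
GPU_CLIQUE_LABEL = "nvidia.com/gpu.clique"


def accelerator_domain(node, use_gpu_clique_label=False):
    """Return the NVLink domain ID for a node, or None if it has none."""
    metadata = node.get("metadata", {})
    if use_gpu_clique_label:
        # Read the label directly when useGpuCliqueLabel is enabled.
        return (metadata.get("labels") or {}).get(GPU_CLIQUE_LABEL)
    # Default: read the annotation written by the node-data-broker.
    return (metadata.get("annotations") or {}).get(CLUSTER_ID_ANNOTATION)


# Hypothetical node carrying both sources with the same value.
node = {
    "metadata": {
        "name": "gpu-node-1",
        "labels": {GPU_CLIQUE_LABEL: "cluster-a.1"},
        "annotations": {CLUSTER_ID_ANNOTATION: "cluster-a.1"},
    }
}
print(accelerator_domain(node), accelerator_domain(node, use_gpu_clique_label=True))
```

A node without NVLink-connected GPUs carries neither key, so the lookup returns `None` and the node contributes only to the switch tree.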
No credentials are required. The provider uses the in-cluster service account automatically.
Set provider: infiniband-k8s in your Topograph config:
```yaml
http:
  port: 49021
  ssl: false
provider: infiniband-k8s
engine: k8s
```

The following optional parameters can be passed in the topology request payload:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `nodeSelector` | `map[string]string` | (none) | Label selector to filter which nodes participate in topology discovery |
| `useGpuCliqueLabel` | `bool` | `false` | Use `nvidia.com/gpu.clique` as the accelerator-domain ID source instead of the `topograph.nvidia.com/cluster-id` annotation. |
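As a rough illustration of how these parameters fit into a request, the sketch below assembles a payload with the provider and engine names used in this document. The helper name `build_request` and its arguments are hypothetical; only the resulting JSON structure follows the request format documented here.

```python
import json


def build_request(engine="k8s", node_selector=None, use_gpu_clique_label=None):
    """Build a topology request payload for the infiniband-k8s provider."""
    params = {}
    if node_selector:
        params["nodeSelector"] = dict(node_selector)
    if use_gpu_clique_label is not None:
        params["useGpuCliqueLabel"] = bool(use_gpu_clique_label)

    provider = {"name": "infiniband-k8s"}
    if params:
        # Omit "params" entirely when no optional parameter is set.
        provider["params"] = params
    return {"provider": provider, "engine": {"name": engine}}


payload = build_request(node_selector={"nvidia.com/gpu.present": "true"})
print(json.dumps(payload, indent=2))
```

Both parameters are optional, so the helper leaves out anything the caller did not set and lets the provider fall back to its defaults.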
With Helm, configure useGpuCliqueLabel under global.provider.params. The chart also passes it to the node-data-broker init container so it skips NVLink clique collection instead of exec-ing into the GPU Operator device-plugin DaemonSet to run nvidia-smi:
```yaml
global:
  provider:
    name: infiniband-k8s
    params:
      useGpuCliqueLabel: true
  engine:
    name: k8s
```

When `useGpuCliqueLabel` is not set, the node-data-broker init container uses the GPU Operator device-plugin DaemonSet to collect NVLink clique IDs. To override the GPU Operator namespace or device plugin DaemonSet name (defaults: `gpu-operator` and `nvidia-device-plugin-daemonset`), set them via `node-data-broker.initc.extraArgs` in your Helm values (they are init container arguments, not provider request parameters):
```yaml
node-data-broker:
  initc:
    extraArgs:
      - gpu-operator-namespace=my-namespace
      - device-plugin-daemonset=my-daemonset
```

Example request payload with `nodeSelector`:
```json
{
  "provider": {
    "name": "infiniband-k8s",
    "params": {
      "nodeSelector": {
        "nvidia.com/gpu.present": "true"
      }
    }
  },
  "engine": {
    "name": "k8s"
  }
}
```

After topology generation, inspect the node labels applied by Topograph:
```bash
kubectl get nodes -o json | jq '.items[].metadata.labels | with_entries(select(.key | startswith("network.topology.nvidia.com")))'
```

See the Kubernetes engine documentation for details on the label schema.
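Where `jq` is not available, the same filtering can be done on the JSON from `kubectl get nodes -o json`. The sketch below is one possible approach; the function name is hypothetical, and the label keys and values in the sample are placeholders rather than a statement of the actual label schema.

```python
import json

LABEL_PREFIX = "network.topology.nvidia.com"


def topology_labels(node_list):
    """Map node name -> labels whose key starts with the Topograph prefix."""
    result = {}
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}
        result[metadata.get("name")] = {
            key: value for key, value in labels.items() if key.startswith(LABEL_PREFIX)
        }
    return result


# Hypothetical sample; the label keys and values are illustrative placeholders.
sample_nodes = {
    "items": [
        {
            "metadata": {
                "name": "node1",
                "labels": {
                    "kubernetes.io/hostname": "node1",
                    "network.topology.nvidia.com/leaf": "leaf1",
                },
            }
        }
    ]
}
print(json.dumps(topology_labels(sample_nodes), indent=2))
```

Unlike the `jq` one-liner, this variant keys the result by node name, which makes it easier to see which node carries which labels.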