NVIDIA/k8s-launch-kit

K8s Launch Kit - CLI for configuring NVIDIA cloud-native solutions

K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. It provides flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.

Operation Phases

Discover Cluster Configuration

Deploy a minimal Network Operator profile to automatically discover your cluster's network capabilities and hardware configuration. This phase can be skipped if you provide your own configuration file.

Select the Deployment Profile

Specify the desired deployment profile via CLI flags, or describe it in a natural-language prompt for LLM-assisted profile generation.

Generate Deployment Files

Based on the discovered/provided configuration, generate a complete set of YAML deployment files tailored to your selected network profile.

Installation

Build from source

git clone <repository-url>
cd launch-kubernetes
make build

The binary will be available at build/l8k.

Docker

Build the Docker image:

make docker-build

Usage

Running l8k --help prints:

K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool helps provide flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.

### Discover Cluster Configuration
Deploy a minimal Network Operator profile to automatically discover your cluster's
network capabilities and hardware configuration by using --discover-cluster-config.
This phase can be skipped if you provide your own configuration file by using --user-config.
This phase requires --kubeconfig to be specified.

### Generate Deployment Files
Based on the discovered or provided configuration, 
generate a complete set of YAML deployment files for the selected network profile. 
Files can be saved to disk using --save-deployment-files.
The profile can be defined manually with --fabric, --deployment-type and --multirail flags,
OR generated by an LLM-assisted profile generator with --prompt (requires --llm-api-key and --llm-vendor).

### Deploy to Cluster
Apply the generated deployment files to your Kubernetes cluster by using --deploy. This phase requires --kubeconfig and can be skipped if --deploy is not specified.

Usage:
  l8k [flags]
  l8k [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  version     Print the version number

Flags:
      --ai                                Enable AI deployment
      --deploy                            Deploy the generated files to the Kubernetes cluster
      --deployment-type string            Select the deployment type (sriov, rdma_shared, host_device)
      --discover-cluster-config           Deploy a thin Network Operator profile to discover cluster capabilities
      --enabled-plugins string            Comma-separated list of plugins to enable (default "network-operator")
      --fabric string                     Select the fabric type to deploy (infiniband, ethernet)
      --group string                      Generate templates for a specific group only (e.g., group-0)
  -h, --help                              help for l8k
      --kubeconfig string                 Path to kubeconfig file for cluster deployment (required when using --deploy)
      --label-selector string             Filter nodes for discovery by label (default "feature.node.kubernetes.io/pci-15b3.present=true")
      --llm-api-key string                API key for the LLM API (required when using --prompt)
      --llm-api-url string                API URL for the LLM API
      --llm-interactive                   Enable interactive chat mode for LLM-assisted profile selection
      --llm-model string                  Model name for the LLM API (e.g., claude-3-5-sonnet-20241022, gpt-4)
      --llm-vendor string                 Vendor of the LLM API: openai, openai-azure, anthropic, gemini (default "openai-azure")
      --log-file string                   Write logs to file instead of stderr
      --log-level string                  Enable logging at specified level (debug, info, warn, error)
      --multiplane-mode string            Spectrum-X multiplane mode: swplb, hwplb, uniplane (requires --spectrum-x)
      --multirail                         Enable multirail deployment
      --network-operator-namespace string Override the network operator namespace from the config file
      --number-of-planes int              Number of planes for Spectrum-X (requires --spectrum-x)
      --prompt string                     Path to file with a prompt to use for LLM-assisted profile generation
      --save-cluster-config string        Save discovered cluster configuration to the specified path (defaults to --user-config path if set, otherwise /opt/nvidia/k8s-launch-kit/cluster-config.yaml)
      --save-deployment-files string      Save generated deployment files to the specified directory (default "/opt/nvidia/k8s-launch-kit/deployment")
      --spcx-version string               Spectrum-X firmware version (requires --spectrum-x)
      --spectrum-x                        Enable Spectrum X deployment
      --user-config string                Use provided cluster configuration file (as base config for discovery or as full config without discovery)

Use "l8k [command] --help" for more information about a command.

Usage Examples

Complete Workflow

Discover cluster config, generate files, and deploy:

l8k --discover-cluster-config --save-cluster-config ./cluster-config.yaml \
    --fabric ethernet --deployment-type sriov --multirail \
    --save-deployment-files ./deployments \
    --deploy --kubeconfig ~/.kube/config

Discover Cluster Configuration

l8k --discover-cluster-config --save-cluster-config ./my-cluster-config.yaml \
    --kubeconfig ~/.kube/config

Filter discovery to specific nodes using a label selector:

l8k --discover-cluster-config --save-cluster-config ./my-cluster-config.yaml \
    --label-selector "feature.node.kubernetes.io/pci-15b3.present=true" \
    --kubeconfig ~/.kube/config

Discovery with User-Provided Base Config

Use your own config file (with custom network operator version, subnets, etc.) as the base for discovery. Without --save-cluster-config, the file is rewritten in place with discovery results:

l8k --user-config ./my-config.yaml --discover-cluster-config \
    --kubeconfig ~/.kube/config

Save discovery results to a separate file instead:

l8k --user-config ./my-config.yaml --discover-cluster-config \
    --save-cluster-config ./discovered-config.yaml \
    --kubeconfig ~/.kube/config

Use Existing Configuration

Generate and deploy with pre-existing config:

l8k --user-config ./existing-config.yaml \
    --fabric ethernet --deployment-type sriov --multirail \
    --deploy --kubeconfig ~/.kube/config

Generate Deployment Files

l8k --user-config ./config.yaml \
    --fabric ethernet --deployment-type sriov --multirail \
    --save-deployment-files ./deployments

Generate Deployment Files for a Specific Node Group

In heterogeneous clusters, discovery produces multiple node groups. Use --group to generate manifests for a single group:

l8k --user-config ./config.yaml \
    --fabric infiniband --deployment-type sriov --multirail \
    --group group-0 \
    --save-deployment-files ./deployments

Generate Deployment Files using Natural Language Prompt

echo "I want to enable multirail networking in my AI cluster" > requirements.txt
l8k --user-config ./config.yaml \
    --prompt requirements.txt --llm-vendor openai-azure --llm-api-key <OPENAI_AZURE_KEY> \
    --save-deployment-files ./deployments

Troubleshooting Network Operator Issues

The interactive mode can also help troubleshoot NVIDIA Network Operator failures by collecting and analyzing diagnostic data (sosreport). The sosreport script must first be downloaded:

make download-sosreport

Then use the interactive mode with --kubeconfig to enable troubleshooting:

l8k --llm-interactive \
    --kubeconfig ~/.kube/config \
    --user-config ./cluster-config.yaml \
    --llm-api-key $KEY --llm-vendor anthropic \
    --llm-model claude-sonnet-4-20250514

In the session, ask about issues: "My OFED driver pods are crashing, can you investigate?"

The AI agent will automatically collect a sosreport from the cluster, examine the diagnostic data, and provide analysis with remediation steps.

You can also provide a pre-collected sosreport directory (no cluster access needed):

l8k --llm-interactive \
    --sosreport-path ./network-operator-sosreport-20260306-120000 \
    --llm-api-key $KEY --llm-vendor anthropic \
    --llm-model claude-sonnet-4-20250514

Configuration file

During the cluster discovery stage, K8s Launch Kit creates a configuration file, which it later uses to generate deployment manifests from templates. You can edit this file to customize your deployment configuration and pass it back to the tool with the --user-config CLI flag, either as a standalone config (skipping discovery) or as a base config combined with --discover-cluster-config (discovery takes the network operator parameters from the file and adds the discovered cluster configuration).

DOCA Driver

The docaDriver section controls the OFED driver deployment in the NicClusterPolicy. Set enable: true to include the ofedDriver section in generated manifests, or enable: false to omit it. This can also be overridden via the --enable-doca-driver CLI flag.

OFED Dependent Module Blacklisting

When the DOCA/OFED driver loads on a node, it replaces the inbox MLX kernel modules (mlx5_core, mlx5_ib, ib_core, etc.) with its own versions. However, if third-party or distribution-specific kernel modules depend on the inbox MLX modules (e.g., iw_cm, nfsrdma), they will block the inbox modules from being unloaded, causing the DOCA driver to fail to load or leaving the system in an inconsistent state.

To solve this, blacklistDependentModules: true enables a pre-flight check during cluster discovery. The tool execs into nic-configuration-daemon pods and inspects /sys/module/*/holders/ for each of the following MLX/OFED kernel modules:

mlx5_core, mlx5_ib, ib_umad, ib_uverbs, ib_ipoib, rdma_cm, rdma_ucm, ib_core, ib_cm

Any kernel modules found as holders (dependents) of these — but not the MLX modules themselves — are saved per group as ofedDependentModules. During manifest generation, these modules are passed to the DOCA driver pod via the OFED_BLACKLIST_MODULES environment variable (semicolon-separated), which tells the driver to unload them before attempting to replace the inbox modules.

Module discovery always runs during cluster discovery (so results are saved for inspection), but the OFED_BLACKLIST_MODULES env var is only rendered when blacklistDependentModules is true. When multiple node groups are merged, their dependent modules are aggregated as a union.

docaDriver:
  enable: true
  version: doca3.3.0-26.01-1.0.0.0-0
  unloadStorageModules: true
  enableNFSRDMA: false
  blacklistDependentModules: true   # Enable dependent module discovery and blacklisting

After discovery, the config will contain the discovered dependents:

clusterConfig:
- identifier: group-0
  ofedDependentModules:
  - iw_cm

The generated NicClusterPolicy ofedDriver section will include:

env:
  - name: OFED_BLACKLIST_MODULES
    value: "iw_cm"
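The holder inspection and union merge described above can be sketched in Python. This is illustrative only: the function names and the `holders` mapping are assumptions, standing in for what the tool reads from /sys/module/*/holders/ inside each nic-configuration-daemon pod.

```python
# Illustrative sketch of dependent-module discovery and blacklisting.
# `holders` maps kernel module -> list of holder (dependent) modules,
# as would be read from /sys/module/<mod>/holders/ on a node.

MLX_MODULES = {
    "mlx5_core", "mlx5_ib", "ib_umad", "ib_uverbs", "ib_ipoib",
    "rdma_cm", "rdma_ucm", "ib_core", "ib_cm",
}

def find_dependent_modules(holders):
    """Return holders of the MLX/OFED modules that are not MLX modules themselves."""
    dependents = set()
    for mod in MLX_MODULES:
        for holder in holders.get(mod, []):
            if holder not in MLX_MODULES:
                dependents.add(holder)
    return sorted(dependents)

def blacklist_env_value(*group_dependents):
    """Merge per-group dependent lists as a union and render the
    semicolon-separated OFED_BLACKLIST_MODULES value."""
    union = set()
    for deps in group_dependents:
        union.update(deps)
    return ";".join(sorted(union))
```

For example, find_dependent_modules({"ib_core": ["iw_cm", "rdma_cm"]}) yields ["iw_cm"], since rdma_cm is itself an MLX module and is excluded.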

NV-IPAM Subnet Configuration

The nvIpam section supports two modes for subnet configuration:

Option 1: Manual subnet list — List each subnet explicitly. This takes precedence if the list is non-empty:

nvIpam:
  poolName: nv-ipam-pool
  subnets:
  - subnet: 192.168.2.0/24
    gateway: 192.168.2.1
  - subnet: 192.168.3.0/24
    gateway: 192.168.3.1

Option 2: Auto-generate subnets — When the subnets list is empty but startingSubnet, mask, and offset are all set, subnets are automatically generated. Each cluster config group gets its own unique, non-overlapping subnet slice. The gateway for each subnet is the first usable address (network + 1).

nvIpam:
  poolName: nv-ipam-pool
  startingSubnet: "192.168.2.0"
  mask: 24
  offset: 1

With the auto-generation example above, a cluster with 2 groups (4 east-west PFs each) would receive:

  • Group 0: 192.168.2.0/24, 192.168.3.0/24, 192.168.4.0/24, 192.168.5.0/24
  • Group 1: 192.168.6.0/24, 192.168.7.0/24, 192.168.8.0/24, 192.168.9.0/24

The offset parameter controls how many subnet blocks to skip between consecutive subnets (offset=1 is contiguous, offset=2 skips every other).
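A minimal Python sketch of this allocation scheme, using the standard ipaddress module. The function name and argument shapes are assumptions for illustration, not the tool's actual API.

```python
import ipaddress

def generate_subnets(starting_subnet, mask, offset, group_pf_counts):
    """Allocate non-overlapping /mask subnets per group, skipping `offset`
    blocks between consecutive subnets (offset=1 is contiguous).
    The gateway is the first usable address (network + 1)."""
    base = ipaddress.ip_network(f"{starting_subnet}/{mask}")
    block = base.num_addresses  # addresses per /mask block
    groups = []
    index = 0
    for pf_count in group_pf_counts:
        subnets = []
        for _ in range(pf_count):
            net = ipaddress.ip_network(
                (int(base.network_address) + index * block, mask)
            )
            subnets.append(
                {"subnet": str(net), "gateway": str(net.network_address + 1)}
            )
            index += offset
        groups.append(subnets)
    return groups
```

With startingSubnet 192.168.2.0, mask 24, offset 1 and two groups of four PFs, this reproduces the group-0 and group-1 ranges listed above.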

Example of the configuration file discovered from the cluster:

networkOperator:
  version: v26.1.0
  componentVersion: network-operator-v26.1.0
  repository: nvcr.io/nvidia/mellanox
  namespace: nvidia-network-operator
docaDriver:
  enable: true
  version: doca3.2.0-25.10-1.2.8.0-2
  unloadStorageModules: false
  enableNFSRDMA: false
  blacklistDependentModules: false
nvIpam:
  poolName: nv-ipam-pool
  subnets:
  - subnet: 192.168.2.0/24
    gateway: 192.168.2.1
  - subnet: 192.168.3.0/24
    gateway: 192.168.3.1
  - subnet: 192.168.4.0/24
    gateway: 192.168.4.1
  - subnet: 192.168.5.0/24
    gateway: 192.168.5.1
  - subnet: 192.168.6.0/24
    gateway: 192.168.6.1
  - subnet: 192.168.7.0/24
    gateway: 192.168.7.1
  - subnet: 192.168.8.0/24
    gateway: 192.168.8.1
  - subnet: 192.168.9.0/24
    gateway: 192.168.9.1
  - subnet: 192.168.10.0/24
    gateway: 192.168.10.1
  - subnet: 192.168.11.0/24
    gateway: 192.168.11.1
  - subnet: 192.168.12.0/24
    gateway: 192.168.12.1
  - subnet: 192.168.13.0/24
    gateway: 192.168.13.1
  - subnet: 192.168.14.0/24
    gateway: 192.168.14.1
  - subnet: 192.168.15.0/24
    gateway: 192.168.15.1
  - subnet: 192.168.16.0/24
    gateway: 192.168.16.1
  - subnet: 192.168.17.0/24
    gateway: 192.168.17.1
  - subnet: 192.168.18.0/24
    gateway: 192.168.18.1
  - subnet: 192.168.19.0/24
    gateway: 192.168.19.1
sriov:
  ethernetMtu: 9000
  infinibandMtu: 4000
  numVfs: 8
  priority: 90
  resourceName: sriov_resource
  networkName: sriov-network
hostdev:
  resourceName: hostdev-resource
  networkName: hostdev-network
rdmaShared:
  resourceName: rdma_shared_resource
  hcaMax: 63
ipoib:
  networkName: ipoib-network
macvlan:
  networkName: macvlan-network
nicConfigurationOperator:
  deployNicInterfaceNameTemplate: true  # Enable NIC rename when needed (see NIC Interface Name Templates section)
  rdmaPrefix: "rdma_r%rail%"           # RDMA device name template (%rail% substituted per rail)
  netdevPrefix: "eth_r%rail%"          # Network interface name template (%rail% substituted per rail)
spectrumX:
  nicType: "1023"
  overlay: none
  rdmaPrefix: roce_p%plane%_r%rail%    # Spectrum-X uses its own prefixes (with %plane%)
  netdevPrefix: eth_p%plane%_r%rail%
clusterConfig:
- identifier: group-0
  capabilities:
    nodes:
      sriov: true
      rdma: true
      ib: true
  pfs:
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: "0000:19:00.0"
    networkInterface: ""
    traffic: east-west
    rail: 0
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:2a:00.0
    networkInterface: ""
    traffic: east-west
    rail: 1
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:3b:00.0
    networkInterface: ""
    traffic: east-west
    rail: 2
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:4c:00.0
    networkInterface: ""
    traffic: east-west
    rail: 3
  - deviceID: 101f
    rdmaDevice: ""
    pciAddress: 0000:5a:00.0
    networkInterface: ""
    traffic: east-west
    rail: 4
  - deviceID: 101f
    rdmaDevice: ""
    pciAddress: 0000:5a:00.1
    networkInterface: ""
    traffic: east-west
    rail: 5
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:9b:00.0
    networkInterface: ""
    traffic: east-west
    rail: 6
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:ab:00.0
    networkInterface: ""
    traffic: east-west
    rail: 7
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:c1:00.0
    networkInterface: ""
    traffic: east-west
    rail: 8
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:cb:00.0
    networkInterface: ""
    traffic: east-west
    rail: 9
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:d8:00.0
    networkInterface: ""
    traffic: east-west
    rail: 10
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:d8:00.1
    networkInterface: ""
    traffic: east-west
    rail: 11
  workerNodes:
  - pdx-g22r13-2894-lh2-w01
  - pdx-g24r13-2894-lh2-w02
  nodeSelector:
    nvidia.com/gpu.machine: ThinkSystem-SR680a-V3
- identifier: group-1
  capabilities:
    nodes:
      sriov: true
      rdma: true
      ib: true
  pfs:
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:1a:00.0
    networkInterface: ""
    traffic: east-west
    rail: 0
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:3c:00.0
    networkInterface: ""
    traffic: east-west
    rail: 1
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:4d:00.0
    networkInterface: ""
    traffic: east-west
    rail: 2
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:5e:00.0
    networkInterface: ""
    traffic: east-west
    rail: 3
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:9c:00.0
    networkInterface: ""
    traffic: east-west
    rail: 4
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:9d:00.0
    networkInterface: ""
    traffic: east-west
    rail: 5
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:9d:00.1
    networkInterface: ""
    traffic: east-west
    rail: 6
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:bc:00.0
    networkInterface: ""
    traffic: east-west
    rail: 7
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:cc:00.0
    networkInterface: ""
    traffic: east-west
    rail: 8
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:dc:00.0
    networkInterface: ""
    traffic: east-west
    rail: 9
  workerNodes:
  - pdx-g22r23-2894-dh2-w03
  - pdx-g24r23-2894-dh2-w04
  nodeSelector:
    nvidia.com/gpu.machine: PowerEdge-XE9680
- identifier: group-2
  capabilities:
    nodes:
      sriov: true
      rdma: true
      ib: true
  pfs:
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: "0000:09:00.0"
    networkInterface: ""
    traffic: east-west
    rail: 0
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: "0000:23:00.0"
    networkInterface: ""
    traffic: east-west
    rail: 1
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: "0000:35:00.0"
    networkInterface: ""
    traffic: east-west
    rail: 2
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: "0000:35:00.1"
    networkInterface: ""
    traffic: east-west
    rail: 3
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: "0000:53:00.0"
    networkInterface: ""
    traffic: east-west
    rail: 4
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:69:00.0
    networkInterface: ""
    traffic: east-west
    rail: 5
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:8f:00.0
    networkInterface: ""
    traffic: east-west
    rail: 6
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:9c:00.0
    networkInterface: ""
    traffic: east-west
    rail: 7
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:cd:00.0
    networkInterface: ""
    traffic: east-west
    rail: 8
  - deviceID: a2dc
    rdmaDevice: ""
    pciAddress: 0000:f1:00.0
    networkInterface: ""
    traffic: east-west
    rail: 9
  workerNodes:
  - pdx-g22r31-2894-ch2-w05
  - pdx-g24r31-2894-ch2-w06
  nodeSelector:
    nvidia.com/gpu.machine: UCSC-885A-M8-H22
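The rdmaPrefix and netdevPrefix values in the config above are simple placeholder templates. A hypothetical sketch of the %rail%/%plane% substitution (the function is an assumption for illustration, not the tool's code):

```python
def expand_prefix(template, rail, plane=None):
    """Expand the %rail% (and, for Spectrum-X, %plane%) placeholders
    used by the rdmaPrefix/netdevPrefix templates."""
    name = template.replace("%rail%", str(rail))
    if plane is not None:
        name = name.replace("%plane%", str(plane))
    return name
```

For example, expand_prefix("rdma_r%rail%", 3) gives "rdma_r3", and expand_prefix("eth_p%plane%_r%rail%", 1, plane=0) gives "eth_p0_r1".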

North-South Traffic Detection

During cluster discovery, the tool automatically identifies BlueField DPU devices (as opposed to SuperNICs or ConnectX NICs) by matching each device's partNumber against a known list of DPU product codes in pkg/networkoperatorplugin/ns-product-ids. Devices matching a DPU product code are classified as north-south traffic (management/external), while all other devices are classified as east-west traffic (GPU interconnect).

North-south PFs are included in the saved cluster configuration for visibility, but are automatically filtered out during template rendering so that only east-west PFs appear in the generated manifests. Each east-west PF is assigned a sequential rail number (rail-0, rail-1, rail-2, ...) used for naming resources like SriovNetworkNodePolicy and IPPool entries.

Example of mixed traffic types in the config:

clusterConfig:
- identifier: group-0
  pfs:
  - deviceID: a2dc
    pciAddress: "0000:19:00.0"
    traffic: east-west       # SuperNIC — included in manifests
    rail: 0
  - deviceID: a2dc
    pciAddress: "0000:2a:00.0"
    traffic: east-west
    rail: 1
  - deviceID: a2dc
    pciAddress: "0000:3b:00.0"
    traffic: north-south     # BlueField DPU — excluded from manifests
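The filtering and sequential rail assignment described above can be sketched as follows (illustrative only; the PF dictionaries mirror the config entries, and the function is an assumption, not the tool's code):

```python
def east_west_pfs(pfs):
    """Drop north-south (DPU) PFs and assign sequential rail numbers
    to the remaining east-west PFs, as done at template-rendering time."""
    result = []
    for rail, pf in enumerate(p for p in pfs if p["traffic"] == "east-west"):
        result.append({**pf, "rail": rail})
    return result
```

Applied to the mixed example above, the north-south DPU is excluded and the two east-west PFs keep rails 0 and 1.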

NIC Interface Name Templates

The nicConfigurationOperator.deployNicInterfaceNameTemplate setting controls whether a NicInterfaceNameTemplate CR is deployed to rename NIC interfaces to predictable, rail-based names (e.g., eth_r0, eth_r1). When set to true, the tool treats it as "enable when needed" rather than "always enable". The NicInterfaceNameTemplate CR and associated nicConfigurationOperator section in NicClusterPolicy are only deployed when one of the following conditions is met:

  1. Merged groups with PCI address conflicts — When multiple node groups share the same GPU product type and are merged into a single group, but the same PCI address appears at different rail positions across groups. In this case PCI addresses alone cannot identify the correct rail, so interface name templates are used instead.

  2. rdma_shared deployment with empty network interface names — When the deployment type is rdma_shared (macvlan-rdma-shared or ipoib-rdma-shared profiles) and PFs have empty networkInterface fields. The rdmaSharedDevicePlugin uses ifNames selectors that require interface names, so NicInterfaceNameTemplate must be enabled to provide them. This typically happens when discovery finds multiple nodes per group and omits device names for safety.

When neither condition holds, name templates are disabled and the device plugin uses PCI addresses directly, avoiding the overhead of deploying the NIC configuration operator.
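A hedged sketch of this two-condition decision. The function name and data shapes are assumptions for illustration; `groups` stands in for the pre-merge node groups being combined, each a list of PF entries.

```python
def needs_interface_name_template(deployment_type, groups):
    """Return True when NicInterfaceNameTemplate should be deployed:
    either rdma_shared PFs lack interface names, or merged groups place
    the same PCI address at different rail positions."""
    # Condition 2: rdma_shared with empty interface names. The
    # rdmaSharedDevicePlugin's ifNames selectors need real names, so
    # templates must supply them.
    if deployment_type == "rdma_shared":
        for group in groups:
            if any(pf["networkInterface"] == "" for pf in group):
                return True
    # Condition 1: the same PCI address sits at different rail positions
    # across groups that are merged together, so PCI addresses alone
    # cannot identify the correct rail.
    rail_by_pci = {}
    for group in groups:
        for pf in group:
            prev = rail_by_pci.setdefault(pf["pciAddress"], pf["rail"])
            if prev != pf["rail"]:
                return True
    return False
```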

Docker container

You can run the l8k tool as a Docker container:

docker run -v ~/launch-kubernetes/user-prompt:/user-prompt \
    -v ~/remote-cluster/:/remote-cluster -v /tmp:/output --net=host \
    nvcr.io/nvidia/cloud-native/k8s-launch-kit:v26.1.0 \
    --discover-cluster-config --kubeconfig /remote-cluster/kubeconf.yaml \
    --save-cluster-config /output/config.yaml --log-level debug \
    --save-deployment-files /output \
    --fabric infiniband --deployment-type rdma_shared --multirail

Remember to pass --net=host and to mount the necessary input and output directories with -v.

Development

Building

make build        # Build for current platform
make build-all    # Build for all platforms
make clean        # Clean build artifacts

Testing

make test         # Run tests
make coverage     # Run tests with coverage

Linting

make lint         # Run linter
make lint-check   # Install and run linter

Docker

make docker-build # Build Docker image
make docker-run   # Run Docker container
