This guide walks you through setting up and running your first OpenShift DPF deployment. Follow these steps in order for a successful deployment.
Your host system needs:
- Operating System: RHEL 8/9, CentOS Stream 8/9, or compatible Linux
- RAM: Minimum 32GB (64GB+ recommended for multi-node)
- CPU: 8+ cores (16+ cores recommended)
- Storage: 200GB+ free disk space
- Network: Reliable internet connection for image downloads
```bash
# Check available RAM (need 32GB minimum)
free -h

# Check available disk space (need 200GB minimum)
df -h /

# Check CPU cores (need 8+ cores)
nproc

# Verify virtualization support (if using VMs)
grep -E 'vmx|svm' /proc/cpuinfo
```

The automation requires several CLI tools. Some can be installed automatically; others need manual installation.
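The hardware checks above can be wrapped into one preflight script. This is only a sketch: the `check` helper is hypothetical, and the thresholds simply mirror the minimums listed.

```shell
#!/usr/bin/env bash
# Hypothetical preflight sketch: compare measured resources against the minimums.
check() {
  local label="$1" value="$2" minimum="$3"
  if [ "$value" -ge "$minimum" ]; then
    echo "OK: $label ($value >= $minimum)"
  else
    echo "FAIL: $label ($value < $minimum)"
  fi
}

# Gather the same numbers as the manual commands above (GNU tools assumed).
ram_gb=$(free -g | awk '/^Mem:/ {print $2}')
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')
cores=$(nproc)

check "RAM (GB)" "$ram_gb" 32
check "free disk (GB)" "$disk_gb" 200
check "CPU cores" "$cores" 8
```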
```bash
# Clone the repository
git clone <repository-url>
cd openshift-dpf

# Install Helm and Hypershift automatically
make install-helm
make install-hypershift
```

```bash
# Install from Red Hat
# Visit: https://console.redhat.com/openshift/install/pull-secret
# Download and follow aicli installation instructions

# Verify installation
aicli --version
```

```bash
# Download from Red Hat
curl -O https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/openshift-client-linux.tar.gz
tar xzf openshift-client-linux.tar.gz
sudo mv oc kubectl /usr/local/bin/

# Verify installation
oc version --client
```

```bash
# Install system packages (RHEL/CentOS)
sudo dnf install -y jq libvirt-client podman

# For Ubuntu/Debian
sudo apt install -y jq virtinst podman
```

- Visit Red Hat OpenShift Downloads
- Log in with your Red Hat account
- Download the pull secret
- Save it as `openshift_pull.json` in the project directory
- Visit Red Hat API Tokens
- Copy your offline token
- Create the aicli config directory and save the token:

```bash
mkdir -p ~/.aicli
echo "YOUR_OFFLINE_TOKEN" > ~/.aicli/offlinetoken.txt
chmod 600 ~/.aicli/offlinetoken.txt
```

To use a different user's token (e.g. when running as root but using tokens under /root/xyz), set `AICLI_HOME` in your `.env` file or environment (e.g. `export AICLI_HOME=/root/xyz`). The automation will then use `$AICLI_HOME/.aicli/offlinetoken.txt` and will check that this file exists whenever `AICLI_HOME` is set.
When using `AICLI_HOME`, your `OPENSHIFT_PULL_SECRET` file must contain a pull secret for the same Red Hat account as the offline token; otherwise the API returns "pull secret token does not match current user".
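The token lookup described above can be sketched as a tiny helper. `resolve_token_file` is a hypothetical name for illustration; the automation's actual code may differ, but per the text it uses `$AICLI_HOME/.aicli/offlinetoken.txt` when `AICLI_HOME` is set and the current user's home otherwise.

```shell
# Sketch: where the automation looks for the offline token.
resolve_token_file() {
  printf '%s/.aicli/offlinetoken.txt\n' "${AICLI_HOME:-$HOME}"
}

# The automation also checks that the resolved file actually exists.
token_file=$(resolve_token_file)
[ -f "$token_file" ] || echo "WARNING: no offline token at $token_file"
```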
- Create an account at NVIDIA NGC
- Go to Account → Setup → Generate API Key
- Create the NGC pull secret:

```bash
cat > pull-secret.txt << 'EOF'
{
  "auths": {
    "nvcr.io": {
      "username": "$oauthtoken",
      "password": "YOUR_NGC_API_KEY",
      "auth": "BASE64_ENCODED_TOKEN_PAIR"
    }
  }
}
EOF
```

Note: Replace YOUR_NGC_API_KEY with your actual API key and generate the base64-encoded `auth` string.
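The `auth` field follows the standard Docker registry-config convention: the base64 encoding of `username:password`. A sketch of generating it (the single quotes keep `$oauthtoken` literal, since that is the actual username NGC expects):

```shell
# Build the base64 auth value for the NGC pull secret.
NGC_API_KEY="YOUR_NGC_API_KEY"   # replace with your real key
NGC_AUTH=$(printf '%s:%s' '$oauthtoken' "$NGC_API_KEY" | base64 -w0)
echo "$NGC_AUTH"
```

Paste the resulting string into the `auth` field of `pull-secret.txt`.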
Generate an SSH key for cluster access:

```bash
# Generate a new SSH key (if you don't have one)
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""

# Verify the public key exists
ls -la ~/.ssh/id_rsa.pub
```

```bash
# Copy the example configuration
cp .env.example .env
```

Edit `.env` and set these essential variables:
```bash
# Cluster Configuration
CLUSTER_NAME=my-dpf-cluster
BASE_DOMAIN=example.com
OPENSHIFT_VERSION=4.20.0

# VM Configuration (adjust based on your resources)
VM_COUNT=3       # 1 for SNO, 3+ for multi-node
RAM=16384        # RAM per VM in MB
VCPUS=8          # vCPUs per VM
DISK_SIZE1=120   # Primary disk in GB
DISK_SIZE2=80    # Secondary disk in GB

# DPF Configuration
DPF_VERSION=v25.7.1   # Use the latest stable version
```

For multi-node clusters, you also need VIP addresses:
```bash
# Network Configuration (required for multi-node)
API_VIP=10.1.150.100       # API server VIP
INGRESS_VIP=10.1.150.101   # Ingress VIP
```

Important: VIP addresses must be:

- In the same network as your host
- Not currently assigned to any device
- Accessible from your host system
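A quick heuristic for the first two requirements is to see whether anything already answers at those addresses. A sketch (`check_vip` is a hypothetical helper; a silent ping is not proof the address is free, only a hint):

```shell
# Warn if a candidate VIP already responds, i.e. is assigned to some device.
check_vip() {
  if ping -c 1 -W 1 "$1" > /dev/null 2>&1; then
    echo "WARNING: $1 is already in use"
  else
    echo "OK: $1 appears free"
  fi
}

check_vip 10.1.150.100   # API_VIP
check_vip 10.1.150.101   # INGRESS_VIP
```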
Before deploying, verify your setup:

```bash
# Check that all required files exist
make verify-files

# This should show:
# ✓ openshift_pull.json exists
# ✓ pull-secret.txt exists
# ✓ SSH public key exists
# ✓ .env configuration exists
```

```bash
# Configure for SNO
echo "VM_COUNT=1" >> .env

# Deploy everything
make all

# This will take about 2 hours and includes:
# - Creating the cluster definition
# - Creating and starting VMs
# - Installing OpenShift
# - Deploying the DPF operator
# - Setting up DPU services
```

Open a new terminal and monitor progress:
```bash
# Watch cluster installation status
watch 'aicli list clusters'

# Monitor VM status
watch 'virsh list --all'

# Check OpenShift nodes (after the cluster is ready)
watch 'oc get nodes'

# Monitor DPF operator deployment
watch 'oc get pods -n dpf-operator-system'
```

After deployment completes, verify everything is working:
```bash
# Check cluster nodes
oc get nodes

# Check the DPF operator
oc get pods -n dpf-operator-system

# Check the hosted cluster (for DPU workloads)
oc get hostedcluster -n clusters

# Run a comprehensive health check
make run-dpf-sanity
```

Expected output for a successful deployment:

```
NAME       STATUS   ROLES           AGE   VERSION
vm-dpf-0   Ready    control-plane   45m   v1.29.0+xxx
vm-dpf-1   Ready    control-plane   45m   v1.29.0+xxx
vm-dpf-2   Ready    control-plane   45m   v1.29.0+xxx
```
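To turn that check into a pass/fail condition, the STATUS column can be counted. A sketch (`count_not_ready` is a hypothetical helper; it assumes `oc` is logged in to the cluster):

```shell
# Count nodes whose STATUS column is not "Ready"; 0 means all nodes are healthy.
count_not_ready() {
  awk '$2 != "Ready" { n++ } END { print n+0 }'
}

# After login: oc get nodes --no-headers | count_not_ready
# Demo against the sample output above:
count_not_ready <<'EOF'
vm-dpf-0   Ready   control-plane   45m   v1.29.0+xxx
vm-dpf-1   Ready   control-plane   45m   v1.29.0+xxx
vm-dpf-2   Ready   control-plane   45m   v1.29.0+xxx
EOF
```

The demo prints 0, since every sample node is Ready.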
Symptom: Errors downloading container images
Solution: Verify that both `openshift_pull.json` and `pull-secret.txt` are valid JSON
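A hedged sketch of that check (`check_json` is a hypothetical helper; `jq` is already in the required-packages list, and `jq empty` parses a file while printing nothing on success):

```shell
# Report whether each pull secret parses as JSON; jq exits non-zero on bad input.
check_json() {
  if jq empty "$1" 2>/dev/null; then
    echo "$1: valid JSON"
  else
    echo "$1: invalid or missing"
  fi
}

check_json openshift_pull.json
check_json pull-secret.txt
```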
Symptom: "Cannot create VM" errors
Solution: Ensure libvirt is running and you have sufficient resources

```bash
sudo systemctl start libvirtd
sudo systemctl enable libvirtd
sudo usermod -a -G libvirt $USER
# Log out and back in for the group change to take effect
```

Symptom: VMs cannot reach the internet
Solution: Check your default network bridge
```bash
# Check libvirt networks
virsh net-list
virsh net-start default   # if not running
```

Symptom: VMs run slowly or fail to start
Solution: Reduce the resource allocation in `.env`

```bash
# For resource-constrained systems
VM_COUNT=1   # Use SNO
RAM=8192     # 8GB per VM
VCPUS=4      # 4 vCPUs per VM
```

Once your first deployment is successful:
- Learn about configuration: Read Configuration Guide
- Add worker nodes: See Worker Provisioning
- Deploy workloads: Your cluster is ready for applications
- Explore advanced features: Check Advanced Topics
- Troubleshooting: See Troubleshooting Guide
- Issues: Report problems in the repository
- Documentation: This user guide covers common scenarios
Congratulations! You now have a working OpenShift cluster with NVIDIA DPF automation. The cluster is ready to run DPU-accelerated workloads.