Worker Provisioning

Add physical servers as OpenShift worker nodes using automated bare metal provisioning with NVIDIA DPUs.

Quick Start

1. Prerequisites

Before starting, make sure you have a running OpenShift DPF cluster.

Hardware Requirements:

Physical servers with BMC/iDRAC access
NVIDIA BlueField-3 DPUs installed
Network connectivity from automation host to BMC interfaces

Verify connectivity:

# Test BMC access
ping 192.168.1.101
curl -k https://192.168.1.101/redfish/v1/

Supported BMCs: Dell iDRAC, HPE iLO, Supermicro (auto-detected)

2. Configure Workers

Add to your .env file:

# Number of workers to provision
WORKER_COUNT=2

# Worker 1 (replace with your actual values)
WORKER_1_NAME=worker-01                    # Choose unique hostname
WORKER_1_BMC_IP=192.168.1.101             # Your BMC IP address
WORKER_1_BMC_USER=admin                    # Your BMC username (NOT root/calvin!)
WORKER_1_BMC_PASSWORD=your_secure_password # Your BMC password
WORKER_1_BOOT_MAC=aa:bb:cc:dd:ee:01        # MAC of PXE network interface

# Worker 2 (follow same pattern for additional workers)
WORKER_2_NAME=worker-02
WORKER_2_BMC_IP=192.168.1.102
WORKER_2_BMC_USER=admin
WORKER_2_BMC_PASSWORD=your_secure_password
WORKER_2_BOOT_MAC=aa:bb:cc:dd:ee:02

# Security (recommended for production)
AUTO_APPROVE_WORKER_CSR=false

Finding your boot MAC address:

Check the BMC web interface → Network → LAN settings
Or run ip link show on an existing node with similar hardware
It is usually the first network interface (not the BMC interface)
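
Before deploying, it can help to sanity-check the worker entries in .env. The sketch below is a hypothetical helper (validate_workers is not part of the repository's Makefile) and assumes the .env file can be sourced by a POSIX shell, as the examples above can. It reports any missing required variable and any boot MAC that is not six colon-separated hex pairs.

```sh
# validate_workers ENV_FILE
# Hypothetical helper: checks WORKER_1..WORKER_<WORKER_COUNT> entries.
# Exit status is 0 when everything looks complete, 1 otherwise.
validate_workers() {
    env_file=$1
    # "." searches PATH for names without a slash, so force a path.
    case $env_file in
        */*) ;;
        *) env_file=./$env_file ;;
    esac
    # Run in a subshell so the caller's variables are left untouched.
    (
        . "$env_file"
        rc=0
        n=1
        while [ "$n" -le "${WORKER_COUNT:-0}" ]; do
            for suffix in NAME BMC_IP BMC_USER BMC_PASSWORD BOOT_MAC; do
                eval "value=\${WORKER_${n}_${suffix}:-}"
                if [ -z "$value" ]; then
                    echo "missing: WORKER_${n}_${suffix}"
                    rc=1
                fi
            done
            eval "mac=\${WORKER_${n}_BOOT_MAC:-}"
            if [ -n "$mac" ] && ! printf '%s\n' "$mac" |
                grep -Eqi '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
                echo "bad MAC: WORKER_${n}_BOOT_MAC=$mac"
                rc=1
            fi
            n=$((n + 1))
        done
        exit "$rc"
    )
}

# Example: validate_workers .env && make add-worker-nodes
```

The check only covers presence and MAC syntax; it does not confirm that the BMC address or credentials actually work.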

3. Deploy Workers

# Add workers to existing cluster
make add-worker-nodes

# Monitor progress
make worker-status

4. Approve Certificates (if manual approval)

# Check for pending requests
oc get csr | grep Pending

# Approve each worker's certificate
oc adm certificate approve <csr-name>

# Verify nodes joined
oc get nodes
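
With several workers, approving CSRs one by one gets tedious. As a minimal sketch, the hypothetical pending_csrs filter below picks the names of pending requests out of the oc get csr table, assuming CONDITION is the last column as in current oc releases. A node typically raises a second (serving) CSR after its first one is approved, so the loop may need to be run twice.

```sh
# pending_csrs
# Reads the output of `oc get csr --no-headers` on stdin and prints the
# NAME of every request whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Example (review the list before approving anything):
#   oc get csr --no-headers | pending_csrs
#   oc get csr --no-headers | pending_csrs | while read -r csr; do
#       oc adm certificate approve "$csr"
#   done
```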

That's it! Your workers are now part of the OpenShift cluster with DPU acceleration.

How It Works

The automation detects the server hardware and provisions it in four stages:

BMC Discovery: Connects to your BMC IP and auto-detects the vendor type (Dell/HPE/Supermicro)
Redfish Protocol: Uses the standard Redfish API to control server power and boot
Network Boot: Servers PXE boot the OpenShift worker image served from the cluster
Auto-Join: Workers request to join the cluster by submitting certificate signing requests (CSRs)

No vendor-specific configuration is needed.
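
To illustrate the discovery step, here is one way vendor detection could work. This is a rough sketch under stated assumptions, not the automation's actual logic. It assumes the vendor name appears somewhere in the Redfish service root document (for example in a Vendor property or as a key under Oem), and detect_vendor is a hypothetical name.

```sh
# detect_vendor
# Reads a Redfish service root JSON document on stdin and prints a guess:
# dell, hpe, supermicro, or unknown. Plain string matching, no JSON parsing.
detect_vendor() {
    body=$(tr '[:upper:]' '[:lower:]')
    case $body in
        *supermicro*) echo supermicro ;;
        *dell*|*idrac*) echo dell ;;
        *'"hpe"'*|*'"hp"'*|*hewlett*) echo hpe ;;
        *) echo unknown ;;
    esac
}

# Example:
#   curl -k -s https://192.168.1.101/redfish/v1/ | detect_vendor
```

String matching keeps the sketch free of extra tooling; a stricter version would parse the JSON and read specific fields.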

Configuration Reference

Required Variables (per worker)

| Variable | Description | Example |
|----------|-------------|---------|
| `WORKER_n_NAME` | Unique hostname (n = worker number) | `worker-01` |
| `WORKER_n_BMC_IP` | BMC management IP address | `192.168.1.101` |
| `WORKER_n_BMC_USER` | BMC username (use secure credentials) | `admin` |
| `WORKER_n_BMC_PASSWORD` | BMC password | `your_password` |
| `WORKER_n_BOOT_MAC` | PXE network interface MAC address | `aa:bb:cc:dd:ee:01` |

Optional Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `WORKER_n_ROOT_DEVICE` | Installation disk | `/dev/sda` |
| `AUTO_APPROVE_WORKER_CSR` | Deploy CronJob to auto-approve host cluster CSRs | `false` |

Security Settings

# Production (recommended)
AUTO_APPROVE_WORKER_CSR=false   # Manual approval required

# Lab/Development (less secure)
AUTO_APPROVE_WORKER_CSR=true    # Automatic approval

Monitoring

Check Worker Status

# Overall status
make worker-status

# BareMetalHost (BMH) provisioning state for each worker
oc get bmh -n openshift-machine-api

# Node status
oc get nodes

# Certificate requests
oc get csr

Expected Progression

BMC Registration: registering → available
Provisioning: available → provisioning → provisioned
Node Joining: NotReady → Ready (after CSR approval)
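
To follow this progression from a script, one option is to list each BareMetalHost with its provisioning state and flag the hosts that are not finished yet. The unfinished_hosts helper below is a hypothetical sketch; it treats provisioned and externally provisioned as final states and reads name/state pairs on stdin.

```sh
# unfinished_hosts
# Reads "NAME STATE" lines on stdin, prints hosts that are not yet
# provisioned, and exits 0 only when every host has finished.
unfinished_hosts() {
    awk 'NF {
             name = $1
             state = (NF > 1) ? $0 : "unknown"
             sub(/^[^ \t]+[ \t]+/, "", state)   # keep everything after NAME
             if (state != "provisioned" && state != "externally provisioned") {
                 print name ": " state
                 pending = 1
             }
         }
         END { exit pending + 0 }'
}

# Example:
#   oc get bmh -n openshift-machine-api \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.provisioning.state}{"\n"}{end}' \
#     | unfinished_hosts && echo "all hosts provisioned"
```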

Common Issues

BMC Not Reachable

# Test connectivity
ping 192.168.1.101
curl -k https://192.168.1.101/redfish/v1/

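# Check credentials (use your actual BMC credentials).
# The service root /redfish/v1/ answers without authentication,
# so query a protected resource such as the Systems collection.
curl -k -u admin:your_password https://192.168.1.101/redfish/v1/Systems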
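
The HTTP status code usually narrows the problem down faster than the response body. The hypothetical explain_redfish_code helper below maps common codes to a likely cause. The example queries the Systems collection because the Redfish service root itself responds without authentication, so it cannot confirm that credentials are valid.

```sh
# explain_redfish_code HTTP_CODE
# Translates the status code reported by curl into a likely cause.
explain_redfish_code() {
    case $1 in
        200) echo "OK: BMC reachable and credentials accepted" ;;
        401) echo "Unauthorized: check BMC username and password" ;;
        403) echo "Forbidden: account lacks privileges or is locked" ;;
        000) echo "No HTTP response: check routing, firewall, and BMC power" ;;
        *)   echo "Unexpected HTTP status $1: inspect the BMC logs" ;;
    esac
}

# Example (curl prints 000 when it gets no HTTP response at all):
#   code=$(curl -k -s -o /dev/null -w '%{http_code}' -u admin:your_password \
#          https://192.168.1.101/redfish/v1/Systems)
#   explain_redfish_code "$code"
```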

Worker Stuck in "Registering"

# Check Bare Metal Operator (BMO) logs
oc logs -n openshift-machine-api deployment/metal3

# Common causes: wrong credentials, network issues, BMC in maintenance mode

No Certificate Requests Appearing

# Check if worker booted successfully via BMC console
# Verify network connectivity from worker subnet to cluster API
ping <API_VIP>
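
A successful ping only proves ICMP reachability. A worker also needs TCP access to the API server and, to fetch its Ignition config, the Machine Config Server. The sketch below assumes the standard OpenShift ports (6443 and 22623) and health paths, and that both are served on the API VIP. The function names are hypothetical. Run it from a host on the worker subnet.

```sh
# worker_endpoints API_VIP
# Prints the URLs a booting worker is assumed to need on the API VIP.
worker_endpoints() {
    vip=$1
    echo "https://${vip}:6443/readyz"     # Kubernetes API server
    echo "https://${vip}:22623/healthz"   # Machine Config Server (Ignition)
}

# check_worker_endpoints API_VIP
# Tries each URL with a 5 second limit and reports whether it answered.
check_worker_endpoints() {
    worker_endpoints "$1" | while read -r url; do
        if curl -k -s -o /dev/null -m 5 "$url"; then
            echo "reachable:   $url"
        else
            echo "unreachable: $url"
        fi
    done
}

# Example: check_worker_endpoints <API_VIP>
```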

Node Stuck in "NotReady"

# Check node conditions
oc describe node worker-01

# Usually resolves after CSR approval and brief initialization

Advanced Topics

Adding More Workers

# Update worker count (edit the existing line instead of appending a duplicate)
sed -i 's/^WORKER_COUNT=.*/WORKER_COUNT=3/' .env

# Add new worker variables (use your own BMC credentials, not vendor defaults)
echo "WORKER_3_NAME=worker-03" >> .env
echo "WORKER_3_BMC_IP=192.168.1.103" >> .env
echo "WORKER_3_BMC_USER=admin" >> .env
echo "WORKER_3_BMC_PASSWORD=your_secure_password" >> .env
echo "WORKER_3_BOOT_MAC=aa:bb:cc:dd:ee:03" >> .env

# Deploy new worker
make add-worker-nodes
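
If you add workers often, the same steps can be wrapped in a small helper that works out the next worker number and keeps a single WORKER_COUNT line in the file. This is a sketch only. add_worker is a hypothetical function, and it reads BMC_USER and BMC_PASSWORD from the environment so the password is not typed as a command-line argument.

```sh
# add_worker ENV_FILE NAME BMC_IP BOOT_MAC
# Hypothetical helper: increments WORKER_COUNT and appends the next
# WORKER_n_* block. Requires BMC_USER and BMC_PASSWORD in the environment.
add_worker() {
    env_file=$1
    name=$2
    bmc_ip=$3
    boot_mac=$4
    if [ -z "${BMC_USER:-}" ] || [ -z "${BMC_PASSWORD:-}" ]; then
        echo "set BMC_USER and BMC_PASSWORD first" >&2
        return 1
    fi
    # Use the last WORKER_COUNT definition, defaulting to 0 if none exists.
    count=$(sed -n 's/^WORKER_COUNT=\([0-9][0-9]*\).*/\1/p' "$env_file" | tail -n 1)
    next=$(( ${count:-0} + 1 ))
    tmp=$(mktemp) || return 1
    # Copy everything except old WORKER_COUNT lines, then add the new values.
    grep -v '^WORKER_COUNT=' "$env_file" > "$tmp" || true
    {
        echo "WORKER_COUNT=$next"
        echo "WORKER_${next}_NAME=$name"
        echo "WORKER_${next}_BMC_IP=$bmc_ip"
        echo "WORKER_${next}_BMC_USER=$BMC_USER"
        echo "WORKER_${next}_BMC_PASSWORD=$BMC_PASSWORD"
        echo "WORKER_${next}_BOOT_MAC=$boot_mac"
    } >> "$tmp"
    # Write back through the original file to keep its owner and mode.
    cat "$tmp" > "$env_file"
    rm -f "$tmp"
}

# Example:
#   export BMC_USER=admin BMC_PASSWORD=your_secure_password
#   add_worker .env worker-03 192.168.1.103 aa:bb:cc:dd:ee:03
#   make add-worker-nodes
```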

BMC Security Checklist

Change default credentials immediately
Use dedicated management network/VLAN
Enable audit logging where available
Restrict BMC network access

Automatic CSR Approval

⚠️ Security Warning: Only enable in trusted lab environments.

# Host cluster workers (BMH-provisioned)
AUTO_APPROVE_WORKER_CSR=true

When enabled, a CronJob is deployed that automatically approves pending CSRs every minute. Workers join the cluster without manual intervention.

You can also deploy the auto-approver manually at any time:

make deploy-csr-approver
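
A middle ground between manual and blanket approval is to approve only requests that come from expected requestors. The sketch below is not the logic of the deployed CronJob. approvable_csrs is a hypothetical filter that assumes the column layout of recent oc releases (REQUESTOR in column 4, CONDITION last). It accepts pending CSRs from the node-bootstrapper service account and from the node names you list. The bootstrap request does not show the node name in this table, so this is weaker than a full approver that decodes the request.

```sh
# approvable_csrs NODE...
# Reads `oc get csr --no-headers` output on stdin and prints the names of
# Pending CSRs requested by the node-bootstrapper service account or by
# one of the listed nodes (system:node:<NODE>).
approvable_csrs() {
    awk -v nodes="$*" '
        BEGIN {
            n = split(nodes, list, " ")
            for (i = 1; i <= n; i++) allowed["system:node:" list[i]] = 1
            allowed["system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"] = 1
        }
        $NF == "Pending" && ($4 in allowed) { print $1 }'
}

# Example:
#   oc get csr --no-headers | approvable_csrs worker-01 worker-02 |
#       while read -r csr; do oc adm certificate approve "$csr"; done
```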

VM-Based Workers (Assisted Installer Day2 Flow)

For lab and development environments without physical BMC-managed servers, you can add worker nodes as libvirt VMs using the Assisted Installer day2 flow.

How It Works

The existing cluster is moved to "day2" mode in Assisted Installer
A day2 ISO is downloaded that contains the worker discovery agent
A libvirt VM is created and booted from this ISO
The VM registers with Assisted Installer as a new host
Installation is started on the host and it joins the cluster as a worker
CSRs are approved (manually or automatically) to complete the join

Quick Start

Add to your .env file:

# Number of worker VMs to create
VM_WORKER_COUNT=1

# Optional: customize worker VM resources
VM_WORKER_RAM=16384
VM_WORKER_VCPUS=8
VM_WORKER_DISK_SIZE1=120
VM_WORKER_DISK_SIZE2=80

# Auto-approve CSRs (recommended for lab use)
AUTO_APPROVE_WORKER_CSR=true

Run the full workflow:

make add-vm-workers

This single command handles the entire lifecycle: creates the day2 cluster, downloads the day2 ISO, creates the VM(s), waits for host registration, starts the installation, and handles CSR approval.

Step-by-Step (Manual)

If you prefer to run each step individually:

# 1. Move cluster to day2 mode
make create-day2-cluster

# 2. Download the day2 ISO
make download-day2-iso

# 3. Create worker VM(s) from the day2 ISO
make create-worker-vms

# 4. Wait for hosts to register and install (monitor in another terminal)
make worker-status

Configuration Reference

| Variable | Description | Default |
|----------|-------------|---------|
| `VM_WORKER_COUNT` | Number of worker VMs to create | `0` |
| `VM_WORKER_PREFIX` | VM name prefix for workers | `VM_PREFIX-worker` |
| `VM_WORKER_RAM` | RAM in MB for worker VMs | Same as `RAM` |
| `VM_WORKER_VCPUS` | vCPUs for worker VMs | Same as `VCPUS` |
| `VM_WORKER_DISK_SIZE1` | Primary disk size in GB | Same as `DISK_SIZE1` |
| `VM_WORKER_DISK_SIZE2` | Secondary disk size in GB | Same as `DISK_SIZE2` |
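
The fallback behaviour in this table can be expressed with shell default expansion. The sketch below is illustrative and may not match how the Makefile resolves the values. It assumes the default prefix means the value of VM_PREFIX followed by -worker, and resolve_vm_worker_settings is a hypothetical name.

```sh
# resolve_vm_worker_settings
# Prints the effective worker VM settings, falling back to the general
# VM variables (RAM, VCPUS, DISK_SIZE1, DISK_SIZE2) when a VM_WORKER_*
# override is unset or empty.
resolve_vm_worker_settings() {
    echo "count=${VM_WORKER_COUNT:-0}"
    echo "prefix=${VM_WORKER_PREFIX:-${VM_PREFIX}-worker}"
    echo "ram_mb=${VM_WORKER_RAM:-$RAM}"
    echo "vcpus=${VM_WORKER_VCPUS:-$VCPUS}"
    echo "disk1_gb=${VM_WORKER_DISK_SIZE1:-$DISK_SIZE1}"
    echo "disk2_gb=${VM_WORKER_DISK_SIZE2:-$DISK_SIZE2}"
}

# Example: source the .env file, then run resolve_vm_worker_settings
# to see which values a worker VM would get.
```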

Cleanup

# Delete worker VMs only
make delete-worker-vms

# Or delete everything (includes worker VMs)
make clean-all

Next Steps

Complete Deployment: See Getting Started for full cluster setup
DPU Services: Configure accelerated networking with DPU features
Troubleshooting: See Troubleshooting Guide for additional issues

Your workers are now ready for DPU-accelerated workloads on OpenShift.
