This guide covers common issues when using the DPF HCP Provisioner Operator and how to diagnose them.
kubectl logs -n dpf-hcp-provisioner-system deployment/dpf-hcp-provisioner-operator -fFor more verbose output, redeploy with logLevel: debug in Helm values.
# Quick status overview
kubectl get dpfhcp -n <namespace>
# Detailed conditions
kubectl describe dpfhcp <name> -n <namespace>
# JSON output for scripting
kubectl get dpfhcp <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .Condition: DPUClusterMissing = True
Symptoms: CR stays in Pending phase.
Cause: The dpuClusterRef points to a DPUCluster that doesn't exist or is in a different namespace than specified.
Resolution:
- Verify the DPUCluster exists:
kubectl get dpucluster -n <namespace> - Ensure the
nameandnamespaceindpuClusterRefmatch exactly - Check RBAC -- the operator needs
getpermissions on DPUCluster resources in the target namespace
Condition: ClusterTypeValid = False
Symptoms: CR stays in Pending phase.
Cause: The referenced DPUCluster is not of a supported type for HyperShift provisioning.
Resolution: Ensure the DPUCluster is configured with a type compatible with HyperShift-based provisioning.
Condition: DPUClusterInUse = True
Symptoms: CR stays in Pending phase.
Cause: Another DPFHCPProvisioner is already using the same DPUCluster. Each DPUCluster can only be referenced by one DPFHCPProvisioner (1:1 mapping).
Resolution: Use a different DPUCluster or delete the existing DPFHCPProvisioner that references it.
Condition: SecretsValid = False
Symptoms: CR stays in Pending phase.
Cause: SSH key or pull secret is missing, or has incorrect format.
Resolution:
- Verify the SSH key secret exists and contains the
id_rsa.pubkey:kubectl get secret <ssh-secret-name> -n <provisioner-namespace> -o jsonpath='{.data}' | jq 'keys'
- Verify the pull secret exists and contains
.dockerconfigjson:kubectl get secret <pull-secret-name> -n <provisioner-namespace> -o jsonpath='{.data}' | jq 'keys'
- Ensure both secrets are in the same namespace as the DPFHCPProvisioner CR
Condition: BlueFieldOCPLayerImageFound = False
Symptoms: CR stays in Provisioning phase.
Cause: The operator cannot find a BlueField container image tag matching the OCP version in the configured registry.
Resolution:
- Check the DPFHCPProvisionerConfig for the correct registry:
kubectl get dpfhcpconfig default -o yaml
- Verify the OCP version extracted from
ocpReleaseImagehas a matching tag in the BlueField OCP layer image repository - If the BlueField OCP layer image is known, set
machineOSURLin the DPFHCPProvisioner spec to skip the lookup
Conditions: HostedClusterAvailable = False, HostedClusterProgressing = True
Symptoms: CR stays in Provisioning phase for an extended period.
Resolution:
- Check the HostedCluster directly:
kubectl get hostedcluster -A kubectl describe hostedcluster <name> -n <namespace>
- Look for issues with:
- The OCP release image (invalid or inaccessible)
- etcd storage (storage class not available)
- Node scheduling (control plane nodes not available or have resource pressure)
- Network connectivity (DNS resolution, API accessibility)
- Check HyperShift operator logs for more details
Condition: MetalLBConfigured = False
Symptoms: Services are not accessible via LoadBalancer.
Resolution:
- Verify MetalLB operator is installed and running
- Check that the
virtualIPin the spec is a valid, routable IP on the management cluster network - Inspect MetalLB resources:
kubectl get ipaddresspool -A kubectl get l2advertisement -A
Condition: CSRAutoApprovalActive = False
Symptoms: DPU worker nodes cannot join the hosted cluster.
Cause: The operator cannot connect to the hosted cluster to watch for CSRs.
Resolution:
- Ensure the HostedCluster is in
Readystate - Check the hosted cluster is reachable from the management cluster
- Look for
CSRApprovalcontroller errors in operator logs:kubectl logs -n dpf-hcp-provisioner-system deployment/dpf-hcp-provisioner-operator | grep -i csr
Condition: IgnitionConfigured = False
Symptoms: CR stays in IgnitionGenerating phase.
Resolution:
- Verify the
dpuDeploymentRefpoints to a valid DPUDeployment - Check that the DPUDeployment has the required DPU flavor information
- If using a custom
machineOSURL, verify the URL is accessible - Check operator logs for ignition generation errors
Deletion triggers finalizer cleanup that removes the HostedCluster, MetalLB resources, and injected kubeconfig in order. If the CR is stuck in Deleting phase:
- Check conditions for cleanup status:
kubectl get dpfhcp <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="HostedClusterCleanup")'
- Check if the HostedCluster is being deleted:
kubectl get hostedcluster -A
- If the HostedCluster itself is stuck deleting, investigate HyperShift operator logs
The operator emits Kubernetes events on the DPFHCPProvisioner CR for key lifecycle actions. View them with:
kubectl get events -n <namespace> --field-selector involvedObject.name=<dpfhcp-name>