fix: prevent infinite loop when HostedCluster status not populated#10
Conversation
WalkthroughThe changes refactor the DPFHCPBridge controller to adopt ownership-based reconciliation for HostedCluster resources instead of label-based mapping, and modify status synchronization to rely on event-driven watches rather than timed requeues when HostedCluster status is unavailable. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/dpfhcpbridge_controller.go (1)
285-295: Fix goconst lint failure: extract repeated ConfigMap name/namespace.Lint reports
ocp-bluefield-imagesrepeated; extract constants and reuse.♻️ Proposed fix
+const ( + ocpBluefieldImagesConfigMap = "ocp-bluefield-images" + dpfHcpBridgeSystemNamespace = "dpf-hcp-bridge-system" +)- return e.Object.GetName() == "ocp-bluefield-images" && - e.Object.GetNamespace() == "dpf-hcp-bridge-system" + return e.Object.GetName() == ocpBluefieldImagesConfigMap && + e.Object.GetNamespace() == dpfHcpBridgeSystemNamespace ... - return e.ObjectNew.GetName() == "ocp-bluefield-images" && - e.ObjectNew.GetNamespace() == "dpf-hcp-bridge-system" + return e.ObjectNew.GetName() == ocpBluefieldImagesConfigMap && + e.ObjectNew.GetNamespace() == dpfHcpBridgeSystemNamespace ... - return e.Object.GetName() == "ocp-bluefield-images" && - e.Object.GetNamespace() == "dpf-hcp-bridge-system" + return e.Object.GetName() == ocpBluefieldImagesConfigMap && + e.Object.GetNamespace() == dpfHcpBridgeSystemNamespace
🤖 Fix all issues with AI agents
In `@internal/controller/dpfhcpbridge_controller.go`:
- Around line 204-221: The code silently ignores any non-nil error returned by
r.Get when trying to fetch the HostedCluster (hc, hcKey, r.Get) which can drop
reconciliation; update the logic in the Reconcile handler (where cr, hc and
hcKey are used) to distinguish errors.IsNotFound from other errors and handle
transient failures by logging the error and returning it (or returning a requeue
result) instead of ignoring it, so that RBAC/API/network errors cause a retry
while still proceeding to set cr.Status.HostedClusterRef when the HostedCluster
is found and owned.
| // Set hostedClusterRef if HostedCluster exists and is owned by this CR | ||
| // This ensures the ref is always set when the HostedCluster exists, regardless of phase | ||
| if cr.Status.HostedClusterRef == nil { | ||
| hc := &hyperv1.HostedCluster{} | ||
| hcKey := types.NamespacedName{Name: cr.Name, Namespace: cr.Namespace} | ||
| if err := r.Get(ctx, hcKey, hc); err == nil { | ||
| // HostedCluster exists - verify ownership and set ref | ||
| if metav1.IsControlledBy(hc, &cr) { | ||
| log.V(1).Info("Setting hostedClusterRef for existing HostedCluster") | ||
| cr.Status.HostedClusterRef = &corev1.ObjectReference{ | ||
| Name: cr.Name, | ||
| Namespace: cr.Namespace, | ||
| Kind: "HostedCluster", | ||
| APIVersion: "hypershift.openshift.io/v1beta1", | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Handle HostedCluster Get errors to avoid dropping reconciliation.
Right now any non‑nil error from r.Get is silently ignored, which can leave HostedClusterRef unset indefinitely if no further events fire. Consider returning the error (or at least logging + requeue) for transient API/RBAC/network failures.
🐛 Proposed fix
+ if err := r.Get(ctx, hcKey, hc); err != nil {
+ if apierrors.IsNotFound(err) {
+ // HostedCluster not found yet; skip
+ } else {
+ log.Error(err, "Failed to get HostedCluster for hostedClusterRef", "hostedCluster", hcKey.String())
+ return ctrl.Result{}, err
+ }
+ } else {
+ // HostedCluster exists - verify ownership and set ref
+ if metav1.IsControlledBy(hc, &cr) {
+ log.V(1).Info("Setting hostedClusterRef for existing HostedCluster")
+ cr.Status.HostedClusterRef = &corev1.ObjectReference{
+ Name: cr.Name,
+ Namespace: cr.Namespace,
+ Kind: "HostedCluster",
+ APIVersion: "hypershift.openshift.io/v1beta1",
+ }
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Set hostedClusterRef if HostedCluster exists and is owned by this CR | |
| // This ensures the ref is always set when the HostedCluster exists, regardless of phase | |
| if cr.Status.HostedClusterRef == nil { | |
| hc := &hyperv1.HostedCluster{} | |
| hcKey := types.NamespacedName{Name: cr.Name, Namespace: cr.Namespace} | |
| if err := r.Get(ctx, hcKey, hc); err == nil { | |
| // HostedCluster exists - verify ownership and set ref | |
| if metav1.IsControlledBy(hc, &cr) { | |
| log.V(1).Info("Setting hostedClusterRef for existing HostedCluster") | |
| cr.Status.HostedClusterRef = &corev1.ObjectReference{ | |
| Name: cr.Name, | |
| Namespace: cr.Namespace, | |
| Kind: "HostedCluster", | |
| APIVersion: "hypershift.openshift.io/v1beta1", | |
| } | |
| } | |
| } | |
| } | |
| // Set hostedClusterRef if HostedCluster exists and is owned by this CR | |
| // This ensures the ref is always set when the HostedCluster exists, regardless of phase | |
| if cr.Status.HostedClusterRef == nil { | |
| hc := &hyperv1.HostedCluster{} | |
| hcKey := types.NamespacedName{Name: cr.Name, Namespace: cr.Namespace} | |
| if err := r.Get(ctx, hcKey, hc); err != nil { | |
| if apierrors.IsNotFound(err) { | |
| // HostedCluster not found yet; skip | |
| } else { | |
| log.Error(err, "Failed to get HostedCluster for hostedClusterRef", "hostedCluster", hcKey.String()) | |
| return ctrl.Result{}, err | |
| } | |
| } else { | |
| // HostedCluster exists - verify ownership and set ref | |
| if metav1.IsControlledBy(hc, &cr) { | |
| log.V(1).Info("Setting hostedClusterRef for existing HostedCluster") | |
| cr.Status.HostedClusterRef = &corev1.ObjectReference{ | |
| Name: cr.Name, | |
| Namespace: cr.Namespace, | |
| Kind: "HostedCluster", | |
| APIVersion: "hypershift.openshift.io/v1beta1", | |
| } | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In `@internal/controller/dpfhcpbridge_controller.go` around lines 204 - 221, The
code silently ignores any non-nil error returned by r.Get when trying to fetch
the HostedCluster (hc, hcKey, r.Get) which can drop reconciliation; update the
logic in the Reconcile handler (where cr, hc and hcKey are used) to distinguish
errors.IsNotFound from other errors and handle transient failures by logging the
error and returning it (or returning a requeue result) instead of ignoring it,
so that RBAC/API/network errors cause a retry while still proceeding to set
cr.Status.HostedClusterRef when the HostedCluster is found and owned.
There was a problem hiding this comment.
will be handled in NVIDIA-493
Moves hostedClusterRef assignment outside phase-gated block and changes StatusSync to skip instead of requeue when HC status is empty.
NOTE: This bug discovered in testing (minikube with mock CRs) but represents a real timing issue during HostedCluster initialization that could occur in real env.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.