using defer for status and logging improvements in ipaddressclaim_controller#496
using defer for status and logging improvements in ipaddressclaim_controller#496
Conversation
386f5a1 to
3ed5559
Compare
| @@ -174,31 +175,24 @@ func (r *IpAddressReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( | |||
|
|
|||
| netboxIpAddressModel, err := r.NetboxClient.ReserveOrUpdateIpAddress(ipAddressModel) | |||
| if err != nil { | |||
There was a problem hiding this comment.
Do we not want to set a conditionMessage for this error return?
There was a problem hiding this comment.
probably yes, or we could fetch the conditionMessage from the error message and set this if conditionMessage=="". What do you think @bruelea ?
There was a problem hiding this comment.
Agree, the error message could be used here. Using the error would be more intuitive than setting the conditionMessage variable.
| if o.Status.SyncState == netboxv1.SyncStateSucceeded { | ||
| debugLogger.Info(fmt.Sprintf("reserved ip address in netbox, ip: %s", o.Spec.IpAddress)) |
There was a problem hiding this comment.
If we reached here, we should have succeeded right? So I think this if statement is redundant
There was a problem hiding this comment.
agreed, will delete it
There was a problem hiding this comment.
There were cases where this line was reached but the ip address was actually not reserved in NetBox. So the function should be refactored to avoid this case.
| }, nil | ||
| } | ||
|
|
||
| func (r *IpAddressReconciler) UpdateConditions(ctx context.Context, o *netboxv1.IpAddress, msg string) error { |
There was a problem hiding this comment.
This is good, but might be a bit more readable using a switch statement e.g.
switch {
case o.Status.IpAddressUrl == "":
// code
case o.Status.SyncState == netboxv1.SyncStateFailed:
// more code
default:
// ready case
}The DeletionTimestamp check still needs to be done, but perhaps it could be inverted and have an early return? As there is only one state to go into if this is the case
| // If err is a StatusError, we've reported it in status conditions, so return nil to controller-runtime | ||
| // This prevents exponential backoff for user-facing errors that are already visible in status | ||
| // Regular errors are still returned to trigger retry with backoff | ||
| if err != nil && IsStatusError(err) { |
There was a problem hiding this comment.
Why not follow the pattern given by controller-runtime's client.IgnoreNotFound(err)?
We could then do err = IgnoreStatusError(err)
There was a problem hiding this comment.
StatusError could be named a little more clearly - it seems to be errors caused when communicating with Netbox? And other errors are errors caused by issues during the reconcile loop, do I understand correctly here?
There was a problem hiding this comment.
I really like the proposal in the first comment!
I agree the naming is bad, well actually this is just to define if the error should in the end be returned by the controller loop, so visible with the stacktrace in the logs or not. But basically you are right, StatusError just means the reconciliation did not fully succed but there was no error in the code. Could be claim error, no ip's left or any other error caught somehwere which is not unexpected but due to some user input or system state.
| // Reconcile is part of the main kubernetes reconciliation loop which aims to | ||
| // move the current state of the cluster closer to the desired state. | ||
| func (r *IpAddressClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | ||
| func (r *IpAddressClaimReconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ctrl.Result, err error) { |
There was a problem hiding this comment.
Perhaps rename err to resultErr as in this function err is shadowed in some code paths which makes it harder to see if this is what will be returned, or if it's just temporary
| // updateStatus updates the IpAddressClaim status based on the current state of the owned IpAddress. | ||
| // This function is called as a deferred function in Reconcile to ensure status is always updated. | ||
| // It captures any reconcile errors to include them in the status condition message. | ||
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileErr error, debugLogger logr.Logger) error { |
There was a problem hiding this comment.
We should decide if we never update in case of an error in this function (current state). The alternative is to call Update even if we fail setting some of these statuses. Not saying this is wrong, but we should consider which way we want to go with this
There was a problem hiding this comment.
mhm good point, what do you think about adding a second defer function which only calls the Status().Update()? :D
There was a problem hiding this comment.
I checked the code of the EventStatusRecoder again, it Report() always returns nil because the Status().Update() was removed to the updateStatus function and the return variable should be removed too.
The only error here would be if the client fails to the the IpAddress CR, in that case the status conditions should still be updated.
| return fmt.Errorf("failed to report IpAssigned false condition: %w", err) | ||
| } | ||
| } else { | ||
| // No error and no IpAddress - this shouldn't happen in normal flow, skip update |
There was a problem hiding this comment.
Is this case not something we would want to report to the user somehow?
Also, this currently would skip even setting the initial ready condition from the top of this function (in case it wasn't present) making the user not see any progress, unless they have debug logging switched on and checked the operator logs
There was a problem hiding this comment.
Ah these status conditions always become complexe, but good point, I will actually remove this if condition
| } | ||
|
|
||
| // IpAddress exists - report successful IP assignment if not already reported | ||
| if apismeta.FindStatusCondition(claim.Status.Conditions, netboxv1.ConditionIpAssignedTrue.Type) == nil { |
There was a problem hiding this comment.
This only checks for a Status Condition with the type IPAssigned, not checking if it is true - is that the intent?
And what if we don't assign an IP? Don't we want to set AssignedFalse here?
There was a problem hiding this comment.
True this should check if it is true and set to true if not.
So the idea is that if the ipAddress cr does not exist, this is cought further up in the
/ Fetch the latest IpAddress object ipAddress := &netboxv1.IpAddress{} if err := r.Client.Get(ctx, lookupKey, ipAddress); err != nil { if apierrors.IsNotFound(err) {
| } | ||
|
|
||
| // Update status based on IpAddress readiness | ||
| if apismeta.IsStatusConditionTrue(ipAddress.Status.Conditions, "Ready") { |
There was a problem hiding this comment.
Should this use the defined constants as in the other if statements?
There was a problem hiding this comment.
The idea here is to check if the ipAddress CR which was generated from this claim has the ready condition set to true. If so, then the claim can also be set to true. Or am I missing something here
| @@ -174,31 +175,24 @@ func (r *IpAddressReconciler) Reconcile(ctx context.Context, req ctrl.Request) ( | |||
|
|
|||
| netboxIpAddressModel, err := r.NetboxClient.ReserveOrUpdateIpAddress(ipAddressModel) | |||
| if err != nil { | |||
There was a problem hiding this comment.
Agree, the error message could be used here. Using the error would be more intuitive than setting the conditionMessage variable.
| if o.Status.SyncState == netboxv1.SyncStateSucceeded { | ||
| debugLogger.Info(fmt.Sprintf("reserved ip address in netbox, ip: %s", o.Spec.IpAddress)) |
There was a problem hiding this comment.
There were cases where this line was reached but the ip address was actually not reserved in NetBox. So the function should be refactored to avoid this case.
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileErr error, debugLogger logr.Logger) error { | ||
| debugLogger.Info("updating ipaddressclaim status") | ||
|
|
||
| // Initialize status conditions if this is a new resource |
There was a problem hiding this comment.
Can't the initialization be skipped now if the status and Ready Condition is updated with the defer function?
There was a problem hiding this comment.
yes good point, not needed :)
| // updateStatus updates the IpAddressClaim status based on the current state of the owned IpAddress. | ||
| // This function is called as a deferred function in Reconcile to ensure status is always updated. | ||
| // It captures any reconcile errors to include them in the status condition message. | ||
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileErr error, debugLogger logr.Logger) error { |
There was a problem hiding this comment.
I checked the code of the EventStatusRecoder again, it Report() always returns nil because the Status().Update() was removed to the updateStatus function and the return variable should be removed too.
The only error here would be if the client fails to the the IpAddress CR, in that case the status conditions should still be updated.
| // Status updates the IpAddressClaim status based on the current state of the owned IpAddress. | ||
| // This function is called as a deferred function in Reconcile to ensure status is always updated. | ||
| // It captures any reconcile errors to include them in the status condition message. | ||
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileRes ctrl.Result, reconcileErr error, logger logr.Logger) (result ctrl.Result, err error) { |
There was a problem hiding this comment.
Instead of adding the logger as input variable a new logger could be created inside of the function from the context logger := log.FromContext(ctx).
| // It captures any reconcile errors to include them in the status condition message. | ||
| func (r *IpAddressClaimReconciler) updateStatus(ctx context.Context, claim *netboxv1.IpAddressClaim, lookupKey types.NamespacedName, reconcileRes ctrl.Result, reconcileErr error, logger logr.Logger) (result ctrl.Result, err error) { | ||
| // Set default return values | ||
| result = reconcileRes |
There was a problem hiding this comment.
Can this be omitted by using the same variable names for the input and output variables?
There was a problem hiding this comment.
no this does not work sadly, there is a compilation error if the input and output are named the same
Co-authored-by: bruelea <166021996+bruelea@users.noreply.github.com>
bruelea
left a comment
There was a problem hiding this comment.
For the e2e tests we could switch to only test the event type, reason and soon also the action. This should be enough information to check if the expected event was created. If needed there can be an additional check to see if the message contains a certain string like for example the ip address that was claimed.
| return ctrl.Result{}, errors.New("failed to remove the finalizer") | ||
| if !o.Spec.PreserveInNetbox && o.Status.IpAddressId != 0 { | ||
| if err := r.NetboxClient.DeleteIpAddress(o.Status.IpAddressId); err != nil { | ||
| return ctrl.Result{Requeue: true}, NewDomainError("failed to delete ip address from netbox: %w", err) |
There was a problem hiding this comment.
In case there's an error can the responsibility to set the reconcileResult to ctrl.Result{Requeue: true} be move to the r.updateStatus() method? Then it only needs to be set in the code if there should be a specific requeue time, for example if the lease could not be locked.
|
|
||
| // cancelLock stops the lease renewal goroutine on early returns (lease expires naturally). | ||
| // Explicit cancelLock()+UnlockWithRetry() runs inline after the critical section. | ||
| var cancelLock context.CancelFunc |
There was a problem hiding this comment.
can this be moved to line 150?
| if err != nil { | ||
| return ctrl.Result{}, err | ||
| } | ||
| logger.V(4).Info("reserved ip address in netbox", "ip", o.Spec.IpAddress) |
There was a problem hiding this comment.
This line is also reached if the ip address was only updated. Can this log message be removed?
| } | ||
| logger.V(4).Info("reserved ip address in netbox", "ip", o.Spec.IpAddress) | ||
|
|
||
| logger.Info("reconcile loop finished") |
There was a problem hiding this comment.
Should this be part of the defer function too? Otherwise it could also be removed.
| } | ||
|
|
||
| err = r.Get(ctx, ipAddressLookupKey, ipAddress) | ||
| err := r.Get(ctx, ipAddressLookupKey, ipAddress) |
There was a problem hiding this comment.
| err := r.Get(ctx, ipAddressLookupKey, ipAddress) | |
| err = r.Get(ctx, ipAddressLookupKey, ipAddress) |
To avoid shadowing of the error which was created before. There are other locations in the code where this change could be applied too.
| if !locked { | ||
| // lock for parent prefix was not available, rescheduling | ||
| errorMsg := fmt.Sprintf("failed to lock parent prefix %s", o.Spec.ParentPrefix) | ||
| r.EventStatusRecorder.Recorder().Eventf(o, corev1.EventTypeWarning, "FailedToLockParentPrefix", errorMsg) |
There was a problem hiding this comment.
| r.EventStatusRecorder.Recorder().Eventf(o, corev1.EventTypeWarning, "FailedToLockParentPrefix", errorMsg) |
Creating the event should be covered by the defer now.
| netboxv1.ConditionIpaddressReadyTrue, corev1.EventTypeNormal, nil) | ||
| } | ||
|
|
||
| if apierrors.IsConflict(err) { |
There was a problem hiding this comment.
If the update of the annotation was moved to the defer statement too. Could it be updated with the same request to avoid conflicts?
| err := r.NetboxClient.DeleteIpRange(ctx, int32(o.Status.IpRangeId)) | ||
| if err != nil { | ||
| err = r.EventStatusRecorder.Report(ctx, o, netboxv1.ConditionIpRangeReadyFalseDeletionFailed, | ||
| r.EventStatusRecorder.Report(ctx, o, netboxv1.ConditionIpRangeReadyFalseDeletionFailed, |
There was a problem hiding this comment.
| r.EventStatusRecorder.Report(ctx, o, netboxv1.ConditionIpRangeReadyFalseDeletionFailed, |
| if err != nil { | ||
| err = r.EventStatusRecorder.Report(ctx, o, netboxv1.ConditionIpRangeReadyFalseDeletionFailed, | ||
| r.EventStatusRecorder.Report(ctx, o, netboxv1.ConditionIpRangeReadyFalseDeletionFailed, | ||
| corev1.EventTypeWarning, err) |
There was a problem hiding this comment.
| corev1.EventTypeWarning, err) | |
| corev1.EventTypeWarning, err) | |
| if err != nil { | |
| return ctrl.Result{}, NewDomainErr(err) | |
| } |
| r.EventStatusRecorder.Report(ctx, o, | ||
| netboxv1.ConditionIpRangeReadyFalse, corev1.EventTypeWarning, reconcileErr) | ||
| default: | ||
| r.EventStatusRecorder.Report(ctx, o, |
There was a problem hiding this comment.
The iprange could become Ready=false again if the update request fails. If the default case is reached the ready condition can be set to true if the reconcileErr is nil or otherwise to false.
wip