Operator does not re-try failed upgrades #3515

Open
@andreasgerstmayr

Component(s)

collector

What happened?

Description

The operator does not retry failed upgrades of managed instances. Suppose an upgrade fails here (for example, because the Kubernetes API server is temporarily unreachable):

```go
itemLogger.Error(err, "failed to apply changes to instance")
```

The error is only printed to the log. The reconcile loop then updates the instance's status.version field to the latest version regardless:

```go
upgraded, upgradeErr := up.ManagedInstance(ctx, *changed)
if upgradeErr != nil {
	// don't fail to allow setting the status
	log.V(2).Error(upgradeErr, "failed to upgrade the OpenTelemetry CR")
}
changed = &upgraded
statusErr := UpdateCollectorStatus(ctx, params.Client, changed)
if statusErr != nil {
	params.Recorder.Event(changed, eventTypeWarning, reasonStatusFailure, statusErr.Error())
	return ctrl.Result{}, statusErr
}
statusPatch := client.MergeFrom(&otelcol)
if err := params.Client.Status().Patch(ctx, changed, statusPatch); err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to apply status changes to the OpenTelemetry CR: %w", err)
}
```

Note that only the status subresource is updated, not the spec. Because the instance now reports the latest version, later restarts of the operator won't attempt to upgrade it either.
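To make the failure mode concrete, here is a minimal Go model of the sequence described above. It is a sketch under stated assumptions, not the operator's code: `Collector`, `SpecVersion`, `StatusVersion`, `persistSpec`, `startupUpgrade`, and `reconcileStatus` are hypothetical names, and the real upgrade and status logic is considerably more involved.

```go
package main

import (
	"errors"
	"fmt"
)

// Collector is a hypothetical, heavily simplified model of a managed instance.
type Collector struct {
	SpecVersion   string // version the persisted spec has actually been migrated to
	StatusVersion string // value reported in status.version
}

const latest = "0.113.0"

// errAPIUnavailable simulates a transient API server failure.
var errAPIUnavailable = errors.New("kubernetes API server unreachable")

// persistSpec is a hypothetical stand-in for writing the upgraded spec.
// It fails whenever the simulated API server is down.
func persistSpec(c *Collector, apiUp bool) error {
	if !apiUp {
		return errAPIUnavailable
	}
	c.SpecVersion = latest
	return nil
}

// startupUpgrade models the upgrade that runs once when the operator starts.
// Instances whose status already reports the latest version are skipped.
func startupUpgrade(c *Collector, apiUp bool) {
	if c.StatusVersion == latest {
		return // looks up to date, so no upgrade is attempted
	}
	if err := persistSpec(c, apiUp); err != nil {
		fmt.Println("failed to apply changes to instance:", err) // logged only
	}
}

// reconcileStatus models the reconcile loop patching only the status
// subresource: status.version advances even though the spec was not upgraded.
func reconcileStatus(c *Collector) {
	c.StatusVersion = latest
}

func main() {
	c := &Collector{SpecVersion: "0.100.0", StatusVersion: "0.100.0"}

	startupUpgrade(c, false) // first start: transient API failure
	reconcileStatus(c)       // status is bumped regardless

	startupUpgrade(c, true) // operator restart: API is healthy again, but the instance is skipped
	fmt.Printf("spec=%s status=%s\n", c.SpecVersion, c.StatusVersion)
}
```

After the restart the model still has an un-upgraded spec (`0.100.0`) while its status claims `0.113.0`, which is the mismatch reported in this issue.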

Relatedly, if a collector instance is moved from the unmanaged to the managed state, the upgrade process does not run either.

Expected Result

The failed upgrade is retried.

Actual Result

The status.version field of the instance is updated as part of the reconcile loop, but the spec is not upgraded.

Possible Solutions

Perform the upgrade in the reconcile loop instead of at operator startup. This would allow failed upgrades to be retried, and would also upgrade instances when they are moved from the unmanaged to the managed state.

Kubernetes Version

1.31.0

Operator version

0.113.0

Collector version

0.113.0

Environment information

No response

Log output

No response

Additional context

No response
