Operator does not re-try failed upgrades

### Component(s)

collector

### What happened?

## Description
The operator does not retry failed upgrades of managed instances. If an upgrade fails here: https://github.com/open-telemetry/opentelemetry-operator/blob/42a689e93bdeabded2ef4af08a9b5edc9b6e6486/pkg/collector/upgrade/upgrade.go#L86 (for example, because the Kubernetes API server is temporarily unreachable), an error is logged, yet the reconcile loop here: https://github.com/open-telemetry/opentelemetry-operator/blob/42a689e93bdeabded2ef4af08a9b5edc9b6e6486/internal/status/collector/handle.go#L55-L69 still updates the instance's `status.version` field to the latest version. Note that only the `status` subresource is updated, not the `spec`. As a result, future restarts of the operator will not attempt to upgrade this instance either.

Relatedly, if a collector instance is moved from the `unmanaged` to the `managed` state, the upgrade process does not run either.
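
To make the sequence above concrete, here is a minimal toy model of the failure mode. It does not use the operator's real types or functions: `instance`, `upgradeAtStartup`, `reconcileStatus`, and `simulate` are hypothetical names, and the starting version `0.112.0` is an assumed value chosen only for illustration. The skip-if-versions-match check is likewise a simplification of the behaviour described above.

```go
package main

import "errors"

// instance is a hypothetical stand-in for a managed collector CR.
type instance struct {
	SpecUpgradedTo string // version the spec was last migrated to (illustrative field)
	StatusVersion  string // mirrors status.version
}

// errAPIUnavailable simulates a temporarily unreachable API server.
var errAPIUnavailable = errors.New("api server unavailable")

// upgradeAtStartup models the one-shot upgrade at operator startup.
// It skips instances whose status already reports the target version.
func upgradeAtStartup(in *instance, target string, apiUp bool) error {
	if in.StatusVersion == target {
		return nil // considered up to date, nothing is attempted
	}
	if !apiUp {
		return errAPIUnavailable // spec stays untouched
	}
	in.SpecUpgradedTo = target
	return nil
}

// reconcileStatus models the status handler: it records the target
// version without checking whether the upgrade succeeded.
func reconcileStatus(in *instance, target string) {
	in.StatusVersion = target
}

// simulate replays: failed startup upgrade, reconcile, operator restart.
func simulate() instance {
	in := instance{SpecUpgradedTo: "0.112.0", StatusVersion: "0.112.0"}
	_ = upgradeAtStartup(&in, "0.113.0", false) // fails, error is only logged
	reconcileStatus(&in, "0.113.0")             // status now claims 0.113.0
	_ = upgradeAtStartup(&in, "0.113.0", true)  // restart: skipped, versions match
	return in
}
```

In this model, `simulate()` ends with `StatusVersion` at `0.113.0` while `SpecUpgradedTo` is still `0.112.0`, which is the mismatch reported here.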

## Expected Result
A failed upgrade is retried.

## Actual Result
The reconcile loop updates the instance's `status.version` field, but the `spec` is never upgraded.

## Possible Solutions
Perform the upgrade in the reconcile loop instead of at operator startup. This would retry failed upgrades and would also upgrade instances when they are moved from the `unmanaged` to the `managed` state.
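
A rough sketch of what this could look like, again as a toy model rather than the operator's actual code. The names `collector`, `upgrader`, `reconcile`, and `flakyUpgrader` are hypothetical, and the choice to return the error (so the controller requeues) and to leave `status.version` untouched on failure is an assumption of this sketch, not a confirmed design.

```go
package main

import "errors"

type managementState string

const (
	managed   managementState = "managed"
	unmanaged managementState = "unmanaged"
)

// collector is a hypothetical stand-in for the collector CR.
type collector struct {
	State          managementState
	SpecUpgradedTo string // version the spec was last migrated to (illustrative field)
	StatusVersion  string // mirrors status.version
}

var errTransient = errors.New("transient upgrade failure")

// upgrader is a hypothetical hook that migrates the spec to target.
type upgrader func(c *collector, target string) error

// reconcile runs the upgrade as part of every reconcile pass.
// On failure it reports requeue=true and does not touch the status,
// so the next pass sees the old version and tries again.
func reconcile(c *collector, target string, up upgrader) (requeue bool, err error) {
	if c.State != managed {
		return false, nil // unmanaged instances are left alone
	}
	if c.StatusVersion != target {
		if err := up(c, target); err != nil {
			return true, err
		}
	}
	c.StatusVersion = target // only reached after a successful upgrade
	return false, nil
}

// flakyUpgrader fails the first `failures` calls, then succeeds.
func flakyUpgrader(failures int) upgrader {
	return func(c *collector, target string) error {
		if failures > 0 {
			failures--
			return errTransient
		}
		c.SpecUpgradedTo = target
		return nil
	}
}
```

With `flakyUpgrader(1)`, the first `reconcile` call returns an error and leaves the status unchanged, and the second call upgrades the spec and then records the new version. Because the check runs on every pass, an instance switched from `unmanaged` to `managed` is also picked up on its next reconcile.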

### Kubernetes Version

1.31.0

### Operator version

0.113.0

### Collector version

0.113.0

### Environment information

_No response_

### Log output

_No response_

### Additional context

_No response_

	upgraded, upgradeErr := up.ManagedInstance(ctx, *changed)
	if upgradeErr != nil {
		// don't fail to allow setting the status
		log.V(2).Error(upgradeErr, "failed to upgrade the OpenTelemetry CR")
	}
	changed = &upgraded
	statusErr := UpdateCollectorStatus(ctx, params.Client, changed)
	if statusErr != nil {
		params.Recorder.Event(changed, eventTypeWarning, reasonStatusFailure, statusErr.Error())
		return ctrl.Result{}, statusErr
	}
	statusPatch := client.MergeFrom(&otelcol)
	if err := params.Client.Status().Patch(ctx, changed, statusPatch); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to apply status changes to the OpenTelemetry CR: %w", err)
	}

Operator does not re-try failed upgrades #3515
