Description
Component(s)
collector
What happened?
Description
The operator does not re-try failed upgrades of managed instances. In case an upgrade fails here:
(for example the Kubernetes API server is temporarily unreachable), an error is printed to the log, and thestatus.version
field of the instance is updated in the reconcile loop here: opentelemetry-operator/internal/status/collector/handle.go
Lines 55 to 69 in 42a689e
spec
is not updated, only the status
subresource). Therefore, any future re-starts of the operator also won't attempt to upgrade this instance.
Related, if the collector instance is moved from unmanaged
to managed
state, the upgrade process also doesn't run.
Expected Result
The upgrade is re-tried.
Actual Result
The status.version
field of the instance is updated as part of the reconcile loop, however the spec
field didn't get upgraded.
Possible Solutions
Perform the upgrade process in the reconcile loop instead of the operator startup. This resolves the issue of re-trying failed upgrades, and also upgrading instances when they are moved from unmanaged
to managed
state.
Kubernetes Version
1.31.0
Operator version
0.113.0
Collector version
0.113.0
Environment information
No response
Log output
No response
Additional context
No response