Fix certificate expiration causing premature upgrade completion #78
When TLS certificates expired during node version checks, the controller treated the resulting connection failures as "node doesn't need upgrade" instead of propagating the error. This caused upgrades to complete prematurely with 0 nodes upgraded.
Changes
Error handling fix:
- nodeNeedsUpgrade(): from returning bool to (bool, error), to distinguish between "doesn't need upgrade" and "cannot determine"
- findNextNode(): from slices.IndexFunc to an explicit loop, to propagate errors
- refreshTalosClient() logic
Behavior change:
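As a rough illustration of the behavior change, the sketch below contrasts a node search that folds check errors into "no upgrade needed" with one that propagates them. It is not the code from this PR: the `versionChecker` type, `findNextNodeLossy`, `errCertExpired`, and the signatures shown are hypothetical stand-ins. Only the names `findNextNode` and the `(bool, error)` return shape come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// errCertExpired is a hypothetical stand-in for the TLS failure a version check might return.
var errCertExpired = errors.New("x509: certificate has expired")

// versionChecker is an assumed shape for a per-node check that returns (bool, error),
// so "doesn't need upgrade" and "cannot determine" stay distinguishable.
type versionChecker func(node string) (needsUpgrade bool, err error)

// findNextNodeLossy mimics the old behavior: a failed check is treated like "false",
// so a cluster where every check fails looks fully upgraded.
func findNextNodeLossy(nodes []string, check versionChecker) int {
	for i, n := range nodes {
		needs, _ := check(n) // error silently dropped
		if needs {
			return i
		}
	}
	return -1
}

// findNextNode uses an explicit loop so the first check error is returned to the caller.
func findNextNode(nodes []string, check versionChecker) (int, error) {
	for i, n := range nodes {
		needs, err := check(n)
		if err != nil {
			return -1, fmt.Errorf("checking node %s: %w", n, err)
		}
		if needs {
			return i, nil
		}
	}
	return -1, nil // every node was checked successfully and none needs an upgrade
}

func main() {
	nodes := []string{"node-a", "node-b"}
	failing := func(string) (bool, error) { return false, errCertExpired }

	// Lossy variant: indistinguishable from "nothing left to upgrade".
	fmt.Println(findNextNodeLossy(nodes, failing))

	// Propagating variant: the caller sees the failure and can retry or abort.
	idx, err := findNextNode(nodes, failing)
	fmt.Println(idx, err)
}
```

With the propagating variant, a caller can keep the upgrade in progress (or requeue) on error instead of marking it complete.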
Original prompt
This section details the original issue you should resolve
<issue_title>Failed to get current version from Talos client</issue_title>
<issue_description>Since this has now happened a second time (1.11.6 --> 1.12.0 --> 1.12.1), I thought it would be useful to report it here, at least to track it for others, even if it is just a configuration issue on my end.