ClusterCondition::last_update_time is updated on no-ops, causing infinite reconciles (in the worst case) #1032

Closed
@nightkr

Description

Affected version

Yes. (Still an issue on trunk, introduced in #571, rolled out around SDP 23.4.)

Current and expected behavior

Reconciling a cluster where nothing has changed should be a no-op.

ClusterCondition::last_update_time breaks this expectation, since it is set unconditionally to the current time, rounded to the second:

    if old_condition.status == new_condition.status {
        ClusterCondition {
            last_update_time: Some(now),
            last_transition_time: old_condition.last_transition_time,
            ..new_condition
        }

This is registered as another object modification whenever the new reconcile does not fall within the same wall-clock second as the previous one. Depending on how long one reconcile takes, that can cause (up to) an infinite re-reconciliation loop while the object is trying to settle down (which is likely an indication that the cluster is struggling to begin with!).
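
To make the failure mode concrete, here is a minimal sketch that reproduces it outside the operator. The `ClusterCondition` struct below is a simplified stand-in (plain `u64` seconds instead of a Kubernetes timestamp, a `String` status), and `merge_unconditional`, its status-change branch, and `count_spurious_writes` are hypothetical names and behaviour introduced for illustration only; they are not the operator-rs API.

```rust
// Simplified stand-in for the real type: timestamps are plain seconds and
// the status is a string. These definitions are assumptions for illustration.
#[derive(Clone, Debug, PartialEq)]
struct ClusterCondition {
    status: String,
    message: Option<String>,
    last_update_time: Option<u64>,
    last_transition_time: Option<u64>,
}

/// Mirrors the excerpt above: even when the status is unchanged, the update
/// time is reset to `now`.
fn merge_unconditional(
    old: &ClusterCondition,
    new: ClusterCondition,
    now: u64,
) -> ClusterCondition {
    if old.status == new.status {
        ClusterCondition {
            last_update_time: Some(now),
            last_transition_time: old.last_transition_time,
            ..new
        }
    } else {
        // Assumed behaviour for a status change (not shown in the excerpt).
        ClusterCondition {
            last_update_time: Some(now),
            last_transition_time: Some(now),
            ..new
        }
    }
}

/// Counts how many reconciles at the given wall-clock seconds would modify
/// the stored condition even though the observed state never changes.
fn count_spurious_writes(initial: ClusterCondition, reconcile_seconds: &[u64]) -> usize {
    let mut stored = initial;
    let mut writes = 0;
    for &now in reconcile_seconds {
        // The freshly computed condition carries no timestamps yet.
        let observed = ClusterCondition {
            last_update_time: None,
            last_transition_time: None,
            ..stored.clone()
        };
        let merged = merge_unconditional(&stored, observed, now);
        if merged != stored {
            writes += 1;
            stored = merged;
        }
    }
    writes
}
```

In this model, every reconcile that lands in a later wall-clock second than the previous one produces a write. If each write in turn triggers another reconcile, the loop only stops when a full reconcile round-trip finishes within a single second.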

Possible solution

  1. Drop last_update_time completely (for compat: either stub it out or make it equivalent to last_transition_time)
  2. Take the value from whenever the data source for the condition was updated, rather than the current wall time (if it makes sense/is possible for that condition)
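
Both options could look roughly like the sketch below. It uses a simplified stand-in for `ClusterCondition` (plain `u64` seconds, `String` status); the function names and the `source_updated_at` parameter are hypothetical and only illustrate the idea, not an actual operator-rs change.

```rust
// Simplified stand-in type; an assumption for illustration, not the real
// operator-rs definition.
#[derive(Clone, Debug, PartialEq)]
struct ClusterCondition {
    status: String,
    message: Option<String>,
    last_update_time: Option<u64>,
    last_transition_time: Option<u64>,
}

/// Option 1 (compatibility variant): keep the field, but make it equivalent
/// to `last_transition_time`, so it only moves when the status changes.
fn merge_update_equals_transition(
    old: &ClusterCondition,
    new: ClusterCondition,
    now: u64,
) -> ClusterCondition {
    let last_transition_time = if old.status == new.status {
        old.last_transition_time
    } else {
        Some(now)
    };
    ClusterCondition {
        last_update_time: last_transition_time,
        last_transition_time,
        ..new
    }
}

/// Option 2: take the update time from the data source behind the condition
/// (`source_updated_at` is a hypothetical parameter) instead of the wall clock.
fn merge_with_source_time(
    old: &ClusterCondition,
    new: ClusterCondition,
    source_updated_at: u64,
    now: u64,
) -> ClusterCondition {
    let last_transition_time = if old.status == new.status {
        old.last_transition_time
    } else {
        Some(now)
    };
    ClusterCondition {
        last_update_time: Some(source_updated_at),
        last_transition_time,
        ..new
    }
}
```

With either variant, reconciling an unchanged cluster yields a condition identical to the stored one regardless of when the reconcile runs, so no spurious object modification is recorded. Option 2 only helps for conditions whose data source exposes a stable modification time.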

Additional context

Discovered by @siegfriedweber, discussed at https://stackable-workspace.slack.com/archives/C02FZ581UCD/p1747230004370629

Environment

No response

Would you like to work on fixing this bug?

None

Activity

moved this from Proposed to In Refinement in Stackable End-to-End Coordination on Jun 11, 2025
self-assigned this on Jun 11, 2025
maltesander (Member) commented on Jun 11, 2025

The approach back then was to follow the OpenShift ClusterOperatorStatusCondition, see https://github.com/openshift/api/blob/b1bcdbc3/config/v1/types_cluster_operator.go#L101.

There, last_update_time does not even appear, so I am not sure why it was introduced here: it would only ever hold the timestamp of the operator's most recent reconcile, which does not provide much value.

My suggestion is to go with solution 1 and just drop it.

moved this from In Refinement to In Progress in Stackable End-to-End Coordination on Jun 12, 2025

    `ClusterCondition::last_update_time` is updated on no-ops, causing infinite reconciles (in the worst case) · Issue #1032 · stackabletech/operator-rs