[v0.15] Fix stale bundle error when cluster is offline after bad commit (#4780)#4823

Merged
thardeck merged 1 commit into release/v0.15 from v0.15_implement_594 on Mar 20, 2026

@thardeck thardeck commented Mar 16, 2026

  • Fix stale bundle error when cluster is offline after bad commit

When a GitRepo contains a YAML parse error and the cluster agent is offline, the bundle's Ready condition retains the error message even after a fix commit is pushed. Three interdependent changes are needed.

deployer.go: Add MalformedYAMLError to the deployErrToStatus regex. Helm v4 changed the error format from "YAML parse error" to "MalformedYAMLError"; without this match the error is routed to the Deployed condition instead of Installed, bypassing the staleness guard.

summary.go: In MessageFromDeployment, skip the Installed condition message when AppliedDeploymentID differs from Spec.DeploymentID, so a stale error from a superseded apply attempt is not surfaced.
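The guard can be sketched as follows. The types are simplified stand-ins that use only the field names mentioned in this PR, and `installedMessage` is a hypothetical helper, not the actual `MessageFromDeployment` implementation.

```go
package main

import "fmt"

// Simplified stand-ins for the BundleDeployment API types (assumption).
type Condition struct {
	Type    string
	Status  string
	Message string
}

type Spec struct{ DeploymentID string }

type Status struct {
	AppliedDeploymentID string
	Conditions          []Condition
}

type BundleDeployment struct {
	Spec   Spec
	Status Status
}

// installedMessage returns the failing Installed condition message, but only
// when the status refers to the deployment ID currently requested in the
// spec. A mismatch means the error belongs to a superseded apply attempt.
func installedMessage(bd BundleDeployment) string {
	if bd.Status.AppliedDeploymentID != bd.Spec.DeploymentID {
		return "" // stale: do not surface
	}
	for _, c := range bd.Status.Conditions {
		if c.Type == "Installed" && c.Status == "False" {
			return c.Message
		}
	}
	return ""
}

func main() {
	failed := []Condition{{Type: "Installed", Status: "False", Message: "MalformedYAMLError"}}

	current := BundleDeployment{
		Spec:   Spec{DeploymentID: "bad-commit"},
		Status: Status{AppliedDeploymentID: "bad-commit", Conditions: failed},
	}
	stale := current
	stale.Spec.DeploymentID = "fix-commit" // spec moved on, status did not

	fmt.Printf("current: %q\n", installedMessage(current))
	fmt.Printf("stale:   %q\n", installedMessage(stale))
}
```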

target.go: Add effectiveDeployment so state and message compare against t.DeploymentID (the ID the controller is about to write) rather than the stale Spec.DeploymentID still held in the cached BundleDeployment. The bundle controller calls SetReadyConditions before updating BD specs, so the summary.go guard would otherwise never trigger while the agent is offline.
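One way to picture `effectiveDeployment` is as a copy of the cached object with the pending ID substituted. This is a sketch under that assumption: the `Target` and `BundleDeployment` types below are reduced stand-ins, and the real method may copy or compare differently.

```go
package main

import "fmt"

// Reduced stand-ins for the real types (assumption).
type Spec struct{ DeploymentID string }
type Status struct{ AppliedDeploymentID string }

type BundleDeployment struct {
	Spec   Spec
	Status Status
}

type Target struct {
	DeploymentID string            // the ID the controller is about to write
	Deployment   *BundleDeployment // cached object, possibly with an older spec
}

// effectiveDeployment returns a view of the cached BundleDeployment whose
// Spec.DeploymentID is the pending t.DeploymentID. The cached object itself
// is left untouched; only a copy is modified.
func (t Target) effectiveDeployment() *BundleDeployment {
	if t.Deployment == nil {
		return nil
	}
	if t.Deployment.Spec.DeploymentID == t.DeploymentID {
		return t.Deployment
	}
	bd := *t.Deployment // value copy is enough for these flat structs
	bd.Spec.DeploymentID = t.DeploymentID
	return &bd
}

func main() {
	cached := &BundleDeployment{
		Spec:   Spec{DeploymentID: "bad-commit"},
		Status: Status{AppliedDeploymentID: "bad-commit"},
	}
	t := Target{DeploymentID: "fix-commit", Deployment: cached}

	eff := t.effectiveDeployment()
	fmt.Println("cached spec ID:   ", cached.Spec.DeploymentID)
	fmt.Println("effective spec ID:", eff.Spec.DeploymentID)
	fmt.Println("looks stale:      ", eff.Status.AppliedDeploymentID != eff.Spec.DeploymentID)
}
```

With the substituted ID, a comparison of `AppliedDeploymentID` against `Spec.DeploymentID` already reports a mismatch before the controller has written the new spec, which is what lets the staleness guard trigger while the agent is offline.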

  • Fix integration test after effectiveDeployment change

After the labels change, the controller uses effectiveDeployment and computes WaitApplied=1, because the new deployment ID has not been applied yet. The test was checking WaitApplied==0 without simulating the agent re-applying the updated bundle deployment. The assertion is now split into three steps: wait for the BD spec change, simulate the agent applying the new deployment, then assert that the bundle shows WaitApplied=0.
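The three steps can be illustrated with a toy model. Assume WaitApplied counts bundle deployments whose applied ID differs from the target ID; the `bd` type and `waitApplied` function are hypothetical and do not reflect the integration test's actual helpers.

```go
package main

import "fmt"

// bd is a toy bundle deployment: the ID requested in the spec and the ID
// the agent last applied (assumption: both are plain strings).
type bd struct {
	specID    string
	appliedID string
}

// waitApplied counts deployments that have not yet applied targetID.
func waitApplied(targetID string, bds []bd) int {
	n := 0
	for _, b := range bds {
		if b.appliedID != targetID {
			n++
		}
	}
	return n
}

func main() {
	bds := []bd{{specID: "v1", appliedID: "v1"}}

	// Labels change: the target ID moves to v2 before the spec is written.
	fmt.Println("after label change:", waitApplied("v2", bds))

	// Step 1: the controller writes the new spec.
	bds[0].specID = "v2"
	fmt.Println("after spec update: ", waitApplied("v2", bds))

	// Step 2: the (simulated) agent applies the new deployment.
	bds[0].appliedID = "v2"

	// Step 3: only now does the count drop to zero.
	fmt.Println("after agent apply: ", waitApplied("v2", bds))
}
```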

Backport of #4780
Refers to #594

@thardeck thardeck self-assigned this Mar 16, 2026
@thardeck thardeck requested a review from a team as a code owner March 16, 2026 10:15
@thardeck thardeck added this to Fleet Mar 16, 2026
@thardeck thardeck moved this to 👀 In review in Fleet Mar 16, 2026
@thardeck thardeck merged commit 9b1d80b into release/v0.15 Mar 20, 2026
22 checks passed
@thardeck thardeck deleted the v0.15_implement_594 branch March 20, 2026 05:06
@github-project-automation github-project-automation Bot moved this from 👀 In review to ✅ Done in Fleet Mar 20, 2026
thardeck added a commit that referenced this pull request Apr 15, 2026
thardeck added a commit that referenced this pull request Apr 16, 2026