Skip to content

Fix doNotDeploy not removing off-schedule BundleDeployments#5071

Open
dockerized-nl wants to merge 3 commits into
rancher:mainfrom
dockerized-nl:fix/donotdeploy-offschedule-cleanup
Open

Fix doNotDeploy not removing off-schedule BundleDeployments#5071
dockerized-nl wants to merge 3 commits into
rancher:mainfrom
dockerized-nl:fix/donotdeploy-offschedule-cleanup

Conversation

@dockerized-nl
Copy link
Copy Markdown
Contributor

Summary

  • cleanupOrphanedBundleDeployments had a guard || bd.Spec.OffSchedule that preserved any BD with OffSchedule=true, even when that BD was no longer targeted (e.g. because doNotDeploy: true was set in fleet.yaml after the app was already deployed).
  • This left the app installed on the downstream cluster indefinitely until the cluster re-entered its maintenance window — which could be days or weeks away.
  • The fix removes the OffSchedule guard. The uidsToKeep set already correctly identifies all actively-targeted BDs; the extra guard was redundant for those and harmful for untargeted ones.

Root Cause

In cleanupOrphanedBundleDeployments (bundle_controller.go):

// Before — OffSchedule guard prevents deletion of untargeted BDs
return uidsToKeep.Has(bd.UID) || bd.Spec.OffSchedule

// After — only uidsToKeep determines what is kept
return uidsToKeep.Has(bd.UID)

slices.DeleteFunc removes elements where the predicate returns true, so any BD with OffSchedule=true was never added to the delete list — even when its cluster was no longer targeted at all.

Behaviour

Scenario Before After
doNotDeploy set, cluster in active window BD deleted ✓ BD deleted ✓
doNotDeploy set, cluster in maintenance gap (OffSchedule=true) BD not deleted ✗ BD deleted ✓
doNotDeploy not set, cluster in maintenance gap BD kept ✓ BD kept ✓

Test plan

  • Added integration test When("setting doNotDeploy after deployment when the existing BundleDeployment is off-schedule") that explicitly sets OffSchedule=true on the BD before enabling doNotDeploy, then asserts the BD is cleaned up.
  • Existing doNotDeploy integration tests continue to pass.)

cleanupOrphanedBundleDeployments preserved any BD with Spec.OffSchedule=true,
even when that BD was no longer targeted (e.g. because doNotDeploy was set in
fleet.yaml after the app was already deployed). This left the app installed on
the downstream cluster indefinitely until the cluster re-entered its maintenance
window.

The OffSchedule flag is intended to prevent updates/deployments during a
maintenance-window gap, not to block removal of BDs that are entirely untargeted.
BDs that are still actively targeted are already preserved by the uidsToKeep
check; the additional `|| bd.Spec.OffSchedule` guard was therefore both
redundant for targeted BDs and harmful for untargeted ones.

Remove the OffSchedule guard from cleanupOrphanedBundleDeployments so that a BD
is always deleted when its cluster is no longer targeted, regardless of schedule
state. Add an integration test that explicitly sets OffSchedule=true on the BD
before enabling doNotDeploy, confirming the BD is cleaned up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dockerized-nl dockerized-nl requested a review from a team as a code owner May 5, 2026 07:06
@kkaempf
Copy link
Copy Markdown
Collaborator

kkaempf commented May 5, 2026

thanks @dockerized-nl, esp. for the extensive comments within the code 💚

@kkaempf kkaempf added this to Fleet May 5, 2026
@kkaempf kkaempf moved this to 👀 In review in Fleet May 5, 2026
@kkaempf kkaempf added this to the v2.14.2 milestone May 5, 2026
@weyfonk weyfonk modified the milestones: v2.14.2, v2.14.3 May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: 👀 In review

Development

Successfully merging this pull request may close these issues.

3 participants