Skip to content

Keep Deployment paused on Rollout deletion#281

Merged
kruise-bot merged 1 commit intoopenkruise:masterfrom
GautamBytes:pause-on-delete
Sep 28, 2025
Merged

Keep Deployment paused on Rollout deletion#281
kruise-bot merged 1 commit intoopenkruise:masterfrom
GautamBytes:pause-on-delete

Conversation

@GautamBytes
Copy link
Copy Markdown
Contributor

Ⅰ. Describe what this PR does

This PR addresses an issue where deleting a Rollout CR causes the controller to unpause the underlying Deployment, triggering an unexpected rolling update.

To solve this, a new feature has been added to keep the Deployment paused when its managing Rollout is deleted. This behavior is controlled by a new feature gate, KeepDeploymentPausedOnDeletionGate. When the gate is enabled, the rollout finalizer will skip the step that unpauses the Deployment, leaving it in its static, paused state. This change has been applied to both Canary and Blue-Green strategies for consistent behavior.

Ⅱ. Does this pull request fix one issue?

Fixes #272

Ⅲ. Special notes for reviews

  • The key aspect of this change is the introduction of the KeepDeploymentPausedOnDeletionGate feature gate.

  • The new behavior is disabled by default to ensure full backward compatibility.
    Users must explicitly enable the feature gate to prevent the Deployment from being unpaused upon Rollout deletion. This makes the feature entirely opt-in.

@kruise-bot kruise-bot requested review from veophi and zmberg June 21, 2025 07:19
@kruise-bot
Copy link
Copy Markdown

Welcome @GautamBytes! It looks like this is your first PR to openkruise/rollouts 🎉

@GautamBytes
Copy link
Copy Markdown
Contributor Author

@furykerry , Can you help me in identifying why Unit-Test CL workflow is failing even though i ran 'make generate manifests' and pushed the auto-generated files as mentioned in workflow file

if c.FinalizeReason == v1beta1.FinaliseReasonDelete && utilfeature.DefaultMutableFeatureGate.Enabled(feature.KeepDeploymentPausedOnDeletionGate) {
klog.Infof("Rollout(%s/%s) is being deleted, KeepDeploymentPausedOnDeletion is enabled. Skipping resume workload step.", c.Rollout.Namespace, c.Rollout.Name)
// Do nothing, effectively skipping this step and considering it "done".
} else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot simply change deployment here, instead, we should change the Finalize method for the following packages:

  1. pkg/controller/batchrelease/control/partitionstyle
  2. pkg/controller/batchrelease/control/canarystyle
  3. pkg/controller/batchrelease/control/bluegreenstyle

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay , will do it . Thanks for the help!

@AiRanthem
Copy link
Copy Markdown
Member

AiRanthem commented Aug 1, 2025

@GautamBytes Thank you for your contribution 🥂.

I believe the behavior of Rollout when handling deletion/disable should be as follows:

  1. The Rollout controller patches the batchPartition field of the corresponding BatchRelease object to null
  2. The BatchRelease controller executes the Finalize method for the corresponding release type (which is the part you modified)
  3. Execute the Finalize method of the corresponding workload to restore it to its original state
  4. After successfully restoring the workload, the BatchRelease enters the Completed state
  5. The Rollout controller deletes the BatchRelease resource

Therefore, this PR may have the following issues:

  1. It completely skips the workload restoration. The logic should be moved down, only keeping the deployment's pause=true, and the logic of removing annotations, labels, and changing the strategy back to RollingUpdate, which is important, still needs to be executed
  2. Have you considered implementing the protection logic for other workloads together:
workload partition canary blue-green
Deployment retain pause=true skip deleting canary retain pause=true and maxSurge
StatefulSet retain partition / /
CloneSet retain pause=true and partition / retain pause=true and maxSurge
DaemonSet retain pause=true and partition / /

@furykerry @zmberg What do you think of the modification plan in the table?

@furykerry
Copy link
Copy Markdown
Member

we can discuss it in the next community call. For canary and blue-green deployment, if we keep the workload as it is, it is quite hard for the user to recover the workload and traffic.

@AiRanthem
Copy link
Copy Markdown
Member

we can discuss it in the next community call. For canary and blue-green deployment, if we keep the workload as it is, it is quite hard for the user to recover the workload and traffic.

@furykerry may be a simple apply is all you need to recover

@furykerry
Copy link
Copy Markdown
Member

furykerry commented Aug 7, 2025

@GautamBytes in the community call, we have reach consensus that if the KeepDeploymentPausedOnDeletionGate is enabled, we should still do the finalize works such as restoring traffic routing and deleting canary deployment, the only different is we should keep the workload "paused", that is to say , keep deployment spec.paused=true and statefulset like workload spec.updateStrategy.rollingUpdate.partition as is. So we should not skip the Finalize func all together, just disable some field patching in the Finalize func in these files:

  • pkg/controller/batchrelease/control/partitionstyle/statefulset/control.go
  • pkg/controller/batchrelease/control/partitionstyle/deployment/control.go
  • pkg/controller/batchrelease/control/partitionstyle/daemonset/control.go
  • pkg/controller/batchrelease/control/partitionstyle/cloneset/control.go
  • pkg/controller/batchrelease/control/canarystyle/deployment/stable.go
  • pkg/controller/batchrelease/control/bluegreenstyle/cloneset/control.go
  • pkg/controller/batchrelease/control/bluegreenstyle/deployment/control.go

the only tricky part is in the Finalize method in pkg/controller/batchrelease/control/canarystyle/deployment/stable.go, if KeepDeploymentPausedOnDeletionGate is enabled, we shall not wait for all pods updated waitAllUpdatedAndReady

@GautamBytes
Copy link
Copy Markdown
Contributor Author

@furykerry sure , will update it accordingly . Currently going through exams will update it soon after 10 aug. Thanks!

@codecov
Copy link
Copy Markdown

codecov bot commented Aug 26, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.50%. Comparing base (00aae12) to head (0de8df0).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...release/control/partitionstyle/cloneset/control.go 50.00% 1 Missing and 2 partials ⚠️
...elease/control/partitionstyle/daemonset/control.go 40.00% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #281      +/-   ##
==========================================
- Coverage   47.52%   47.50%   -0.03%     
==========================================
  Files          63       63              
  Lines        9799     9806       +7     
==========================================
+ Hits         4657     4658       +1     
- Misses       4585     4587       +2     
- Partials      557      561       +4     
Flag Coverage Δ
unittests 47.50% <60.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@GautamBytes
Copy link
Copy Markdown
Contributor Author

@GautamBytes in the community call, we have reach consensus that if the KeepDeploymentPausedOnDeletionGate is enabled, we should still do the finalize works such as restoring traffic routing and deleting canary deployment, the only different is we should keep the workload "paused", that is to say , keep deployment spec.paused=true and statefulset like workload spec.updateStrategy.rollingUpdate.partition as is. So we should not skip the Finalize func all together, just disable some field patching in the Finalize func in these files:

  • pkg/controller/batchrelease/control/partitionstyle/statefulset/control.go
  • pkg/controller/batchrelease/control/partitionstyle/deployment/control.go
  • pkg/controller/batchrelease/control/partitionstyle/daemonset/control.go
  • pkg/controller/batchrelease/control/partitionstyle/cloneset/control.go
  • pkg/controller/batchrelease/control/canarystyle/deployment/stable.go
  • pkg/controller/batchrelease/control/bluegreenstyle/cloneset/control.go
  • pkg/controller/batchrelease/control/bluegreenstyle/deployment/control.go

the only tricky part is in the Finalize method in pkg/controller/batchrelease/control/canarystyle/deployment/stable.go, if KeepDeploymentPausedOnDeletionGate is enabled, we shall not wait for all pods updated waitAllUpdatedAndReady

@furykerry Did the suggested changes , its ready for review now!

@zmberg
Copy link
Copy Markdown
Member

zmberg commented Sep 24, 2025

@GautamBytes This PR only addresses scenarios involving partitionStyle; other aspects haven't been fully considered yet.

@furykerry
Copy link
Copy Markdown
Member

@GautamBytes KeepDeploymentPausedOnDeletionGate only make sense for partitionStyle strategy currently, consider remove the changes for canary and bluegreen strategy

Signed-off-by: GautamBytes <manchandanigautam@gmail.com>
@furykerry
Copy link
Copy Markdown
Member

/lgtm

@zmberg
Copy link
Copy Markdown
Member

zmberg commented Sep 28, 2025

/approve

@kruise-bot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zmberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kruise-bot kruise-bot merged commit e160b31 into openkruise:master Sep 28, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

keep pause=true when rollout CR is deleted

5 participants