Change workload controller to use patch instead of update #286

Merged
kruise-bot merged 6 commits into openkruise:master from PersistentJZH:zhihao/feat/change-workload-controller-to-use-patch-instead-of-update
Jul 22, 2025

Conversation

@PersistentJZH
Contributor

Ⅰ. Describe what this PR does

Change the workload controller to use patch instead of update.

Ⅱ. Does this pull request fix one issue?

#273

Ⅲ. Special notes for reviews

kruise-bot requested review from FillZpp and furykerry July 2, 2025 07:02
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from ec2f55d to a05becd July 2, 2025 09:07
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from a05becd to 41ab580 July 4, 2025 04:05
kruise-bot added size/M and removed size/L labels Jul 4, 2025
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from 41ab580 to c908c62 July 4, 2025 07:07
kruise-bot added size/L and removed size/M labels Jul 4, 2025
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from c908c62 to ab5b7be July 4, 2025 07:16
@furykerry
Member

@PersistentJZH can you attend the community meeting this Thursday?

@PersistentJZH
Contributor Author

> @PersistentJZH can you attend the community meeting this Thursday?

Sure, I will attend the meeting.

Signed-off-by: zhihao jian <zhihao.jian@shopee.com>

fix test

use MergeFrom func to patch data

get latest rs

fix unit test

fix lint

do not get latest status before patch
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from ed7e44b to 51cebc1 July 14, 2025 09:00
@codecov

codecov bot commented Jul 14, 2025

Codecov Report

Attention: Patch coverage is 47.36842% with 40 lines in your changes missing coverage. Please review.

Project coverage is 45.68%. Comparing base (6554cfc) to head (c3e3d50).
Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/controller/deployment/sync.go | 48.33% | 23 Missing and 8 partials ⚠️ |
| pkg/controller/deployment/progress.go | 50.00% | 2 Missing and 2 partials ⚠️ |
| pkg/controller/deployment/deployment_controller.go | 50.00% | 3 Missing ⚠️ |
| pkg/controller/deployment/controller.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
+ Coverage   45.22%   45.68%   +0.45%     
==========================================
  Files          61       61              
  Lines        7073     7083      +10     
==========================================
+ Hits         3199     3236      +37     
+ Misses       3324     3291      -33     
- Partials      550      556       +6     
| Flag | Coverage Δ |
|---|---|
| unittests | 45.68% <47.36%> (+0.45%) ⬆️ |

rsCopy.Spec.MinReadySeconds = d.Spec.MinReadySeconds
return dc.client.AppsV1().ReplicaSets(rsCopy.ObjectMeta.Namespace).Update(ctx, rsCopy, metav1.UpdateOptions{})
// Use existing state directly for patching, let API Server handle conflicts
rsCopy = existingNewRS.DeepCopy()
Member

Should we deep-copy and modify the ReplicaSet again? It seems that L114-L119 already do this.

// Set the annotations that need to be updated
desiredReplicas := *(deployment.Spec.Replicas)
maxReplicas := *(deployment.Spec.Replicas) + deploymentutil.MaxSurge(deployment, &dc.strategy)
deploymentutil.SetReplicasAnnotations(rsCopy, desiredReplicas, maxReplicas)
Member

Is it possible to keep the code at L414-L416 unchanged as far as possible?

Contributor Author

Sure, I have made the corresponding adjustments.

PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from 6902031 to ef47137 July 14, 2025 15:07
@PersistentJZH
Contributor Author

Thanks @furykerry. Following the weekly meeting discussion, this PR makes two adjustments:

  1. Change the workload controller to use patch instead of update.
  2. Go further and use runtimeClient for all Deployment operations, which makes the unit tests cleaner and improves the maintainability and scalability of the code.
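
As a rough illustration of the first adjustment, the sketch below contrasts a full replace with a shallow JSON merge patch on a toy in-memory object. It is not the Kubernetes API server or the code in this PR: `applyUpdate` and `applyMergePatch` are hypothetical helpers, and a real Update also carries a resourceVersion and would be rejected on conflict rather than silently overwrite. It only shows that a merge patch touches the keys it names and leaves the rest alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyUpdate models a full update in this toy store: the desired object
// replaces the stored one wholesale, so keys missing from desired are lost.
func applyUpdate(stored, desired map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(desired))
	for k, v := range desired {
		out[k] = v
	}
	return out
}

// applyMergePatch models a shallow JSON merge patch: only the keys present
// in the patch are changed, and a null value deletes the key.
func applyMergePatch(stored map[string]interface{}, patch []byte) (map[string]interface{}, error) {
	var delta map[string]interface{}
	if err := json.Unmarshal(patch, &delta); err != nil {
		return nil, err
	}
	out := make(map[string]interface{}, len(stored))
	for k, v := range stored {
		out[k] = v
	}
	for k, v := range delta {
		if v == nil {
			delete(out, k)
			continue
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	// Stored state contains a key that the writer's local copy does not have.
	stored := map[string]interface{}{"replicas": 3.0, "paused": true}
	local := map[string]interface{}{"replicas": 5.0}

	updated := applyUpdate(stored, local)
	patched, err := applyMergePatch(stored, []byte(`{"replicas":5}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("after update:", updated)
	fmt.Println("after patch: ", patched)
}
```

In this toy model the update drops `paused`, while the patch keeps it and changes only `replicas`.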


// List all ReplicaSets using runtimeClient
rsList := &apps.ReplicaSetList{}
err = dc.runtimeClient.List(ctx, rsList, client.InNamespace(d.Namespace), client.MatchingLabelsSelector{Selector: deploymentSelector})
Member

Is it possible to list with the UnsafeDisableDeepCopy option?

Contributor Author

OK, I will use UnsafeDisableDeepCopy to optimize performance.
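
The trade-off behind this option can be sketched with a toy cache. The `Cache` type and its `List(disableDeepCopy bool)` method below are hypothetical stand-ins, not the controller-runtime API. The point is only that skipping the copy returns pointers into the shared cache, so a caller has to copy an object itself before mutating it.

```go
package main

import "fmt"

// ReplicaSet is a minimal stand-in type for illustration.
type ReplicaSet struct {
	Name        string
	Annotations map[string]string
}

// DeepCopy returns an independent copy, including the annotations map.
func (rs *ReplicaSet) DeepCopy() *ReplicaSet {
	out := &ReplicaSet{Name: rs.Name, Annotations: make(map[string]string, len(rs.Annotations))}
	for k, v := range rs.Annotations {
		out.Annotations[k] = v
	}
	return out
}

// Cache is a hypothetical stand-in for a shared informer cache.
type Cache struct{ items []*ReplicaSet }

// List returns copies by default. With disableDeepCopy it returns the cached
// pointers themselves, which is cheaper but must be treated as read-only.
func (c *Cache) List(disableDeepCopy bool) []*ReplicaSet {
	out := make([]*ReplicaSet, 0, len(c.items))
	for _, rs := range c.items {
		if disableDeepCopy {
			out = append(out, rs)
		} else {
			out = append(out, rs.DeepCopy())
		}
	}
	return out
}

func main() {
	cache := &Cache{items: []*ReplicaSet{{Name: "rs-1", Annotations: map[string]string{"rev": "1"}}}}

	// List without copying, then copy only the one object that will change.
	shared := cache.List(true)
	mine := shared[0].DeepCopy()
	mine.Annotations["rev"] = "2"

	fmt.Println("cached:", cache.items[0].Annotations["rev"], "local:", mine.Annotations["rev"])
}
```

Mutating `shared[0]` directly would change the cached object for every other reader, which is why the option carries "Unsafe" in its name.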

rolloutsv1alpha1.DeploymentExtraStatusAnnotation,
strings.Replace(extraStatusAnno, `"`, `\"`, -1))
// Create a patch for the annotation
patch := client.MergeFrom(deployment.DeepCopy())
Member

Is it necessary to deep-copy the deployment here?

Contributor Author

Yes, the DeepCopy here is necessary.
Reasons:

  1. client.MergeFrom() needs a copy of the original state to calculate the differences.
  2. The MergeFrom() implementation does not perform a DeepCopy; it just stores a reference to the provided object.
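
The two reasons above can be illustrated with a simplified diff. `diffAnnotations` below is a hypothetical helper that only compares annotations, not the controller-runtime patch implementation. It shows that when the "original" is just another reference to the object being mutated, the diff sees no change, whereas a snapshot taken before the mutation exposes it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Deployment is a minimal stand-in type for illustration.
type Deployment struct {
	Annotations map[string]string `json:"annotations"`
}

// DeepCopy returns an independent copy, including the annotations map.
func (d *Deployment) DeepCopy() *Deployment {
	out := &Deployment{Annotations: make(map[string]string, len(d.Annotations))}
	for k, v := range d.Annotations {
		out.Annotations[k] = v
	}
	return out
}

// diffAnnotations is a hypothetical, simplified stand-in for a merge-patch
// calculation: it returns, as JSON, the annotations in modified that are
// new or different compared with original.
func diffAnnotations(original, modified *Deployment) ([]byte, error) {
	changed := map[string]string{}
	for k, v := range modified.Annotations {
		if old, ok := original.Annotations[k]; !ok || old != v {
			changed[k] = v
		}
	}
	return json.Marshal(changed)
}

func main() {
	// Without a copy, "original" and "modified" are the same object.
	d := &Deployment{Annotations: map[string]string{}}
	alias := d
	d.Annotations["extra-status"] = "{}"
	aliased, err := diffAnnotations(alias, d)
	if err != nil {
		panic(err)
	}

	// With a snapshot taken before the mutation, the change is visible.
	d2 := &Deployment{Annotations: map[string]string{}}
	snapshot := d2.DeepCopy()
	d2.Annotations["extra-status"] = "{}"
	snapshotted, err := diffAnnotations(snapshot, d2)
	if err != nil {
		panic(err)
	}

	fmt.Printf("aliased diff: %s\nsnapshot diff: %s\n", aliased, snapshotted)
}
```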

Member

furykerry commented Jul 16, 2025

An extra option (OptimisticLock) is required for the patch to include the resource version (see pkg/client/patch.go in sigs.k8s.io/controller-runtime), and when OptimisticLock is set, a deep copy is made automatically in the Data() func of mergeFromPatch.

Contributor Author

Yes. It seems that all patch operations need to turn on OptimisticLock, right?

Member

Yes, at least for the patch operations that are supposed to replace the original update operations.
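
Why a patch that replaces an update needs the lock can be sketched with a toy store. `Store`, `Patch`, and the integer `ResourceVersion` below are hypothetical simplifications (Kubernetes resource versions are opaque strings). The sketch shows only the general idea: a patch without a version precondition is applied blindly, while one that carries the version it was computed from is rejected once the object has moved on.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when a patch's version precondition is stale.
var ErrConflict = errors.New("conflict: resourceVersion mismatch")

// Object is a minimal stand-in for a stored resource.
type Object struct {
	ResourceVersion int
	Annotations     map[string]string
}

// Patch is a hypothetical patch payload. A zero ResourceVersion means
// "no optimistic lock"; any other value is a precondition.
type Patch struct {
	ResourceVersion int
	Annotations     map[string]string
}

// Store is a toy server holding one object.
type Store struct{ obj Object }

// Patch applies the annotations if the precondition (when present) matches,
// then bumps the stored version.
func (s *Store) Patch(p Patch) error {
	if p.ResourceVersion != 0 && p.ResourceVersion != s.obj.ResourceVersion {
		return ErrConflict
	}
	for k, v := range p.Annotations {
		s.obj.Annotations[k] = v
	}
	s.obj.ResourceVersion++
	return nil
}

func main() {
	store := &Store{obj: Object{ResourceVersion: 7, Annotations: map[string]string{}}}

	// Another writer changes the object after the controller read version 7.
	if err := store.Patch(Patch{Annotations: map[string]string{"owner": "other"}}); err != nil {
		panic(err)
	}

	unlocked := store.Patch(Patch{Annotations: map[string]string{"a": "1"}})
	locked := store.Patch(Patch{ResourceVersion: 7, Annotations: map[string]string{"b": "2"}})
	fmt.Println("unlocked error:", unlocked)
	fmt.Println("locked conflict:", errors.Is(locked, ErrConflict))
}
```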

Contributor Author

Thanks @furykerry. Two points need your attention:

  1. client.MergeFromWithOptions() with MergeFromWithOptimisticLock also requires a DeepCopy. For details, see the implementation of mergeFromPatch in sigs.k8s.io/controller-runtime. Because deployment is a pointer, the two objects here are the same, so the patch does not take effect. I verified this in a local kind environment.

  2. We need to get the latest deployment here, because the previous step (probably here) always updates the lastTransitionTime and lastUpdateTime fields of the deployment's conditions within the same reconcile round. That changes the resourceVersion, which causes the extra-status patch to fail (it only fails in the last step of the canary).

[image]

Member

> need to get the latest deployment here, because the previous step (probably here) always updates the condition lastTransitionTime and lastUpdateTime field of the deployment in one reconcile round, causing the resourceVersion to change, which will cause the patch extra status to fail (it will only fail in the last step of canary).

Thanks for testing. Can you add a code comment explaining why the latest deployment should be fetched? It is rather counter-intuitive.

Contributor Author

Added.

PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch 7 times, most recently from 6b51ecc to c46aa99 July 17, 2025 08:31
Signed-off-by: zhihao jian <zhihao.jian@shopee.com>

remove dupl SetNewReplicaSetAnnotations

use UnsafeDisableDeepCopy to optimize performance

use optimisticLock for patch

fix patch extra status always failed

fix unit test
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from c46aa99 to 20bc6e9 July 17, 2025 08:45
Signed-off-by: zhihao jian <zhihao.jian@shopee.com>

remove dupl SetNewReplicaSetAnnotations

use UnsafeDisableDeepCopy to optimize performance

use optimisticLock for patch

fix patch extra status always failed

fix unit test

add comment
PersistentJZH force-pushed the zhihao/feat/change-workload-controller-to-use-patch-instead-of-update branch from 20bc6e9 to 884be19 July 17, 2025 15:05
Member

furykerry left a comment

/lgtm

// The deployment passed in here has an old resourceVersion, so we need to fetch the latest deployment
// to ensure patch success.
latestDeployment := &apps.Deployment{}
err := dc.runtimeClient.Get(context.TODO(), client.ObjectKeyFromObject(deployment), latestDeployment)
Member

The runtimeClient here may read the cached deployment rather than the latest one. A better solution is to combine the patchExtraStatus func with syncDeploymentStatus, so that only one patch is required. In fact, when the status of the deployment is updated, the annotation can be updated as well.
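
The suggestion to fold both writes into one can be sketched with a toy server that enforces a version precondition. `Server`, `Change`, and the integer `ResourceVersion` are hypothetical simplifications, not the code in this PR. Two patches built from the same read conflict on the second write, while a single combined patch needs no re-read.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when a change was computed from a stale version.
var ErrConflict = errors.New("conflict: stale resourceVersion")

// Deployment is a minimal stand-in for the stored object.
type Deployment struct {
	ResourceVersion int
	Status          string
	Annotations     map[string]string
}

// Change is a hypothetical patch payload; empty fields are left untouched.
type Change struct {
	ResourceVersion int // version the change was computed from
	Status          string
	Annotations     map[string]string
}

// Server is a toy API server holding one Deployment.
type Server struct{ d Deployment }

// Patch applies the change if its version matches, then bumps the version.
func (s *Server) Patch(c Change) error {
	if c.ResourceVersion != s.d.ResourceVersion {
		return ErrConflict
	}
	if c.Status != "" {
		s.d.Status = c.Status
	}
	for k, v := range c.Annotations {
		s.d.Annotations[k] = v
	}
	s.d.ResourceVersion++
	return nil
}

func newServer() *Server {
	return &Server{d: Deployment{ResourceVersion: 1, Annotations: map[string]string{}}}
}

func main() {
	// Two patches built from the same read: the second one is stale.
	a := newServer()
	seen := a.d.ResourceVersion
	errStatus := a.Patch(Change{ResourceVersion: seen, Status: "Progressing"})
	errAnno := a.Patch(Change{ResourceVersion: seen, Annotations: map[string]string{"extra-status": "{}"}})
	fmt.Println("separate patches:", errStatus, errAnno)

	// One combined patch: a single write carries both changes.
	b := newServer()
	errBoth := b.Patch(Change{
		ResourceVersion: b.d.ResourceVersion,
		Status:          "Progressing",
		Annotations:     map[string]string{"extra-status": "{}"},
	})
	fmt.Println("combined patch:", errBoth, b.d.Status, b.d.Annotations)
}
```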

zhihao jian added 2 commits July 18, 2025 17:02
Signed-off-by: zhihao jian <zhihao.jian@shopee.com>
…tead-of-update' of github.com:PersistentJZH/rollouts into zhihao/feat/change-workload-controller-to-use-patch-instead-of-update

# Conflicts:
#	pkg/controller/deployment/deployment_controller.go
#	pkg/controller/deployment/progress.go
#	pkg/controller/deployment/sync.go
@zmberg
Member

zmberg commented Jul 18, 2025

@PersistentJZH Can we communicate in the DingTalk group?

@zmberg
Member

zmberg commented Jul 22, 2025

/lgtm

kruise-bot removed the lgtm label Jul 22, 2025
Member

furykerry left a comment

/lgtm
/approve

@kruise-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: furykerry

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kruise-bot merged commit 0fbc3ed into openkruise:master Jul 22, 2025
32 of 35 checks passed
4 participants