🐛 fix: prevent addon recreation during sequential PlacementDecision upd… #1344
base: main
…ates

When PlacementDecisions are updated sequentially (e.g., decision1 then decision2), the addon controller may see intermediate states where a cluster temporarily appears in neither decision, causing unnecessary addon deletion and recreation.

This change adds 100ms delay to all informer events to batch rapid sequential updates before reconciliation.

Changes:
- Add 100ms delay to all 4 informers in addon-management-controller
- Use sdk-go delay support via WithInformersQueueKeysFuncAndDelay()
- Add integration test to verify addon UID remains unchanged during updates

Technical details:
- WorkQueue's AddAfter() automatically deduplicates events with same key
- If multiple PlacementDecision updates occur within 100ms, only one reconcile happens
- Delete events are still processed immediately for timely cleanup

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Signed-off-by: Qing Hao <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: haoqing0110

/hold
Walkthrough

The pull request updates the addon management controller to use delayed event batching with a 100ms delay, adds a module replacement in go.mod for a forked sdk-go dependency, and introduces an integration test validating addon stability during sequential PlacementDecision updates.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@go.mod`:
- Around line 190-191: The replace directive pointing to your personal fork (the
`replace open-cluster-management.io/sdk-go => github.com/haoqing0110/sdk-go
v0.0.0-20260120095521-fe81e417c1e1` entry) must be removed before merging;
either wait for the `WithInformersQueueKeysFuncAndDelay` feature to be merged
upstream and update the dependency to the official
`open-cluster-management.io/sdk-go` release, or temporarily vendor the patch in
a feature branch, then delete this `replace` line and restore the upstream
module reference in go.mod (and run `go mod tidy`/`go mod vendor` as
appropriate) so the main branch has no personal-fork replace directive.
🧹 Nitpick comments (3)
pkg/addon/controllers/addonmanagement/controller.go (1)
67-83: LGTM! The delayed event batching approach is sound.

The 100ms delay using `WithInformersQueueKeysFuncAndDelay` will allow the work queue to deduplicate events with the same key, preventing unnecessary addon recreation during rapid sequential updates.

Consider extracting the delay value to a named constant for clarity and maintainability:

const informerEventBatchingDelay = 100 * time.Millisecond

♻️ Optional: Extract delay constant

+const informerEventBatchingDelay = 100 * time.Millisecond
+
 func NewAddonManagementController(
 	...
 ) factory.Controller {
 	...
 	return factory.New().
 		WithInformersQueueKeysFuncAndDelay(
 			queue.QueueKeyByMetaName,
-			100*time.Millisecond,
+			informerEventBatchingDelay,
 			addonInformers.Informer(), clusterManagementAddonInformers.Informer()).
 		WithInformersQueueKeysFuncAndDelay(
 			addonindex.ClusterManagementAddonByPlacementDecisionQueueKey(
 				clusterManagementAddonInformers),
-			100*time.Millisecond,
+			informerEventBatchingDelay,
 			placementDecisionInformer.Informer()).
 		WithInformersQueueKeysFuncAndDelay(
 			addonindex.ClusterManagementAddonByPlacementQueueKey(
 				clusterManagementAddonInformers),
-			100*time.Millisecond,
+			informerEventBatchingDelay,
 			placementInformer.Informer()).
 		WithSync(c.sync).ToController("addon-management-controller")
 }

test/integration/addon/addon_manager_install_test.go (2)
36-37: Minor: Double random suffix may be redundant.

The namespace already includes one random suffix. Adding a second `rand.String(5)` provides extra collision protection but may be unnecessary given that `suffix` is already randomized per test run.

This is a non-blocking observation - the current approach works correctly.

286-295: Consider extending `Consistently` duration for CI stability.

The 5-second duration with 500ms intervals provides 10 verification checks. While this should be sufficient under normal conditions, CI environments under load might process events more slowly.

If you observe flakiness in CI, consider extending to `"10s"` for additional safety margin.
replace open-cluster-management.io/sdk-go => github.com/haoqing0110/sdk-go v0.0.0-20260120095521-fe81e417c1e1
Personal fork dependency should not be merged to main.
Using a replace directive to point to a personal fork (github.com/haoqing0110/sdk-go) introduces supply chain risks and maintenance concerns. This pattern is acceptable for local development or feature branches, but should be resolved before merging to main.
Consider:
- Getting the `WithInformersQueueKeysFuncAndDelay` feature merged upstream to `open-cluster-management.io/sdk-go`
- Once merged, update the dependency to the official release and remove this replace directive
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1344      +/-   ##
==========================================
+ Coverage   62.03%   62.10%   +0.07%
==========================================
  Files         218      218
  Lines       17414    17418       +4
==========================================
+ Hits        10802    10818      +16
+ Misses       5466     5457       -9
+ Partials     1146     1143       -3