@haoqing0110 haoqing0110 commented Jan 20, 2026

…ates

When PlacementDecisions are updated sequentially (e.g., decision1 then decision2), the addon controller may see intermediate states where a cluster temporarily appears in neither decision, causing unnecessary addon deletion and recreation.
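To make the intermediate state concrete, here is a minimal Go sketch. The `decision` type and the `selectedClusters` helper are hypothetical stand-ins, not the controller's actual types, and the sketch assumes that the install targets are the union of clusters listed across all PlacementDecisions.

```go
package main

import (
	"fmt"
	"sort"
)

// decision is a hypothetical stand-in for a PlacementDecision: a name plus
// the clusters it currently lists.
type decision struct {
	name     string
	clusters []string
}

// selectedClusters returns the sorted union of clusters across all decisions.
// The sketch assumes a reconcile treats this union as the install targets.
func selectedClusters(decisions []decision) []string {
	seen := map[string]bool{}
	for _, d := range decisions {
		for _, c := range d.clusters {
			seen[c] = true
		}
	}
	out := make([]string, 0, len(seen))
	for c := range seen {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Before: cluster2 is listed in decision1.
	before := []decision{
		{name: "decision1", clusters: []string{"cluster1", "cluster2"}},
		{name: "decision2", clusters: []string{"cluster3"}},
	}
	// Intermediate: decision1 has been updated, decision2 has not yet.
	intermediate := []decision{
		{name: "decision1", clusters: []string{"cluster1"}},
		{name: "decision2", clusters: []string{"cluster3"}},
	}
	// After: decision2 now lists cluster2.
	after := []decision{
		{name: "decision1", clusters: []string{"cluster1"}},
		{name: "decision2", clusters: []string{"cluster2", "cluster3"}},
	}

	fmt.Println("before:      ", selectedClusters(before))
	fmt.Println("intermediate:", selectedClusters(intermediate))
	fmt.Println("after:       ", selectedClusters(after))
}
```

In the intermediate snapshot cluster2 is absent from the union even though it is present both before and after, so a reconcile that runs at that moment would see a cluster that appears to have been deselected.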

This change adds a 100ms delay to all informer events so that rapid sequential updates are batched before reconciliation.

Changes:

  • Add a 100ms delay to all 4 informers in addon-management-controller
  • Use sdk-go delay support via WithInformersQueueKeysFuncAndDelay()
  • Add integration test to verify addon UID remains unchanged during updates

Technical details:

  • WorkQueue's AddAfter() automatically deduplicates events with the same key
  • If multiple PlacementDecision updates occur within 100ms, only one reconcile happens
  • Delete events are still processed immediately for timely cleanup
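
The batching effect described in the list above can be sketched with a toy delayed queue. This is a standard-library model, not the client-go or sdk-go implementation: `delayedQueue`, `addAfter`, `popReady`, and `demo` are hypothetical names, time is passed in explicitly to keep the example deterministic, and the rule that a key already waiting keeps its earlier ready time is an assumption about how the real delaying queue behaves.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// delayedQueue is a toy model of a delaying work queue: each key is stored at
// most once, together with the time at which it becomes ready.
type delayedQueue struct {
	readyAt map[string]time.Time
}

func newDelayedQueue() *delayedQueue {
	return &delayedQueue{readyAt: map[string]time.Time{}}
}

// addAfter schedules key to become ready at now+delay. If the key is already
// waiting, the earlier ready time is kept, so repeated events for the same
// key collapse into a single entry.
func (q *delayedQueue) addAfter(key string, now time.Time, delay time.Duration) {
	t := now.Add(delay)
	if existing, ok := q.readyAt[key]; ok && !t.Before(existing) {
		return
	}
	q.readyAt[key] = t
}

// popReady removes and returns, in sorted order, the keys whose delay has
// elapsed at the given time.
func (q *delayedQueue) popReady(now time.Time) []string {
	var keys []string
	for k, t := range q.readyAt {
		if !t.After(now) {
			keys = append(keys, k)
			delete(q.readyAt, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// demo replays two events for the same key 30ms apart and samples the queue
// at 50ms, 100ms and 200ms after the first event.
func demo() [][]string {
	const delay = 100 * time.Millisecond
	start := time.Unix(0, 0)
	q := newDelayedQueue()
	q.addAfter("addon-a", start, delay)
	q.addAfter("addon-a", start.Add(30*time.Millisecond), delay)
	return [][]string{
		q.popReady(start.Add(50 * time.Millisecond)),
		q.popReady(start.Add(100 * time.Millisecond)),
		q.popReady(start.Add(200 * time.Millisecond)),
	}
}

func main() {
	for i, keys := range demo() {
		fmt.Println("sample", i, "ready keys:", keys)
	}
}
```

Only the second sample returns the key, and it returns it once: the two events produce a single reconcile, which by then can observe both PlacementDecision updates. In this model an event that arrives after the first one has already been popped starts a new window and triggers its own reconcile.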

🤖 Generated with Claude Code

Summary

Related issue(s)

Fixes #

Summary by CodeRabbit

  • Performance

    • Optimized event batching in addon management to reduce processing overhead through delayed queue handling.
  • Tests

    • Enhanced testing to verify addon stability and prevent unnecessary recreation during sequential placement updates.


Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Signed-off-by: Qing Hao <[email protected]>
@openshift-ci openshift-ci bot requested review from skeeey and zhujian7 January 20, 2026 10:05
openshift-ci bot commented Jan 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: haoqing0110
Once this PR has been reviewed and has the lgtm label, please assign jnpacker for approval. For more information see the Code Review Process.

@haoqing0110
Member Author

/hold

coderabbitai bot commented Jan 20, 2026

Walkthrough

The pull request updates the addon management controller to use delayed event batching with a 100ms delay, adds a module replacement in go.mod for a forked sdk-go dependency, and introduces an integration test validating addon stability during sequential PlacementDecision updates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Module Dependency: go.mod | Adds module replacement directing open-cluster-management.io/sdk-go to resolve to forked version at github.com/haoqing0110/sdk-go v0.0.0-20260120095521-fe81e417c1e1 |
| Controller Event Queuing: pkg/addon/controllers/addonmanagement/controller.go | Replaces three WithInformersQueueKeysFunc calls with WithInformersQueueKeysFuncAndDelay, introducing 100ms delay for event batching; adds time import |
| Integration Tests: test/integration/addon/addon_manager_install_test.go | Refactors test setup to regenerate random suffix per test case; adds new test "Should not recreate addon during sequential PlacementDecision updates" verifying addon UID stability during cluster movement between PlacementDecisions |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • qiujian16
  • zhiweiyin318
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly describes the main bug fix: preventing addon recreation during sequential PlacementDecision updates, which aligns with the primary change in the changeset. |
| Description check | ✅ Passed | The PR description provides a clear summary of the issue, technical solution, and changes made. However, the template sections (Summary and Related issue) are incomplete or missing content. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@go.mod`:
- Around line 190-191: The replace directive pointing to your personal fork (the
`replace open-cluster-management.io/sdk-go => github.com/haoqing0110/sdk-go
v0.0.0-20260120095521-fe81e417c1e1` entry) must be removed before merging;
either wait for the `WithInformersQueueKeysFuncAndDelay` feature to be merged
upstream and update the dependency to the official
`open-cluster-management.io/sdk-go` release, or temporarily vendor the patch in
a feature branch, then delete this `replace` line and restore the upstream
module reference in go.mod (and run `go mod tidy`/`go mod vendor` as
appropriate) so the main branch has no personal-fork replace directive.
🧹 Nitpick comments (3)
pkg/addon/controllers/addonmanagement/controller.go (1)

67-83: LGTM! The delayed event batching approach is sound.

The 100ms delay using WithInformersQueueKeysFuncAndDelay will allow the work queue to deduplicate events with the same key, preventing unnecessary addon recreation during rapid sequential updates.

Consider extracting the delay value to a named constant for clarity and maintainability:

const informerEventBatchingDelay = 100 * time.Millisecond
♻️ Optional: Extract delay constant
+const informerEventBatchingDelay = 100 * time.Millisecond
+
 func NewAddonManagementController(
 	...
 ) factory.Controller {
 	...
 	return factory.New().
 		WithInformersQueueKeysFuncAndDelay(
 			queue.QueueKeyByMetaName,
-			100*time.Millisecond,
+			informerEventBatchingDelay,
 			addonInformers.Informer(), clusterManagementAddonInformers.Informer()).
 		WithInformersQueueKeysFuncAndDelay(
 			addonindex.ClusterManagementAddonByPlacementDecisionQueueKey(
 				clusterManagementAddonInformers),
-			100*time.Millisecond,
+			informerEventBatchingDelay,
 			placementDecisionInformer.Informer()).
 		WithInformersQueueKeysFuncAndDelay(
 			addonindex.ClusterManagementAddonByPlacementQueueKey(
 				clusterManagementAddonInformers),
-			100*time.Millisecond,
+			informerEventBatchingDelay,
 			placementInformer.Informer()).
 		WithSync(c.sync).ToController("addon-management-controller")
 }
test/integration/addon/addon_manager_install_test.go (2)

36-37: Minor: Double random suffix may be redundant.

The namespace already includes one random suffix. Adding a second rand.String(5) provides extra collision protection but may be unnecessary given that suffix is already randomized per test run.

This is a non-blocking observation - the current approach works correctly.


286-295: Consider extending Consistently duration for CI stability.

The 5-second duration with 500ms intervals provides 10 verification checks. While this should be sufficient under normal conditions, CI environments under load might process events more slowly.

If you observe flakiness in CI, consider extending to "10s" for additional safety margin.
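
As a rough illustration of what such a stability check does, the following standard-library sketch polls a UID getter for a fixed window and fails on the first change. It is a simplified stand-in for Gomega's Consistently, not its implementation: `consistently`, `uidUnchanged`, and `runCheck` are hypothetical names, and the 50ms window with 10ms interval is shortened from the 5s and 500ms discussed above so the sketch finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// consistently calls check every interval until duration has elapsed and
// returns the number of polls made, plus an error from the first failing poll.
func consistently(check func() error, duration, interval time.Duration) (int, error) {
	deadline := time.Now().Add(duration)
	polls := 0
	for time.Now().Before(deadline) {
		polls++
		if err := check(); err != nil {
			return polls, fmt.Errorf("poll %d: %w", polls, err)
		}
		time.Sleep(interval)
	}
	return polls, nil
}

// uidUnchanged returns a check that fails once getUID stops returning want.
func uidUnchanged(want string, getUID func() string) func() error {
	return func() error {
		if got := getUID(); got != want {
			return fmt.Errorf("addon UID changed from %q to %q", want, got)
		}
		return nil
	}
}

// runCheck applies a short window (assumed values, chosen for a fast demo)
// against a UID captured before the updates, here the placeholder "uid-1".
func runCheck(getUID func() string) error {
	_, err := consistently(uidUnchanged("uid-1", getUID), 50*time.Millisecond, 10*time.Millisecond)
	return err
}

func main() {
	// A stable UID passes; a UID that differs from the captured one fails.
	fmt.Println("stable:   ", runCheck(func() string { return "uid-1" }))
	fmt.Println("recreated:", runCheck(func() string { return "uid-2" }))
}
```

A longer window adds polls and therefore more chances to catch a late recreation, at the cost of a slower test; the number of polls in a given window also depends on how long each check takes.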

Comment on lines +190 to +191

replace open-cluster-management.io/sdk-go => github.com/haoqing0110/sdk-go v0.0.0-20260120095521-fe81e417c1e1
⚠️ Potential issue | 🟠 Major

Personal fork dependency should not be merged to main.

Using a replace directive to point to a personal fork (github.com/haoqing0110/sdk-go) introduces supply chain risks and maintenance concerns. This pattern is acceptable for local development or feature branches, but should be resolved before merging to main.

Consider:

  1. Getting the WithInformersQueueKeysFuncAndDelay feature merged upstream to open-cluster-management.io/sdk-go
  2. Once merged, update the dependency to the official release and remove this replace directive

codecov bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.10%. Comparing base (5740147) to head (4eb7d39).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1344      +/-   ##
==========================================
+ Coverage   62.03%   62.10%   +0.07%     
==========================================
  Files         218      218              
  Lines       17414    17418       +4     
==========================================
+ Hits        10802    10818      +16     
+ Misses       5466     5457       -9     
+ Partials     1146     1143       -3     
| Flag | Coverage Δ |
|---|---|
| unit | 62.10% <100.00%> (+0.07%) ⬆️ |
