refactor: fix unimplemented concurrency in CheckComponentStatus #5522

Open
yoonseo-han wants to merge 8 commits into litmuschaos:master from yoonseo-han:fix/parallel-deployment-status-checks

Conversation

@yoonseo-han


Proposed changes

CheckComponentStatus in pkg/k8s/operations.go claimed to check infra components concurrently, but it launched a single goroutine and immediately blocked on WaitGroup.Wait(), which made it fully synchronous.

The sync.Mutex guarding LiveStatus had no concurrent writer to protect against, making it dead complexity.
The comment // add all agent components to waitgroup directly contradicted the wait.Add(1) below it.
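
For illustration, the shape of the problem can be sketched as follows. This is a simplified reconstruction and not the original code from operations.go: the function name, the `check` callback, and the `failed` slice are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// checkSequentialInDisguise mimics the pattern described above: a single
// wg.Add(1), a single goroutine, and an immediate Wait(). The caller blocks
// until that one goroutine has walked every selector in order, so no two
// checks ever overlap.
func checkSequentialInDisguise(selectors []string, check func(string) bool) []string {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex // never contended: only one goroutine ever writes
		failed []string
	)
	wg.Add(1) // one unit of work, regardless of how many selectors there are
	go func() {
		defer wg.Done()
		for _, s := range selectors {
			if !check(s) {
				mu.Lock()
				failed = append(failed, s)
				mu.Unlock()
			}
		}
	}()
	wg.Wait()
	return failed
}

func main() {
	failed := checkSequentialInDisguise(
		[]string{"app=a", "app=b", "app=c"},
		func(s string) bool { return s != "app=b" },
	)
	fmt.Println(failed)
}
```

The goroutine and mutex add nothing here: the result is the same as a plain loop in the calling goroutine.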

This PR replaces the misleading pattern with correct goroutine usage and a separation of concerns, splitting the logic into functions for easier testing:

  • deploymentHealthy: pure function, single selector, single API call
  • allDeploymentsHealthy: spawns one goroutine per selector, uses atomic.Int32 for race-safe failure counting, sync.WaitGroup as barrier
  • CheckComponentStatus: retry orchestration only; dead state (LiveStatus, AccessLiveStatus) removed and replaced by return values
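
The fan-out described in the second bullet can be sketched roughly as below. This is a minimal illustration under assumptions, not the code in this PR: `allHealthy` and the `check` callback are hypothetical names that stand in for allDeploymentsHealthy and the per-selector API call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// allHealthy runs check once per selector, each call in its own goroutine,
// and reports whether every check passed.
func allHealthy(selectors []string, check func(selector string) bool) bool {
	var (
		wg       sync.WaitGroup
		failures atomic.Int32 // safe to increment from many goroutines
	)
	for _, selector := range selectors {
		wg.Add(1)
		go func(sel string) {
			defer wg.Done()
			if !check(sel) {
				failures.Add(1)
			}
		}(selector)
	}
	wg.Wait() // barrier: every check has finished past this point
	return failures.Load() == 0
}

func main() {
	selectors := []string{"app=a", "app=b", "app=c"}
	fmt.Println(allHealthy(selectors, func(string) bool { return true }))
	fmt.Println(allHealthy(selectors, func(s string) bool { return s != "app=b" }))
}
```

Because each goroutine only increments a counter, no mutex is needed, and the WaitGroup is incremented once per selector rather than once overall.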

Types of changes

What types of changes does your code introduce to Litmus? Put an x in the boxes that apply

  • New feature (non-breaking change which adds functionality)
  • Bugfix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices applies)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the commit for DCO to be passed.
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added necessary documentation (if appropriate)

Dependency

NA

Special notes for your reviewer:

The full CheckComponentStatus retry loop is intentionally left untested, as testing it would require clock injection or 150s+ of real sleeps; coverage sits at the two pure helper layers instead.
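
For reference, clock injection could look roughly like the sketch below. This PR does not do this; `retryUntilHealthy`, its parameters, and `recordingSleep` are hypothetical names used only to show the idea of passing the sleep function in.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryUntilHealthy calls check up to maxTries times and waits between
// attempts using the injected sleep function instead of time.Sleep.
func retryUntilHealthy(maxTries int, interval time.Duration, sleep func(time.Duration), check func() bool) error {
	for try := 0; try < maxTries; try++ {
		if check() {
			return nil
		}
		if try < maxTries-1 { // no point in sleeping after the last attempt
			sleep(interval)
		}
	}
	return errors.New("components not healthy after all retries")
}

// recordingSleep returns a fake sleep that only counts how often it is called.
func recordingSleep(count *int) func(time.Duration) {
	return func(time.Duration) { *count++ }
}

func main() {
	sleeps, calls := 0, 0
	err := retryUntilHealthy(5, 30*time.Second, recordingSleep(&sleeps), func() bool {
		calls++
		return calls == 3 // healthy on the third attempt
	})
	fmt.Println(err, calls, sleeps) // prints "<nil> 3 2"
}
```

Production code would pass time.Sleep, while a test passes the recording fake and finishes instantly.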

kubernetes.Interface is used in place of *kubernetes.Clientset to allow fake client injection in tests. No production impact.
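
The idea behind accepting an interface can be shown without client-go. In the sketch below, `readyCounter` is a hypothetical, deliberately tiny stand-in for kubernetes.Interface, `fakeCounter` plays the role of the fake clientset, and the readiness rule (ready equals desired, with at least one replica) is an assumption made for the example.

```go
package main

import "fmt"

// readyCounter is a hypothetical stand-in for kubernetes.Interface: the
// function under test depends on behavior, not on a concrete client type.
type readyCounter interface {
	ReadyReplicas(namespace, selector string) (ready, desired int, err error)
}

// deploymentReady reports whether the deployment behind selector is fully up.
// Assumed rule for this sketch: at least one replica, and all are ready.
func deploymentReady(c readyCounter, namespace, selector string) bool {
	ready, desired, err := c.ReadyReplicas(namespace, selector)
	return err == nil && desired > 0 && ready == desired
}

// fakeCounter is an in-memory fake: selector -> {ready, desired}.
type fakeCounter map[string][2]int

func (f fakeCounter) ReadyReplicas(_, selector string) (int, int, error) {
	v, ok := f[selector]
	if !ok {
		return 0, 0, fmt.Errorf("no deployment matches %q", selector)
	}
	return v[0], v[1], nil
}

func main() {
	fake := fakeCounter{"app=a": {1, 1}, "app=b": {0, 1}}
	fmt.Println(deploymentReady(fake, "litmus", "app=a"))
	fmt.Println(deploymentReady(fake, "litmus", "app=b"))
	fmt.Println(deploymentReady(fake, "litmus", "app=missing"))
}
```

Any value that satisfies the interface can be passed in, which is why a concrete clientset keeps working in production while tests supply a fake.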

Signed-off-by: Yoonseo Han <yooncer00@gmail.com>
Signed-off-by: Yoonseo Han <yooncer00@gmail.com>
…ests

Signed-off-by: Yoonseo Han <yooncer00@gmail.com>
…round

Signed-off-by: Yoonseo Han <yooncer00@gmail.com>
…adability

Signed-off-by: Yoonseo Han <yooncer00@gmail.com>

Copilot AI left a comment

Contributor


Pull request overview

Refactors CheckComponentStatus in the ChaosCenter subscriber to actually perform infra component health checks concurrently, simplifying the previous (effectively synchronous) WaitGroup pattern and making the core checks more testable.

Changes:

  • Replaces the old single-goroutine + immediate Wait() pattern with per-selector goroutines in allDeploymentsHealthy.
  • Introduces a pure helper deploymentHealthy to evaluate readiness for a single label selector.
  • Adds unit tests for deploymentHealthy and allDeploymentsHealthy, and updates existing tests to use renamed fake client imports.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
chaoscenter/subscriber/pkg/k8s/operations.go | Refactors component status checking into concurrent, testable helper functions and simplifies orchestration logic.
chaoscenter/subscriber/pkg/k8s/operations_test.go | Adds unit tests covering the new helper functions and updates fake client imports.


Comment on lines +64 to +72
    for retry := range LiveCheckMaxTries {
        if allDeploymentsHealthy(ctx, clientset, InfraNamespace, components.Deployments) {
            logrus.Info("All infra deployments are up")
            return nil
        }
        if retry < LiveCheckMaxTries-1 {
            time.Sleep(30 * time.Second)
        }
    }

            healthy: false,
        },
        {
            name: "Selector doesnt match any pod",

@PriteshKiri

Contributor

Hey @yoonseo-han
Could you please check the review comments from Copilot?

@PriteshKiri

Contributor

@yoonseo-han Trivy check is failing for this PR. Could you please raise a new PR for this and link it here?

@yoonseo-han

Author

Hey @PriteshKiri

Thanks for the check. I went through the Copilot comments and updated the code accordingly. As for the CI, it looks fine after the rerun you triggered.

Ready for review whenever you have time. Thanks
