-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Improved Cluster State Test Suite duration from 727 seconds to 71 seconds #2060
test: Improved Cluster State Test Suite duration from 727 seconds to 71 seconds #2060
Conversation
Skipping CI for Draft Pull Request. |
Pull Request Test Coverage Report for Build 13727202524Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jonathan-innis, rschalo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes #N/A
Description
Recently, the cluster state tests have been taking an increasing amount of time (see below for a run which took 727 seconds). I generated a test report with Ginkgo (see below for commands) that identified the slowest tests in the suite and found that
Cluster State Sync
tests were running the longest.Investigating the
ExpectApplied
function showed that running sequentially takes a long time because each function call makes 5 calls to the apiserver:Running these requests in parallel with a WaitGroup dramatically increases the speed of the tests. It initially encountered issues with client-side throttling, which is not constrained by API Priority & Fairness. To get around this, I also added a helper function that allows custom
rest.Config
settings when constructing the environment and then set the client to not be rate-limited.One thing that may need to change is that in the previous test, we were checking
ExpectMetricGaugeValue(state.ClusterStateNodesCount, float64(i+1), nil)
after every create whereas now this metric is checked against the expected count after creating all objects. A better way to track this could be another goroutine which polls this metric and validates that it is increasing until the test reaches the expected end result. Happy to add in this PR or in a subsequent one.How was this change tested?
When evaluating effectiveness, compare recent CI runs for 1.31.x and 1.32.x clusters compared to the CI tests in this PR. Additionally, see below for the local comparison.
Before
After
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.