test: Improved Cluster State Test Suite duration from 727 seconds to 94 seconds #2059
Fixes #N/A
Description
Recently, the cluster state tests have been taking an increasing amount of time (see below for a run which took 727 seconds). I generated a test report with Ginkgo (see below for commands), which identified the slowest tests in the suite and showed that the `Cluster State Sync` tests were running the longest.

Investigating the `ExpectApplied` function showed that running it sequentially takes a long time because each call makes 5 calls to the apiserver:

Running these requests in parallel with a `WaitGroup` dramatically increases the speed of the tests. The parallel version initially ran into client-side throttling, which is separate from (and not governed by) API Priority & Fairness. To get around this, I also added a helper function that allows custom `rest.Config` settings when constructing the environment, and then configured the client not to be rate-limited.

One thing that may need to change: the previous test checked `ExpectMetricGaugeValue(state.ClusterStateNodesCount, float64(i+1), nil)` after every create, whereas the test now checks this metric against the expected count only after all objects have been created. A better way to track this could be a separate goroutine that polls the metric and validates that it keeps increasing until the test reaches the expected final value. Happy to add this in this PR or in a subsequent one.

How was this change tested?
Before
After
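The custom `rest.Config` helper mentioned in the Description could take the form of a functional option. The sketch below is stdlib-only: `restConfig` is a hypothetical, trimmed-down stand-in for client-go's `rest.Config` (keeping only `QPS` and `Burst`), and `EnvironmentOption` / `WithRestConfigOverride` are hypothetical names, not the helper added in this PR. `clientSideRateLimited` models our understanding of client-go's rule when no custom `RateLimiter` is set: a zero `QPS` falls back to the default of 5 (burst 10), and a token-bucket limiter is only built when the effective `QPS` is positive, so a negative `QPS` disables client-side throttling.

```go
package main

import "fmt"

// Defaults that client-go applies when QPS or Burst are left at zero.
const (
	defaultQPS   float32 = 5
	defaultBurst int     = 10
)

// restConfig is a hypothetical stand-in for client-go's rest.Config, keeping
// only the client-side rate-limiting fields.
type restConfig struct {
	QPS   float32
	Burst int
}

// EnvironmentOption is a hypothetical functional option applied to the
// config before the test environment constructs its client.
type EnvironmentOption func(*restConfig)

// WithRestConfigOverride is a hypothetical helper that lets a suite mutate
// the config used to build the environment's client.
func WithRestConfigOverride(fn func(*restConfig)) EnvironmentOption {
	return func(c *restConfig) { fn(c) }
}

// newEnvironmentConfig starts from the defaults and applies each option in
// the order given.
func newEnvironmentConfig(opts ...EnvironmentOption) restConfig {
	cfg := restConfig{QPS: defaultQPS, Burst: defaultBurst}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

// clientSideRateLimited models (as an assumption) client-go's decision when
// no custom RateLimiter is set: zero QPS means "use the default", and only a
// positive effective QPS produces a token-bucket limiter.
func clientSideRateLimited(c restConfig) bool {
	qps := c.QPS
	if qps == 0 {
		qps = defaultQPS
	}
	return qps > 0
}

func main() {
	throttled := newEnvironmentConfig()
	unthrottled := newEnvironmentConfig(WithRestConfigOverride(func(c *restConfig) {
		c.QPS = -1 // negative QPS: no client-side limiter in this model
	}))
	fmt.Println(clientSideRateLimited(throttled), clientSideRateLimited(unthrottled)) // prints "true false"
}
```

Keeping the override as an option means other suites keep the default, throttled client and only the suite that fans out requests opts out of client-side rate limiting.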
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.