Limit cluster management job concurrency #149
Merged
danielfrankcom merged 1 commit into main on May 20, 2025
imforster approved these changes on May 20, 2025
vic-tsang pushed a commit that referenced this pull request on May 20, 2025
Co-authored-by: Daniel Frankcom <frankcom@amazon.com>
vic-tsang added a commit that referenced this pull request on May 20, 2025
* make all env variables to be the same across all languages for cluster management
* fixed env names for single cluster
* fixed env variables
* fixed go env name
* fixed single cluster region env name
* close single cluster client in java
* Limit cluster management job concurrency (#149)
Co-authored-by: Daniel Frankcom <frankcom@amazon.com>
---------
Co-authored-by: Victor Tsang <vitsangp@amazon.com>
Co-authored-by: Daniel Frankcom <daniel@frankcom.ca>
Co-authored-by: Daniel Frankcom <frankcom@amazon.com>
This was referenced May 28, 2025
This PR modifies the cluster management workflows so that only one job runs against each account at any given time. This helps prevent conflicts where the cluster cleanup job deletes clusters that a testing job is still using.
I could have put the `concurrency` limit on the workflow, which would ensure all jobs and cleanup steps finish before any other workflow can begin. I chose to put it on the `job` instead, so that other tasks like formatting checks can complete and provide quick feedback in the PR without potentially having to wait for other workflows to finish.

I tested this change by running two instances of the same workflow: the second job waited before proceeding because a job was already running, and the formatting check was not blocked.
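For illustration, a job-level concurrency limit of the kind described above might look roughly like the following. This is a minimal sketch, not the workflow from this PR: the workflow name, job names, group name, and steps are all hypothetical. Only the `concurrency` key with its `group` and `cancel-in-progress` fields is standard GitHub Actions syntax.

```yaml
# Hypothetical workflow: names, group, and steps are illustrative assumptions.
name: example-cluster-workflow
on: [pull_request]

jobs:
  format-check:
    # No concurrency group, so this job starts immediately
    # and can give quick feedback in the PR.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run formatting checks here"

  cluster-tests:
    runs-on: ubuntu-latest
    # Job-level limit: jobs that share this group do not run at the same time.
    # A per-account group name (assumed here) scopes the limit to one account.
    concurrency:
      group: cluster-management-example-account
      # Let the running job finish rather than cancelling it.
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - run: echo "create clusters, run tests, clean up"
```

Placing the same `concurrency` block at the top level of the file instead would apply the limit to the whole workflow, including `format-check`, which is the trade-off the description weighs.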
There is a small risk in the following scenario:
If this happens, we could run out of cluster space in the account, and some later jobs in the queue that would otherwise have succeeded could fail. In that case the cluster cleanup jobs will eventually run once all of the test jobs have failed, which fixes the state of the account, and we can then rerun any jobs that should have passed. This seems unlikely to happen given the prerequisites.
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.