tests/osc-test-fbc-integration: optimize the execution of prowjobs #1608
base: devel
/lgtm
Currently all the jobs run in parallel in a single phase. However, sometimes we see bugs that affect OSC regardless of the workload type or OCP version, causing all jobs to fail equally, which is a waste of resources.

With this change we break the execution into two phases:
1) First, run a sanity check job
2) If it passes, run the remaining jobs in parallel

The caveat of this approach is that it will now take up to 8 hours to finish all jobs.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Currently all remaining jobs run in parallel after the sanity check, which requires 9 runs from the common azure profile and increases our odds of hitting the quota limit. So this change serializes the execution further, to consume less quota at any one time.

The new sequence of executions is:
sanity check (1 quota) -> ocp 4.19 and 4.20 (3 quotas) -> ocp 4.18 (3 quotas) -> ocp 4.17 (3 quotas)

The caveat, again, is the increased time to complete the pipeline, now up to 16 hours.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
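To make the quota trade-off in the commit message above concrete, here is a minimal Python sketch that compares peak quota usage before and after the serialization. Only the phase names and per-phase quota numbers come from the commit message; the `PHASES` list and both helper functions are hypothetical and are not part of the actual Tekton pipeline.

```python
# Hypothetical model of the pipeline phases. Only the per-phase quota
# numbers are taken from the commit message; everything else is illustrative.
PHASES = [
    ("sanity check", 1),
    ("ocp 4.19 and 4.20", 3),
    ("ocp 4.18", 3),
    ("ocp 4.17", 3),
]


def peak_quota_parallel(phases):
    """Peak usage when every phase after the sanity check runs at once."""
    sanity, *rest = phases
    return max(sanity[1], sum(quota for _, quota in rest))


def peak_quota_serial(phases):
    """Peak usage when the phases run strictly one after another."""
    return max(quota for _, quota in phases)


if __name__ == "__main__":
    print("parallel peak:", peak_quota_parallel(PHASES))
    print("serial peak:", peak_quota_serial(PHASES))
```

Under these assumptions, the peak drops from 9 concurrent quotas to 3, at the cost of the longer total run time mentioned in the commit message.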
e7689a0 to 3d6cbf2
littlejawa
left a comment
/lgtm
Thanks @wainersm !
ldoktor
left a comment
lgtm, one thing to consider is to reorder the tests so they are ordered as executed. Also you might consider executing only part of the configurations and leave the thorough testing for weekly/monthly/prerelease testing (I mean run one with kata/peer/coco and the other ocp versions just one each)
Add a RUN_PROWJOBS array parameter to control which prowjobs run. This allows users to specify "sanity" (default), "all", or specific OCP versions like "ocp419", "ocp420". A new determine-prowjobs task evaluates the parameter and sets boolean results that control execution of each prowjob via when conditions. Assisted-by: Claude Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
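The selection logic described in the commit message above could look roughly like the following Python sketch. This is not the actual `determine-prowjobs` task, which lives in the Tekton pipeline and is not shown here. The group names `ocp417` and `ocp418`, the `KNOWN_GROUPS` tuple, the function name, and the choices that "all" includes the sanity job and that unknown values are rejected are all assumptions made for illustration.

```python
# Assumed set of selectable groups; the commit message only names
# "sanity", "all", "ocp419" and "ocp420" explicitly.
KNOWN_GROUPS = ("sanity", "ocp417", "ocp418", "ocp419", "ocp420")


def determine_prowjobs(run_prowjobs=("sanity",)):
    """Map a RUN_PROWJOBS-style list to one boolean per prowjob group.

    Hypothetical helper: the booleans stand in for the task results that
    the pipeline's `when` conditions would evaluate.
    """
    selected = set(run_prowjobs)
    unknown = selected - set(KNOWN_GROUPS) - {"all"}
    if unknown:
        # Assumption: unknown values are an error rather than ignored.
        raise ValueError(f"unknown prowjob selection: {sorted(unknown)}")
    run_all = "all" in selected
    return {group: run_all or group in selected for group in KNOWN_GROUPS}


if __name__ == "__main__":
    print(determine_prowjobs())                      # default: sanity only
    print(determine_prowjobs(["ocp419", "ocp420"]))  # specific versions
    print(determine_prowjobs(["all"]))               # everything
```

Note that in this sketch selecting specific versions does not implicitly enable the sanity job; whether the real pipeline does so is not stated in the commit message.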
Hi @ldoktor !
Based on your comment above I added a new commit that introduces a parameter called
PS: there is one situation that I still need to test (actually, these changes aren't tested yet...), which is when the user selects only specific OCP versions. For example, the user selects ocp417, but that version runs after ("run-after") ocp418. I don't know whether Tekton will assume ocp418 ran even though it was actually just skipped.
/lgtm
@wainersm: all tests passed! Full PR test history. Your PR dashboard.
/lgtm
/hold
Currently we run all the prowjobs in a single phase.
To save resources and avoid the Azure quota limit problem, this PR refactors the pipeline to run the jobs in phases and in groups.