@wainersm
Contributor

Currently, we run all the prowjobs in a single phase.

To save resources and avoid hitting the Azure quota limit, this PR refactors the pipeline to run the jobs in phases and in groups.

@openshift-ci openshift-ci bot requested review from jensfr and snir911 January 22, 2026 14:38
@vvoronko

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2026
Currently, all the jobs run in parallel in a single phase. However,
sometimes we see bugs that affect OSC regardless of the workload type or
OCP version, causing all jobs to fail in the same way, which is a waste
of resources. With this change, we break the execution into two phases:

1) First, run a sanity check job
2) If it passes, run the remaining jobs in parallel

The caveat of this approach is that it will now take up to 8 hours to
finish all jobs.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Currently, all remaining jobs run in parallel after the sanity check,
which requires 9 runs from the common Azure profile and increases our
odds of hitting the quota limit. This change serializes the execution
further so that each phase consumes less quota.

The new sequence of executions is:

sanity check (1 quota) -> ocp 4.19 and 4.20 (3 quotas) -> ocp 4.18 (3 quotas)
-> ocp 4.17 (3 quotas)

The caveat again is the increased time to complete the pipeline, now up
to 16 hours.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
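
To illustrate the quota reasoning in the commit message above, here is a minimal Python sketch (it is not part of the pipeline). The phase names and quota counts come from the sequence listed above; the function names are hypothetical, and the sketch assumes each phase releases its quota before the next phase starts.

```python
# Phases and quota counts as listed in the commit message.
PHASES = [
    ("sanity check", 1),
    ("ocp 4.19 and 4.20", 3),
    ("ocp 4.18", 3),
    ("ocp 4.17", 3),
]


def peak_quota_two_phase(phases):
    """Peak usage when the sanity check runs first and the rest run in parallel."""
    (_, sanity_quota), *remaining = phases
    return max(sanity_quota, sum(quota for _, quota in remaining))


def peak_quota_serialized(phases):
    """Peak usage when the phases run strictly one after another."""
    return max(quota for _, quota in phases)


if __name__ == "__main__":
    print("two-phase peak:", peak_quota_two_phase(PHASES))
    print("serialized peak:", peak_quota_serialized(PHASES))
```

Under those assumptions, the peak drops from 9 quota units to 3, at the cost of a longer pipeline.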
@wainersm wainersm force-pushed the osc-test-fbc-integration_optimize branch from e7689a0 to 3d6cbf2 Compare January 23, 2026 03:26
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2026
Contributor

@littlejawa littlejawa left a comment

/lgtm

Thanks @wainersm !

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2026

@ldoktor ldoktor left a comment

LGTM. One thing to consider is reordering the tests so they appear in the order they are executed. You might also consider executing only part of the configurations and leaving the thorough testing for weekly/monthly/pre-release runs (I mean: run one with kata/peer/coco, and for the other OCP versions just one each).

Add a RUN_PROWJOBS array parameter to control which prowjobs run.
This allows users to specify "sanity" (default), "all", or specific
OCP versions like "ocp419", "ocp420". A new determine-prowjobs task
evaluates the parameter and sets boolean results that control execution
of each prowjob via when conditions.

Assisted-by: Claude
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
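
The selection logic described in the commit message above could look roughly like the following Python sketch. It only illustrates the idea: the actual determine-prowjobs task is a Tekton task and is not shown here. The group names "sanity", "all", "ocp419", and "ocp420" come from the commit message, and "ocp417" and "ocp418" from the discussion; the function name, the returned dictionary, and the rule that selecting a specific version does not implicitly enable "sanity" are assumptions.

```python
# Job groups that can be switched on individually (names taken from the PR text).
KNOWN_GROUPS = ("sanity", "ocp417", "ocp418", "ocp419", "ocp420")


def determine_prowjobs(run_prowjobs=("sanity",)):
    """Map a RUN_PROWJOBS-style list to one boolean per job group.

    Hypothetical helper: "all" enables every group; otherwise only the
    groups named explicitly are enabled.
    """
    selected = set(run_prowjobs)
    unknown = selected - set(KNOWN_GROUPS) - {"all"}
    if unknown:
        raise ValueError(f"unknown RUN_PROWJOBS values: {sorted(unknown)}")
    run_all = "all" in selected
    return {group: run_all or group in selected for group in KNOWN_GROUPS}


if __name__ == "__main__":
    print(determine_prowjobs())                      # default: sanity only
    print(determine_prowjobs(["ocp419", "ocp420"]))  # specific versions
    print(determine_prowjobs(["all"]))               # every group
```

In the pipeline, each boolean would correspond to a task result that a prowjob's when condition checks before running.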
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2026
@wainersm
Contributor Author

Hi @ldoktor !

> LGTM. One thing to consider is reordering the tests so they appear in the order they are executed. You might also consider executing only part of the configurations and leaving the thorough testing for weekly/monthly/pre-release runs (I mean: run one with kata/peer/coco, and for the other OCP versions just one each).

Based on your comment above, I added a new commit that introduces a parameter called RUN_PROWJOBS, which lets us better select groups of jobs. This is how I see us using it:

  • early in the development cycle, we run only the sanity checks, which include one job each for kata and peerpods on the latest supported OCP version
  • if we need to test on specific OCP versions, we just enable them
  • as we approach the end of the development cycle, we should trigger all jobs. IMPORTANT: the "all" trigger will still run the jobs in groups.

PS: there is one situation that I still need to test (actually, these changes aren't tested yet...), which is when the user selects only specific OCP versions. For example, the user selects ocp417, but that version is set to "run-after" ocp418... I don't know whether Tekton will treat ocp418 as having run even though it was actually skipped.

@tbuskey
Contributor

tbuskey commented Jan 23, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2026
@openshift-ci

openshift-ci bot commented Jan 23, 2026

@wainersm: all tests passed!

@vvoronko

/lgtm

@wainersm
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2026