
Shard Azure MAPI regression job #64959


Open: mdbooth wants to merge 1 commit into master

mdbooth (Contributor) commented May 15, 2025

This job currently runs for 7 hours.

openshift-ci bot requested review from nrb and theobarberbany May 15, 2025 08:17
mdbooth (Contributor, Author) commented May 15, 2025

/pj-rehearse

openshift-ci-robot (Contributor)

@mdbooth: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

mdbooth force-pushed the regression-clusterinfra-azure-ipi-mapi-shard branch from 07b29fb to 68603f2 May 15, 2025 08:29
openshift-ci-robot (Contributor)

[REHEARSALNOTIFIER]
@mdbooth: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3 | openshift/machine-api-provider-azure | presubmit | Presubmit changed |
| pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-2of3 | openshift/machine-api-provider-azure | presubmit | Presubmit changed |
| pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-3of3 | openshift/machine-api-provider-azure | presubmit | Presubmit changed |
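The rehearsal job names listed above end in -1of3, -2of3 and -3of3, which encode each job's shard index and the total shard count. As a minimal sketch, assuming that suffix convention holds for every sharded job, a hypothetical helper (`shard_from_job_name`, not part of ci-operator or pj-rehearse) could recover both numbers from a job name:

```python
import re
from typing import Optional, Tuple

# Hypothetical convention check: sharded jobs end in "-<K>of<N>", e.g. "...-mapi-2of3".
_SHARD_SUFFIX = re.compile(r"-(\d+)of(\d+)$")


def shard_from_job_name(job_name: str) -> Optional[Tuple[int, int]]:
    """Return (shard_id, shard_count) parsed from a job name, or None if it is unsharded."""
    match = _SHARD_SUFFIX.search(job_name)
    if match is None:
        return None
    shard_id, shard_count = int(match.group(1)), int(match.group(2))
    # Shard IDs are 1-based in the "KofN" naming, so K must fall in 1..N.
    if not 1 <= shard_id <= shard_count:
        raise ValueError(f"invalid shard suffix in {job_name!r}")
    return shard_id, shard_count


print(shard_from_job_name(
    "pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3"
))  # (1, 3)
print(shard_from_job_name("regression-clusterinfra-azure-ipi-mapi"))  # None
```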
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author.

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

mdbooth (Contributor, Author) commented May 15, 2025

/pj-rehearse

openshift-ci-robot (Contributor)

@mdbooth: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

damdo (Member) left a comment

/approve
/lgtm

Feel free to ack the rehearsals (/pj-rehearse ack) once you're happy.

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) May 15, 2025
openshift-ci bot (Contributor) commented May 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damdo, mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) May 15, 2025
mdbooth (Contributor, Author) commented May 15, 2025

/hold

Given that all 3 shards are still running after 4 hours, I assume sharding doesn't work for this job.

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) May 15, 2025
theobarberbany (Contributor) commented May 15, 2025

It looks like the contents of the job (the regression tests?) aren't configured to shard: this has run the whole regression suite 3 times in parallel.

```yaml
- as: regression-clusterinfra-azure-ipi-mapi
  optional: true
  run_if_changed: ^(go\.mod|go\.sum)$
  steps:
    cluster_profile: azure4
    env:
      E2E_RUN_TAGS: '@mapi'
      TEST_FILTERS: NonHyperShiftHOST
      TEST_FILTERS_CLUSTERINFRASTRUCTURE: periodic&&!qe-only&&mapi
      TEST_SCENARIOS: Cluster_Infrastructure MAPI
      TEST_TIMEOUT: "35"
    test:
    - chain: openshift-e2e-test-clusterinfra-qe-regression
    workflow: cucushift-installer-rehearse-azure-ipi
  timeout: 7h0m0s
```

This defines the actual job that is run, and it looks like it uses the cucushift-installer-rehearse-azure-ipi workflow.
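The env block in this config passes the test selection to the chain as filter strings. As a rough illustration only, assuming TEST_FILTERS_CLUSTERINFRASTRUCTURE is an `&&`-joined list of tags in which a leading `!` excludes a tag (the real semantics are defined by the test framework, not by this config), a hypothetical `matches_filter` function would evaluate it like this:

```python
from typing import AbstractSet


def matches_filter(tags: AbstractSet[str], expression: str) -> bool:
    """Evaluate an '&&'-joined filter such as 'periodic&&!qe-only&&mapi'.

    Assumed semantics: every plain term must be one of the test's tags,
    and every term prefixed with '!' must not be.
    """
    for term in expression.split("&&"):
        term = term.strip()
        if not term:
            continue  # tolerate stray separators such as a trailing '&&'
        if term.startswith("!"):
            if term[1:] in tags:
                return False
        elif term not in tags:
            return False
    return True


filter_expr = "periodic&&!qe-only&&mapi"
print(matches_filter({"periodic", "mapi"}, filter_expr))             # True
print(matches_filter({"periodic", "mapi", "qe-only"}, filter_expr))  # False
print(matches_filter({"periodic", "capi"}, filter_expr))             # False
```

A filter like this narrows which tests run, but on its own it does not split them between shards, which is consistent with every shard running the whole regression.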

```yaml
test:
- chain: openshift-e2e-test-clusterinfra-qe-regression
workflow: cucushift-installer-rehearse-azure-ipi
```

I'm not sure what chain does.

Edit: here's the chain definition, which breaks down how the test is run. It is an ordered list of steps (refs and nested chains) rather than of workflows: https://steps.ci.openshift.org/chain/openshift-e2e-test-clusterinfra-qe-regression

I think this chain is what would need to support sharding.

The openshift-e2e-test step (https://steps.ci.openshift.org/reference/openshift-e2e-test) supports sharding (it takes SHARD_ARGS), so maybe we can do something similar.
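One way a step could honor shard arguments is to select a deterministic subset of the (already filtered) test list. This is a minimal sketch only: `select_shard` is hypothetical, shard IDs are assumed to be 1-based to match the 1of3 naming, the OCP-* test IDs are placeholders, and it does not reflect how openshift-e2e-test actually interprets SHARD_ARGS.

```python
from typing import List, Sequence


def select_shard(tests: Sequence[str], shard_id: int, shard_count: int) -> List[str]:
    """Return the tests that shard `shard_id` (1-based) of `shard_count` should run.

    Tests are sorted first so every shard derives the same order, then dealt
    out round-robin: the test at sorted index i goes to shard (i % shard_count) + 1.
    """
    if shard_count < 1 or not 1 <= shard_id <= shard_count:
        raise ValueError(f"bad shard {shard_id}/{shard_count}")
    ordered = sorted(tests)
    return [t for i, t in enumerate(ordered) if i % shard_count == shard_id - 1]


# Placeholder test IDs, not real regression cases.
suite = ["OCP-1005", "OCP-1001", "OCP-1003", "OCP-1002", "OCP-1004"]
for k in range(1, 4):
    print(k, select_shard(suite, k, 3))
```

Each shard runs as a separate job, so every shard must compute the same partition independently; sorting first makes the result independent of the order in which tests are discovered. Round-robin balances test counts, not durations, so a shard that draws several slow tests can still finish well after the others. A step that ignores its shard arguments behaves as if `shard_count` were 1 and runs the full suite in every shard.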

shellyyang1989 (Contributor)

cc @sunzhaohua2 for awareness

miyadav (Member) commented May 21, 2025

/pj-rehearse

openshift-ci-robot (Contributor)

@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

miyadav (Member) commented May 21, 2025

Not sure whether it picked up the changes when I triggered it; it still seems to be taking a long time 🤔

miyadav (Member) commented May 22, 2025

/pj-rehearse

openshift-ci-robot (Contributor)

@miyadav: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

miyadav (Member) commented May 22, 2025

I'm doing more testing in a separate PR, using a step other than the chain used in this PR, because that chain also contains other repos' tests. It doesn't look like it has worked so far.

miyadav (Member) commented May 22, 2025

From the running logs of the test pods (here), sharding looks good. Once the tests finish we can confirm that.
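Confirming the sharding once the jobs finish amounts to checking that the per-shard test lists are pairwise disjoint and together cover the full suite. A minimal sketch, assuming the per-shard lists have already been collected from each job's logs by hand (no particular log format is assumed), using a hypothetical `check_shards` helper and placeholder test IDs:

```python
from typing import Dict, Iterable, List, Sequence


def check_shards(suite: Iterable[str], shards: Sequence[Iterable[str]]) -> List[str]:
    """Return a list of problems: tests run by more than one shard, or by none."""
    problems: List[str] = []
    seen: Dict[str, int] = {}  # test ID -> first (1-based) shard that ran it
    for index, shard in enumerate(shards, start=1):
        for test in shard:
            if test in seen:
                problems.append(f"{test} ran in shard {seen[test]} and shard {index}")
            else:
                seen[test] = index
    for test in suite:
        if test not in seen:
            problems.append(f"{test} ran in no shard")
    return problems


suite = ["OCP-1001", "OCP-1002", "OCP-1003"]
# A correct 3-way split reports no problems.
print(check_shards(suite, [["OCP-1001"], ["OCP-1002"], ["OCP-1003"]]))  # []
# Every shard running the whole suite (the failure mode seen earlier) reports duplicates.
print(len(check_shards(suite, [suite, suite, suite])))  # 6
```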

openshift-ci bot (Contributor) commented May 22, 2025

@mdbooth: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/rehearse/openshift/machine-api-provider-azure/main/regression-clusterinfra-azure-ipi-mapi-1of3 | 68603f2 | link | unknown | /pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-1of3 |
| ci/rehearse/openshift/machine-api-provider-azure/main/regression-clusterinfra-azure-ipi-mapi-2of3 | 68603f2 | link | unknown | /pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-2of3 |
| ci/rehearse/openshift/machine-api-provider-azure/main/regression-clusterinfra-azure-ipi-mapi-3of3 | 68603f2 | link | unknown | /pj-rehearse pull-ci-openshift-machine-api-provider-azure-main-regression-clusterinfra-azure-ipi-mapi-3of3 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
- lgtm: Indicates that a PR is ready to be merged.

6 participants