
Retry OCP CSR approval #2939


Merged 1 commit into openstack-k8s-operators:main on May 6, 2025

Conversation

@abays (Contributor) commented Apr 30, 2025

The openshift_adm role's "Check for pending certificate approval" task can sometimes fail when the cluster API isn't fully stable yet. Let's add a retry to avoid failing here unnecessarily. I'm assuming the actual "Wait until the OpenShift cluster is stable" task [1] has to come after CSR approval because, without the approval, the cluster cannot reach full stability. If I'm wrong about that, perhaps we could just move the CSR task below [1].

[1]

- name: Wait until OCP login succeeds.
  community.okd.openshift_auth:
    host: "{{ cifmw_openshift_api }}"
    password: "{{ cifmw_openshift_password }}"
    state: present
    username: "{{ cifmw_openshift_user }}"
    validate_certs: false
  register: _oc_login_result
  until: _oc_login_result.k8s_auth is defined
  retries: "{{ cifmw_openshift_adm_retry_count }}"
  delay: 2
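
For context, a minimal sketch of what the retried CSR check could look like, mirroring the retry pattern above (the kubernetes.core.k8s_info lookup, the until condition, and the task layout here are assumptions for illustration, not the role's actual task):

- name: Check for pending certificate approval
  # Sketch only: assumes the check lists CertificateSigningRequest objects.
  kubernetes.core.k8s_info:
    api_version: certificates.k8s.io/v1
    kind: CertificateSigningRequest
  register: _csr_list
  # Keep retrying while the cluster API settles instead of failing outright.
  until: _csr_list is not failed
  retries: "{{ cifmw_openshift_adm_retry_count }}"
  delay: 2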

@abays abays requested a review from a team as a code owner April 30, 2025 15:17
openshift-ci bot (Contributor) commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a7d86dde28d84f3b93d2b7fd72d2c6bb

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 48m 24s
❌ podified-multinode-edpm-deployment-crc RETRY_LIMIT in 9m 12s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 30m 54s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 46s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 26s
✔️ build-push-container-cifmw-client SUCCESS in 17m 42s

@abays (Contributor, Author) commented May 1, 2025

recheck

@danpawlik (Contributor)

Surprising, because it should not happen when etcd is on a ramdisk. Does it happen when the CI job uses nested CRC, or does it also happen with crc-cloud/crc-extracted?

@abays (Contributor, Author) commented May 5, 2025

> Surprising, because it should not happen when etcd is on a ramdisk. Does it happen when the CI job uses nested CRC, or does it also happen with crc-cloud/crc-extracted?

It happens in my local env when I run the reproducer, every time on the first attempt. Then if I run the reproducer again (without cleaning), it succeeds. In my testing, adding the retries allows it to succeed on the first run.

@abays (Contributor, Author) commented May 5, 2025

> Surprising, because it should not happen when etcd is on a ramdisk. Does it happen when the CI job uses nested CRC, or does it also happen with crc-cloud/crc-extracted?

Also, I am not using CRC with CIFMW. This is the dev-scripts path.

@leifmadsen (Contributor)

I run into this issue every time I deploy with CIFMW. I have to wait for it to fail, then re-run my deployment. It'd be nice to see something like this land so I don't have to double-run each deployment.

@dasm dasm merged commit 146a068 into openstack-k8s-operators:main May 6, 2025
4 checks passed