
Retry OCP CSR approval #2939


Merged 1 commit into openstack-k8s-operators:main on May 6, 2025

Conversation

@abays (Contributor) commented Apr 30, 2025

The openshift_adm role's "Check for pending certificate approval" task can sometimes fail when the cluster API isn't fully stable yet. Let's add a retry to avoid failing here unnecessarily. I'm assuming the actual "Wait until the OpenShift cluster is stable" task [1] has to come after CSR approval because, without the approval, the cluster cannot reach full stability. If I'm wrong about that, perhaps we could just move the CSR task below [1].

[1]

- name: Wait until OCP login succeeds.
  community.okd.openshift_auth:
    host: "{{ cifmw_openshift_api }}"
    password: "{{ cifmw_openshift_password }}"
    state: present
    username: "{{ cifmw_openshift_user }}"
    validate_certs: false
  register: _oc_login_result
  until: _oc_login_result.k8s_auth is defined
  retries: "{{ cifmw_openshift_adm_retry_count }}"
  delay: 2
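
For context, a minimal sketch of what the retried CSR check could look like, mirroring the retry pattern above (the kubernetes.core.k8s_info lookup, the until condition, and the task layout here are assumptions for illustration, not the role's actual task):

- name: Check for pending certificate approval
  # Sketch only: assumes the check lists CertificateSigningRequest objects.
  kubernetes.core.k8s_info:
    api_version: certificates.k8s.io/v1
    kind: CertificateSigningRequest
  register: _csr_list
  # Keep retrying while the cluster API settles instead of failing outright.
  until: _csr_list is not failed
  retries: "{{ cifmw_openshift_adm_retry_count }}"
  delay: 2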

@abays abays requested a review from a team as a code owner April 30, 2025 15:17
openshift-ci bot (Contributor) commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a7d86dde28d84f3b93d2b7fd72d2c6bb

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 48m 24s
❌ podified-multinode-edpm-deployment-crc RETRY_LIMIT in 9m 12s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 30m 54s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 46s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 26s
✔️ build-push-container-cifmw-client SUCCESS in 17m 42s

@abays (Contributor, Author) commented May 1, 2025

recheck

@danpawlik (Contributor)

Surprising, because it should not happen when etcd is on a ramdisk. Does it happen when the CI job uses nested CRC, or does it also happen with crc-cloud/crc-extracted?

@abays (Contributor, Author) commented May 5, 2025

> Surprising, because it should not happen when etcd is on a ramdisk. Does it happen when the CI job uses nested CRC, or does it also happen with crc-cloud/crc-extracted?

It happens in my local env when I run the reproducer, every time on the first attempt. Then if I run the reproducer again (without cleaning), it succeeds. In my testing, adding the retries allows it to succeed on the first run.

@abays (Contributor, Author) commented May 5, 2025

> Surprising, because it should not happen when etcd is on a ramdisk. Does it happen when the CI job uses nested CRC, or does it also happen with crc-cloud/crc-extracted?

Also, I am not using CRC with CIFMW. This is the dev-scripts path.

@leifmadsen (Contributor)

I run into this issue every time I deploy with CIFMW. I have to wait for it to fail, then re-run my deployment. It'd be nice to see something like this land so I don't have to double-run each deployment.

@dasm dasm merged commit 146a068 into openstack-k8s-operators:main May 6, 2025
4 checks passed