Commit 8eab954

[docs] Add zuul retention docs
1 parent a72138d commit 8eab954

File tree

4 files changed (+32 -2 lines changed)

docs/dictionary/en-custom.txt

Lines changed: 3 additions & 0 deletions
```diff
@@ -19,6 +19,8 @@ arx
 arxcruz
 auth
 authfile
+autohold
+autoholds
 autoscale
 autostart
 awk
@@ -349,6 +351,7 @@ nncp
 nobuild
 nodeexporter
 nodenetworkconfigurationpolicy
+nodepool
 nodeps
 nodeset
 nodesets
```

docs/source/index.rst

Lines changed: 2 additions & 2 deletions
```diff
@@ -73,10 +73,10 @@ In case of emergency, or if we didn't come back to you in a reasonable time (exp
 
 .. toctree::
    :maxdepth: 1
-   :caption: Cookbooks
+   :caption: Zuul
    :glob:
 
-   cookbooks/*
+   zuul/*
 
 .. toctree::
    :maxdepth: 1
```
File renamed without changes.
Lines changed: 27 additions & 0 deletions
# Autoholds resources retention mechanism

## Why is it needed?

Zuul uses Nodepool to manage the life-cycle of the instances required to run a job, also known as the nodeset. Zuul only manages the default network used for initial access; all the other networks required for our complex testing are managed via the CI-Framework, external to Zuul.

The ci-framework uses a special set of [playbooks](https://review.rdoproject.org/r/plugins/gitiles/config.git/+/refs/heads/master/playbooks/crc/) to create network resources around the Nodepool-deployed instances. Those resources are cleaned up on each run, no matter how the run finishes, which leaves an environment that may be useless from a debugging perspective.

The autohold retention mechanism checks with Zuul to see if the run has an autohold request and, in that case, skips the cleanup process so the network resources remain.

The skipped network resources are then cleaned up periodically by a script managed by the infrastructure team.

## Where is the code?

The code that handles skipping the cleanup lives in the ci-framework [repo](https://github.com/openstack-k8s-operators/ci-framework/blob/main/ci/playbooks/multinode-autohold.yml).

## How does it work?

Basically, the [code](https://github.com/openstack-k8s-operators/ci-framework/blob/main/ci/playbooks/multinode-autohold.yml) checks against the Zuul API to see whether an autohold request exists for the run, based on the information stored in the `zuul` Ansible variable.

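As an illustrative sketch only (the playbook's actual matching logic may differ), the check can be thought of as filtering the autohold list returned by the API for a request that matches the run's tenant, project, and job, all of which are available in the `zuul` variable. The field names below are assumptions for illustration:

```python
def matching_autoholds(autoholds, zuul_vars):
    """Keep only autohold requests that apply to this run.

    `autoholds` is the parsed JSON list returned by the autohold API;
    `zuul_vars` mimics the `zuul` Ansible variable. Field names here
    are illustrative, not necessarily the playbook's exact keys.
    """
    return [
        req for req in autoholds
        if req.get("tenant") == zuul_vars["tenant"]
        and req.get("project") == zuul_vars["project"]["name"]
        and req.get("job") == zuul_vars["job"]
    ]

# Hypothetical data: only the first request matches this run.
held = matching_autoholds(
    [
        {"tenant": "rdo", "project": "ci-framework", "job": "multinode"},
        {"tenant": "rdo", "project": "other-project", "job": "multinode"},
    ],
    {"tenant": "rdo", "project": {"name": "ci-framework"}, "job": "multinode"},
)
```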
The code uses the `krb_request` role, which uses Kerberos underneath if needed and if a Kerberos ticket is present. If the Zuul API is not secured, the request will not use any kind of authentication.

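A rough sketch of that decision (the `krb_request` role's real implementation lives in the ci-framework; the function and return values below are hypothetical):

```python
def pick_auth(api_secured, krb_ticket_present):
    """Mirror the krb_request behaviour described above: Kerberos only
    when the API is secured and a ticket is available, otherwise no
    authentication at all."""
    if api_secured and krb_ticket_present:
        return "kerberos"
    return "none"
```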
Some Zuul instances are not configured to expose the API through the executor, so for those cases `zuul_autohold_endpoint` needs to be set to point to the autohold URL of the Zuul instance. If the variable is not set, the URL is auto-generated assuming the API is reachable through the executor.

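The endpoint selection can be sketched as follows; the generated URL shape is an assumption for illustration, not the exact path the playbook builds:

```python
def autohold_endpoint(zuul_vars, zuul_autohold_endpoint=None):
    """Pick the autohold API URL: an explicit `zuul_autohold_endpoint`
    wins; otherwise build one assuming the API is reachable through the
    executor. The URL shape below is a hypothetical example."""
    if zuul_autohold_endpoint:
        return zuul_autohold_endpoint
    executor = zuul_vars["executor"]["hostname"]
    tenant = zuul_vars["tenant"]
    return f"https://{executor}/api/tenant/{tenant}/autohold"
```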
This check is only done if the job failed: if the job passed, the autohold will not retain the instances, so we follow the same approach with the network resources and clean them up before finishing the job.
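Putting the two conditions together, the overall cleanup decision boils down to something like this minimal sketch:

```python
def skip_network_cleanup(job_failed, autohold_matched):
    """Keep the extra network resources only when the job failed AND an
    autohold request matched; in every other case cleanup runs as usual."""
    return job_failed and autohold_matched
```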

0 commit comments