Description
Service(s)
cert.ci.jenkins.io
Summary
As per #5003, we have credits to spend in the Jenkins sponsored subscription.
Let's move the cert.ci.jenkins.io ephemeral agents to this new subscription, including the data transiting from/to the NAT gateway. The following costs will then move to the new subscription (excerpt of the past 6 months):
Prerequisites:
- [Azure Sponsored Subscription 2026] Set up permissions #5005
- Check the old "cleanup" PR or commits from the former 2025 sponsored subscription
- Check current code (for both repositories azure and azure-net) to ensure we have the same objects and naming conventions (as the old cleanup might be stuck on former techniques or naming we dropped)
Task list:
- Azure Net: a new vnet + subnet is required for ephemeral VM agents. We should use the same pattern as last year, except that the controller might need to move into this vnet as well: we should make it larger than in 2025 right at creation, so we won't hit a vnet overlap or need to resize it in the upcoming weeks
- Azure: the following resources are expected (same as old setup with 2025 subscription):
- data sources to the new vnet, subnet and their resource groups (RG)
- A new RG for the "non agent" resources. Usually we name it `xxx_ci_jenkins_io_controller_jenkins_sponsored` with `xxx` the specific name (`cert` here). Will also be used if we move the controller VM of course.
- A new "azure-vm" module instantiation in this new subscription (to create the usual resources): RG, storage, Network Security Group (NSG), etc.
- Missing permissions such as vnet reader
- Nit: I'm not sure why it's not in the module. Might be missing OR there might have been a reason 🤔
- A new User-Assigned Managed Identity (UAID) for the Azure VM agents (it must be in the same subscription as its role assignments and their scopes), along with its assignments: management by the controller Service Principal (SP), and write access to the buildreports file share for agents
- Nit: might be useful to have this UAID integrated into the module in the future as we want this by default
- Azure Container Registry (ACR) setup: a Private Endpoint (PE) in the agents subnet to reach the ACR's Private Link Service (PLS), the NSG rules associated with it
- Output the required values for cert.ci.jenkins.io JCasC Puppet setup (see below)
- Nit: can be done as a second non functional PR if need be
- Puppet: set up cert.ci controller to use the new vnet, subnet, their RG and the agent UAID (if I recall correctly, should be all)
- Tip: testing the values manually in the controller UI and triggering the "agent health" check helps to verify that the minimum is set up. If agents do not allocate after 2-3 min, check the controller logs and fix the errors you find
- Once the manual tests are ok, puppet hieradata can be updated and deployed
- Set up VPN access (required to access agents with SSH bounce through the VPN):
- Add the new vnet in the VPN routes (Docker image, i.e. VPN client side)
- With the new image tagged, update it in puppet along with server side routes (automated PR recently fixed by Jay)
- Allocate an agent from cert.ci (pipeline replay) and verify you can SSH to it from your machine. If you cannot access it, try through the OpenVPN VM and compare.
- Verify ACR from a cert.ci.jenkins.io agent
- From an allocated agent (with SSH), check access with a `curl -v https://<acr DNS name>`. Fix missing requirements based on any errors you get (DNS record absent from private network? TCP not able to establish connection? etc. - see https://docs.azure.cn/en-us/container-registry/container-registry-troubleshoot-access).
- Reminder: ACR must stay private using PE/PLS. No network peering, no public access, because it cannot be authenticated (limitation of Docker/Podman `registry-mirror`)
- Once `curl` allows reaching the ACR, check if Docker Engine is able to use it (need `root` access to the agent, with `journalctl -u docker -f`)
- Finally, cleanup, once cert.ci.jenkins.io uses the new subscription for agents:
- Remove old agents resources in Azure (including data sources unless used somewhere else)
- Then remove old subnet/resources from Azure Net in the CDF subscription (does not cost much, but better to clean up for clarity)
- Finally remove routes from OpenVPN
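
Two of the checks in the task list can be scripted: confirming that the new vnet range does not overlap existing ones, and the DNS/TCP part of the ACR verification. Below is a minimal sketch in Python using only the standard library. The function names, and any CIDR or hostname you pass in, are hypothetical and not taken from the jenkins-infra repositories; it does not replace the `curl -v` and `journalctl` checks above.

```python
import ipaddress
import socket


def find_overlaps(new_cidr, existing_cidrs):
    """Return the entries of existing_cidrs that overlap with new_cidr."""
    new_net = ipaddress.ip_network(new_cidr)
    return [
        cidr for cidr in existing_cidrs
        if new_net.overlaps(ipaddress.ip_network(cidr))
    ]


def check_private_endpoint(hostname, port=443, timeout=5.0):
    """Resolve hostname, then attempt a TCP connection to it.

    Returns a dict with the resolved addresses, whether all of them are
    private (expected when the name points at a Private Endpoint), whether
    the TCP connection succeeded, and the first error met (if any).
    """
    result = {"addresses": [], "all_private": False, "tcp_ok": False, "error": None}
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Typically: DNS record absent from the private network.
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Drop any IPv6 scope suffix ("%eth0") before parsing the address.
    addresses = sorted({info[4][0].split("%")[0] for info in infos})
    result["addresses"] = addresses
    result["all_private"] = all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )

    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        # Typically: NSG rule missing or no route to the Private Endpoint.
        result["error"] = f"TCP connection failed: {exc}"
    return result
```

For example, `find_overlaps("<new vnet CIDR>", [<existing vnet CIDRs>])` should return an empty list before the vnet is created, and `check_private_endpoint("<acr DNS name>")` run from an allocated agent should report `all_private` and `tcp_ok` as true when the Private Endpoint and its DNS record are in place.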
Pinned by lemeurherve
Remaining tasks before closing this issue:
- Save as code the new packer image version to use in templates.
- Manually updated from 2.82.0 present only in the main gallery to 2.84.0 replicated in Sweden
- jenkins-infra/jenkins-infra#4735
- No ACR for now
- Set up VPN access (required to access agents with SSH bounce through the VPN):
- Add the new vnet in the VPN routes (Docker image, i.e. VPN client side)
- With the new image tagged, update it in puppet along with server side routes (automated PR recently fixed by Jay)
- Allocate an agent from cert.ci (pipeline replay) and verify you can SSH to it through your machine. If …