| 1 | +# Azure Compute Gallery Image Testing |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This repository includes a GitHub Actions workflow for post-publish sanity-testing AlmaLinux OS image versions in an Azure Compute Gallery. The workflow launches a fresh VM from a given gallery image version, runs a small set of release / arch / disk / `dnf` assertions over SSH, collects the installed-package list, tears the VM and its auto-created peers down on `always()`, and posts a Mattermost summary. |
| 6 | + |
| 7 | +It is the Azure counterpart of [`OCI_TEST.md`](OCI_TEST.md). |
| 8 | + |
| 9 | +## Files |
| 10 | + |
| 11 | +### `.github/workflows/azure-test.yml` |
| 12 | + |
| 13 | +Workflow for validating a Compute Gallery image version end-to-end. |
| 14 | + |
| 15 | +**What it does:** |
| 16 | +- Accepts a `compute_gallery_path` of the form `gallery_name/vm_image_definition/vm_image_version` (e.g. `almalinux/almalinux-9-gen2/9.7.2026050101`) |
| 17 | +- Resolves the gallery image-version resource ID and source VHD URI via `az sig image-version show` |
| 18 | +- Reverse-engineers the architecture from the source VHD filename using the same regex pair as [`AZURE_GALLERY.md`](AZURE_GALLERY.md) (so any image definition that the release workflow publishes is automatically supported) |
| 19 | +- Generates an ephemeral ed25519 SSH keypair, creates a test VM with `az vm create --nsg-rule SSH`, waits for SSH, runs the assertions, then deletes the VM, OS disk, NIC, public IP, and NSG by their auto-generated names |
| 20 | +- Uploads the package list as a workflow artifact |
| 21 | +- Sends a Mattermost notification with portal links to the gallery image and the (now-deleted) test VM |
| 22 | + |
| 23 | +**Usage:** |
| 24 | +``` |
| 25 | +Trigger via GitHub UI: Actions → Azure: Test Image |
| 26 | + |
| 27 | +Inputs: |
| 28 | + - compute_gallery_path: gallery_name/vm_image_definition/vm_image_version |
| 29 | + (e.g. almalinux/almalinux-9-gen2/9.7.2026050101) |
| 30 | + - notify_mattermost: Send notification to Mattermost (default: true) |
| 31 | +``` |
| 32 | + |
| 33 | +The release workflow [`azure-to-gallery.yml`](AZURE_GALLERY.md) emits a structured `- Created: '<gallery>/<def>/<ver>'` line for every uploaded image-version, so the Mattermost release notification ends with a copy-pasteable `compute_gallery_path` for this workflow. |
| 34 | + |
| 35 | +## Required GitHub Configuration |
| 36 | + |
| 37 | +### Secrets |
| 38 | +| Secret | Description | |
| 39 | +|--------|-------------| |
| 40 | +| `AZURE_CLIENT_ID` | Azure service principal client ID | |
| 41 | +| `AZURE_TENANT_ID` | Azure tenant ID | |
| 42 | +| `AZURE_SUBSCRIPTION_ID` | Azure subscription ID | |
| 43 | +| `MATTERMOST_WEBHOOK_URL` | Mattermost incoming webhook URL | |
| 44 | + |
| 45 | +### Variables (`vars.*`) |
| 46 | +| Variable | Description | |
| 47 | +|----------|-------------| |
| 48 | +| `MATTERMOST_CHANNEL` | Mattermost channel for notifications | |
| 49 | + |
| 50 | +### GitHub Permissions |
| 51 | +The workflow requires: |
| 52 | +- `id-token: write` — for Azure OIDC authentication via `azure/login@v3` |
| 53 | +- `contents: read` — for repository checkout |
| 54 | + |
| 55 | +### Workflow-level `env` |
| 56 | +Resource group and region are pinned at the workflow level (matching the convention in `tools/azure_uploader.sh`): |
| 57 | +| Env | Value | |
| 58 | +|-----|-------| |
| 59 | +| `RESOURCE_GROUP` | `rg-alma-images` (holds both the gallery and the test VM) | |
| 60 | +| `AZURE_LOCATION` | `East US` | |
| 61 | +| `AZURE_PORTAL_BASE_URL` | `https://portal.azure.com/#@/resource` | |
| 62 | +| `SSH_USER` (job-level) | `almalinux` | |
| 63 | + |
| 64 | +## Required Azure RBAC |
| 65 | + |
| 66 | +The OIDC service principal behind `AZURE_CLIENT_ID` needs the following actions, assigned at the `rg-alma-images` resource-group scope: |
| 67 | + |
| 68 | +``` |
| 69 | +Microsoft.Compute/galleries/images/read |
| 70 | +Microsoft.Compute/virtualMachines/write |
| 71 | +Microsoft.Compute/virtualMachines/delete |
| 72 | +Microsoft.Compute/virtualMachines/deletePreservedOSDisk/action |
| 73 | +Microsoft.Compute/disks/delete |
| 74 | +Microsoft.Network/networkInterfaces/write |
| 75 | +Microsoft.Network/networkInterfaces/join/action |
| 76 | +Microsoft.Network/networkInterfaces/delete |
| 77 | +Microsoft.Network/networkSecurityGroups/read |
| 78 | +Microsoft.Network/networkSecurityGroups/write |
| 79 | +Microsoft.Network/networkSecurityGroups/join/action |
| 80 | +Microsoft.Network/networkSecurityGroups/delete |
| 81 | +Microsoft.Network/publicIPAddresses/read |
| 82 | +Microsoft.Network/publicIPAddresses/write |
| 83 | +Microsoft.Network/publicIPAddresses/join/action |
| 84 | +Microsoft.Network/publicIPAddresses/delete |
| 85 | +Microsoft.Network/virtualNetworks/write |
| 86 | +Microsoft.Network/virtualNetworks/subnets/join/action |
| 87 | +Microsoft.Resources/deployments/read |
| 88 | +Microsoft.Resources/deployments/write |
| 89 | +Microsoft.Resources/deployments/operationStatuses/read |
| 90 | +``` |
| 91 | + |
| 92 | +The same list is duplicated as a comment in the workflow header so a future maintainer composing a least-privilege custom role doesn't have to rediscover it by trial-and-error dispatches. |
| 93 | + |
| 94 | +## Compute Gallery Path Parsing |
| 95 | + |
| 96 | +The single workflow input is split on `/` into three components, and the version is then split on `.` into three parts whose meaning depends on the release type: |
| 97 | + |
| 98 | +| Shape | Example | `ALMA_VERSION` | `DATESTAMP_ITERATION` | `RELEASE_STRING` | |
| 99 | +|-------|---------|----------------|----------------------|-----------------| |
| 100 | +| Stable AlmaLinux | `almalinux/almalinux-9-gen2/9.7.2026050101` | `9.7` | `2026050101` | `AlmaLinux release 9.7` | |
| 101 | +| Stable AlmaLinux 10 | `almalinux_ci/almalinux-ci-10-arm64-gen2/10.1.202605020` | `10.1` | `202605020` | `AlmaLinux release 10.1` | |
| 102 | +| Kitten 10 | `almalinux_ci/almalinux-ci-kitten-10-x64-gen2/10.20260501.0` | `10` | `20260501.0` | `AlmaLinux Kitten release 10` | |
| 103 | + |
| 104 | +A `*kitten*` branch in the parse step handles the Kitten `Major.Datestamp.Iteration` shape (no minor); stable AlmaLinux uses `Major.Minor.Patch`. |
| 105 | + |
| 106 | +`CUSTOM_IMAGE_NAME` (used as the artifact name and notification label) is derived from the source VHD filename without the `.vhd` extension — so it matches the artifact name produced by `azure-to-gallery.yml`. |
| 107 | + |
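The parsing rules above can be sketched in bash as follows. This is a minimal illustration, not the workflow's actual step: the function name `parse_gallery_path` and the variables `GALLERY_NAME` and `IMAGE_DEFINITION` are hypothetical, and matching `*kitten*` against the image definition name is an assumption.

```bash
# Hypothetical helper: split "<gallery>/<definition>/<version>" into the
# variables described in the table above.
parse_gallery_path() {
  local path="$1"
  # Three slash-separated parts, the last being a three-part dotted version.
  local re='^([^/]+)/([^/]+)/([0-9]+)\.([0-9]+)\.([0-9]+)$'
  if [[ ! "$path" =~ $re ]]; then
    echo "Invalid Compute Gallery Path: $path" >&2
    return 1
  fi
  GALLERY_NAME="${BASH_REMATCH[1]}"       # illustrative name
  IMAGE_DEFINITION="${BASH_REMATCH[2]}"   # illustrative name
  local first="${BASH_REMATCH[3]}"
  local second="${BASH_REMATCH[4]}"
  local third="${BASH_REMATCH[5]}"
  case "$IMAGE_DEFINITION" in
    *kitten*)
      # Kitten: Major.Datestamp.Iteration (no minor version).
      ALMA_VERSION="$first"
      DATESTAMP_ITERATION="${second}.${third}"
      RELEASE_STRING="AlmaLinux Kitten release ${ALMA_VERSION}"
      ;;
    *)
      # Stable: Major.Minor.Patch, where Patch carries datestamp + iteration.
      ALMA_VERSION="${first}.${second}"
      DATESTAMP_ITERATION="$third"
      RELEASE_STRING="AlmaLinux release ${ALMA_VERSION}"
      ;;
  esac
}
```

Calling `parse_gallery_path 'almalinux/almalinux-9-gen2/9.7.2026050101'` sets `ALMA_VERSION` to `9.7` and `DATESTAMP_ITERATION` to `2026050101`, matching the first table row.
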
| 108 | +## Architecture Detection |
| 109 | + |
| 110 | +Architecture is **not** mapped from the gallery name; it is reverse-engineered from the source VHD filename returned by `az sig image-version show`. The workflow tries both regexes the release path uses: |
| 111 | + |
| 112 | +```bash |
| 113 | +regex_azure='-([0-9]+\.?[0-9]*)-([0-9]{8,9}(\.[0-9])?).*\.(x86_64|aarch64|arm64)' |
| 114 | +regex_simple='almalinux-([0-9]+\.[0-9]+)-(x86_64|aarch64|arm64)\.([0-9]{8})' |
| 115 | +``` |
| 116 | + |
| 117 | +`arm64` returned by the regex is normalised to `aarch64` so the in-VM `rpm -q ... | grep <arch>` test keeps working. Architecture then maps to a default Azure VM size: |
| 118 | + |
| 119 | +| Architecture | VM size | |
| 120 | +|---|---| |
| 121 | +| `x86_64` | `Standard_D2as_v5` | |
| 122 | +| `aarch64` | `Standard_D2ps_v5` | |
| 123 | + |
| 124 | +The same defaults are used for Gen1 and 64K-page-size variants until a need to differentiate them surfaces. |
| 125 | + |
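A rough sketch of how the two regexes, the `arm64` normalisation, and the VM-size mapping fit together is shown below. The function name `detect_arch` and the `VM_SIZE` variable are hypothetical, and the filenames in the usage note are made-up examples rather than real VHD names.

```bash
# Hypothetical helper: derive ALMA_ARCH and a default VM size from a VHD filename.
detect_arch() {
  local vhd_name="$1"
  local regex_azure='-([0-9]+\.?[0-9]*)-([0-9]{8,9}(\.[0-9])?).*\.(x86_64|aarch64|arm64)'
  local regex_simple='almalinux-([0-9]+\.[0-9]+)-(x86_64|aarch64|arm64)\.([0-9]{8})'
  if [[ "$vhd_name" =~ $regex_azure ]]; then
    ALMA_ARCH="${BASH_REMATCH[4]}"   # 4th group: the 3rd is the nested iteration suffix
  elif [[ "$vhd_name" =~ $regex_simple ]]; then
    ALMA_ARCH="${BASH_REMATCH[2]}"
  else
    echo "Could not parse architecture from VHD source: $vhd_name" >&2
    return 1
  fi
  # Normalise arm64 to the name rpm reports inside the VM.
  if [[ "$ALMA_ARCH" == "arm64" ]]; then
    ALMA_ARCH="aarch64"
  fi
  # Default VM size per architecture, as in the table above.
  case "$ALMA_ARCH" in
    x86_64)  VM_SIZE="Standard_D2as_v5" ;;
    aarch64) VM_SIZE="Standard_D2ps_v5" ;;
  esac
}
```

For instance, a hypothetical filename such as `AlmaLinux-9-Azure-9.7-20260501.x86_64.vhd` matches `regex_azure`, while `almalinux-10.1-arm64.20260502.vhd` falls through to `regex_simple` and is normalised to `aarch64`.
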
| 126 | +## Test Assertions |
| 127 | + |
| 128 | +Once SSH is reachable on the VM, the following checks run in sequence (a failure in any check aborts the workflow): |
| 129 | + |
| 130 | +1. **AlmaLinux release** — `grep '<RELEASE_STRING>' /etc/almalinux-release` |
| 131 | +2. **Release package** — `rpm -qf /etc/almalinux-release` (resolved on the VM, so it works for both stable and Kitten release packages) |
| 132 | +3. **System architecture** — `rpm -q --qf='%{ARCH}\n' <RELEASE_PACKAGE> | grep '<ALMA_ARCH>'` |
| 133 | +4. **Disk and filesystems** — `lsblk` listing |
| 134 | +5. **Root filesystem resize** — root must be ≥ 98 GiB (the `--os-disk-size-gb` value passed to `az vm create` is 100 GiB) |
| 135 | +6. **Updates available** — `sudo dnf check-update` (exit code `100` is treated as success — it just means updates are pending) |
| 136 | +7. **Installed-package list** — `rpm -qa --queryformat '%{NAME}\n' | sort > /tmp/<CUSTOM_IMAGE_NAME>.txt`, then SCP'd back and uploaded as a workflow artifact |
| 137 | + |
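Two of the checks above have pass conditions that are easy to get wrong: the 98 GiB threshold and the `dnf check-update` exit code. The sketch below shows one way to express them. Both function names are hypothetical, and it assumes the root filesystem size is available in bytes, which may not be how the workflow measures it.

```bash
# Succeeds when the given size in bytes is at least 98 GiB.
root_fs_large_enough() {
  local size_bytes="$1"
  local min_bytes=$(( 98 * 1024 * 1024 * 1024 ))
  (( size_bytes >= min_bytes ))
}

# Succeeds for the two acceptable `dnf check-update` exit codes:
# 0 (nothing to update) and 100 (updates are pending).
dnf_check_update_ok() {
  local rc="$1"
  [[ "$rc" -eq 0 || "$rc" -eq 100 ]]
}
```

A caller would capture the exit code first (for example `rc=0; sudo dnf check-update || rc=$?`) and then pass it to `dnf_check_update_ok`, so that exit code `100` does not trip `set -e`.
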
| 138 | +## Workflow Process |
| 139 | + |
| 140 | +```mermaid |
| 141 | +graph TD |
| 142 | + A[Trigger Workflow] --> V[Validate compute_gallery_path] |
| 143 | + V --> P[Parse Compute Gallery Path] |
| 144 | + P --> D[Install dependencies — netcat-openbsd] |
| 145 | + D --> L[Azure login — azure/login@v3] |
| 146 | + L --> R[Resolve gallery image version + architecture<br/>az sig image-version show + jq from VHD URI] |
| 147 | + R --> K[Generate ephemeral SSH keypair — ed25519] |
| 148 | + K --> C[Launch test VM — az vm create --nsg-rule SSH] |
| 149 | + C --> IP[Resolve VM public IP] |
| 150 | + IP --> W[Wait for SSH — 60 × 10 s nc] |
| 151 | + W --> T[Run image tests — release/arch/disk/dnf/packages] |
| 152 | + T --> U[Upload packages list artifact] |
| 153 | + U --> S[Job summary — portal links] |
| 154 | + S --> CL[Terminate test VM<br/>VM + OS disk + NIC + Public IP + NSG] |
| 155 | + CL --> N[Send Mattermost notification] |
| 156 | +``` |
| 157 | + |
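The "Wait for SSH" node in the diagram is a bounded polling loop. A minimal sketch follows, assuming `nc` from `netcat-openbsd`; the function name `wait_for_ssh` and the 5-second connect timeout are assumptions, not values taken from the workflow.

```bash
# Hypothetical helper: poll TCP port 22 until it accepts connections.
# Defaults mirror the 60 x 10 s budget shown in the diagram.
wait_for_ssh() {
  local host="$1"
  local attempts="${2:-60}"
  local delay="${3:-10}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # -z: only probe the port, send no data; -w 5: connect timeout (assumed).
    if nc -z -w 5 "$host" 22; then
      echo "SSH reachable on ${host} after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "SSH did not become reachable within $(( attempts * delay )) seconds" >&2
  return 1
}
```

With the defaults, the loop gives up after 60 probes spaced 10 seconds apart, which corresponds to the 10-minute limit mentioned under Troubleshooting.
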
| 158 | +## VM Lifecycle |
| 159 | + |
| 160 | +The VM is named `azure-test-${ALMA_VERSION}-${DATESTAMP_ITERATION}-${ALMA_ARCH}-${GITHUB_RUN_ID}` (Azure VM names allow dots, so the version dot is preserved as-is for grep-ability in audit logs). `--nsg-rule SSH` opens port 22 from anywhere for the lifetime of the VM, which is acceptable because the VM is short-lived and the SSH key is ephemeral. |
| 161 | + |
| 162 | +The `Terminate test VM` step runs under `if: always() && env.VM_NAME != ''` and deletes — each call wrapped in `|| true` so cleanup always advances: |
| 163 | + |
| 164 | +| Resource | Auto-generated name | `az` command | |
| 165 | +|----------|--------------------|--------------| |
| 166 | +| VM | `${VM_NAME}` | `az vm delete --yes --force-deletion true` | |
| 167 | +| OS disk | resolved from `az vm show storageProfile.osDisk.name` | `az disk delete --yes --no-wait` | |
| 168 | +| NIC | `${VM_NAME}VMNic` | `az network nic delete --no-wait` | |
| 169 | +| Public IP | `${VM_NAME}PublicIP` | `az network public-ip delete --no-wait` | |
| 170 | +| NSG | `${VM_NAME}NSG` | `az network nsg delete --no-wait` | |
| 171 | + |
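| 172 | +Because every call is wrapped in `|| true`, the step runs all six `az` calls (the `az vm show` lookup plus the five deletes above) even under `set -e`, regardless of any one failing. |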
| 173 | + |
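The cleanup pattern described above might look roughly like the following. It is a sketch rather than the workflow's actual step: the function name `cleanup_test_vm` and its argument order are hypothetical, and skipping the disk delete when the name lookup returns nothing is an assumption.

```bash
# Hypothetical helper: delete the test VM and its auto-created peers.
# Every az call ends in `|| true` so one failure never stops the rest.
cleanup_test_vm() {
  local rg="$1"
  local vm="$2"
  local os_disk
  # Resolve the OS disk name while the VM still exists.
  os_disk="$(az vm show -g "$rg" -n "$vm" \
    --query storageProfile.osDisk.name -o tsv 2>/dev/null || true)"
  az vm delete -g "$rg" -n "$vm" --yes --force-deletion true || true
  if [[ -n "$os_disk" ]]; then
    az disk delete -g "$rg" -n "$os_disk" --yes --no-wait || true
  fi
  az network nic delete -g "$rg" -n "${vm}VMNic" --no-wait || true
  az network public-ip delete -g "$rg" -n "${vm}PublicIP" --no-wait || true
  az network nsg delete -g "$rg" -n "${vm}NSG" --no-wait || true
}
```

Resolving the disk name before `az vm delete` matters because the lookup depends on the VM resource still being present.
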
| 174 | +## Testing |
| 175 | + |
| 176 | +1. **First test against an aarch64 release** (private CI gallery): |
| 177 | + ``` |
| 178 | + compute_gallery_path = almalinux_ci/almalinux-ci-10-arm64-gen2/10.1.202605020 |
| 179 | + ``` |
| 180 | +2. **First test against an x86_64 stable release** (public gallery): |
| 181 | + ``` |
| 182 | + compute_gallery_path = almalinux/almalinux-9-gen2/9.7.2026050101 |
| 183 | + ``` |
| 184 | +3. **Kitten release**: |
| 185 | + ``` |
| 186 | + compute_gallery_path = almalinux_ci/almalinux-ci-kitten-10-x64-gen2/10.20260501.0 |
| 187 | + ``` |
| 188 | + |
| 189 | +After each run, verify cleanup with: |
| 190 | +```bash |
| 191 | +az resource list -g rg-alma-images --query "[?contains(name, '<run_id>')]" |
| 192 | +# Expected: [] |
| 193 | +``` |
| 194 | + |
| 195 | +## Troubleshooting |
| 196 | + |
| 197 | +### Common Issues |
| 198 | + |
| 199 | +1. **"Invalid Compute Gallery Path" validation error** |
| 200 | + - The regex requires three slash-separated parts and a three-part dot version. Kitten paths (`gallery/def/Major.Datestamp.Iteration`) and stable paths (`gallery/def/Major.Minor.Patch`) are both accepted. |
| 201 | + |
| 202 | +2. **"Gallery image version not found"** |
| 203 | + - The `az sig image-version show` call returned a non-zero exit. Confirm the path with `az sig image-version list -g rg-alma-images -r <gallery> -i <def>`. |
| 204 | + |
| 205 | +3. **"Could not extract image-version metadata"** |
| 206 | + - The `az` call succeeded but `jq` could not find `id` or the source VHD URI under either `.storageProfile.osDiskImage.source.uri` or `.properties.storageProfile.osDiskImage.source.uri`. The raw JSON is dumped to the run log for inspection. |
| 207 | + |
| 208 | +4. **"Could not parse architecture from VHD source"** |
| 209 | + - The source VHD filename did not match either `regex_azure` or `regex_simple`. Inspect the VHD URI in the run log; the parsing rule lives in [`AZURE_GALLERY.md`](AZURE_GALLERY.md) and may need to be extended for the new shape on the release path first. |
| 210 | + |
| 211 | +5. **"AuthorizationFailed" on `Microsoft.Compute/galleries/images/read`** |
| 212 | + - The service principal lacks the read permission on the gallery. Grant the 21 RBAC actions listed above at `rg-alma-images` scope (or attach a custom role with the same set). |
| 213 | + |
| 214 | +6. **"SSH did not become reachable within 10 minutes"** |
| 215 | +   - The VM came up but SSH never opened on port 22 from the runner. Possible causes: the NSG rule didn't apply (rare), cloud-init has not finished, or the SSH user is wrong (the workflow assumes `almalinux`; older AlmaLinux Azure images sometimes only accept `azureuser`). |
| 216 | + |
| 217 | +7. **"Root filesystem resize check failed"** |
| 218 | + - The root filesystem on the test VM did not auto-grow to ≥ 98 GiB. Indicates a `cloud-init` / `growpart` regression in the published image. |
| 219 | + |
| 220 | +8. **`dnf check-update` exits with a code other than `0` or `100`** |
| 221 | + - Repo metadata fetch failure or signed metadata mismatch. Re-run; if persistent, check that `RELEASE_VERSION` repo data matches the image's release. |
| 222 | + |
| 223 | +### Linter Warnings |
| 224 | + |
| 225 | +GitHub Actions YAML linters may show "Context access might be invalid" warnings for environment variables set via `$GITHUB_ENV`. These are false positives — the workflow functions correctly. |
| 226 | + |
| 227 | +## Support |
| 228 | + |
| 229 | +- Azure Portal: https://portal.azure.com |
| 230 | +- Azure Compute Gallery docs: https://learn.microsoft.com/en-us/azure/virtual-machines/azure-compute-gallery |
| 231 | +- AlmaLinux Cloud SIG Chat: https://chat.almalinux.org/almalinux/channels/sigcloud |
| 232 | +- Workflow run logs: GitHub Actions tab in the repository |