Commit f36c8fb

feat(ci): capture VM screenshots on failed vagrant_virtualbox builds

When the virtualbox-iso packer build fails or hangs (e.g. boot_command keystrokes missing the GRUB menu on a slow host), the only way to see what the VM was doing was to attach to its console interactively - which needs VRDP and therefore the Oracle Extension Pack, whose PUEL license does not cover organizational CI use.

Add two steps to shared-steps after the packer build, both gated on `failure() && inputs.type == 'vagrant_virtualbox'`:

- Capture VM screen on failure: iterate `VBoxManage list runningvms` and save a `controlvm <vm> screenshotpng` PNG per running VM to /tmp/vm-screenshots/. `screenshotpng` is base VirtualBox - no Extension Pack required - and so is licensing-clean for CI. Skips quietly if VBoxManage is absent.
- Store VM screenshots as artifact: upload the PNGs as `vm-screenshots-<type>-<variant>-<arch>` (unique per matrix leg), with `if-no-files-found: ignore` so failures with no live VM add no noise.

A failed run now leaves a workflow artifact showing the exact screen the VM was stuck on - the same information an interactive RDP session would have shown, without any interactive access to the runner.

BUILD_VAGRANT.md gains a "Debugging: watching the VirtualBox VM console" section covering both paths: the new on-failure screenshot artifact (base VirtualBox, licensing-clean) and interactive VRDP (requires the Oracle Extension Pack - with the PUEL licensing caveat, the version-matched install recipe, the VM-start timing gotcha, and the SSH-tunnel + RDP recipe).

Troubleshooting item 3 (VirtualBox "Waiting for SSH" hang) now points at the artifact and the new section, and names the GRUB boot_command timing race as the common root cause.
1 parent 8d2cfb4 commit f36c8fb

2 files changed

Lines changed: 78 additions & 1 deletion

.github/actions/shared-steps/action.yml

Lines changed: 27 additions & 0 deletions
@@ -285,6 +285,33 @@ runs:
        # PACKER_LOG=1
        sudo sh -c "/usr/bin/packer build ${{ env.PACKER_OPTS }} -only=${{ env.packer_source }} ."

    # Debug aid for hung/failed VirtualBox builds (e.g. boot_command
    # keystrokes missing the GRUB menu): capture a PNG of every VM that is
    # still running when the job fails. Works in base VirtualBox - no
    # Oracle Extension Pack required - and therefore is licensing-clean
    # for CI use, unlike VRDP.
    - name: Capture VM screen on failure
      if: failure() && inputs.type == 'vagrant_virtualbox'
      shell: bash
      run: |
        # Capture VM screen on failure
        command -v VBoxManage >/dev/null 2>&1 || { echo "[Debug] VBoxManage not present - skipping screenshots"; exit 0; }
        mkdir -p /tmp/vm-screenshots
        for vm in $(sudo VBoxManage list runningvms | grep -oE '"[^"]+"' | tr -d '"'); do
          echo "[Debug] capturing screen of '${vm}'"
          sudo VBoxManage controlvm "${vm}" screenshotpng "/tmp/vm-screenshots/${vm}.png" || true
        done
        sudo chown -R "$(id -u):$(id -g)" /tmp/vm-screenshots || true
        ls -la /tmp/vm-screenshots/ || true

    - uses: actions/upload-artifact@v7
      name: Store VM screenshots as artifact
      if: failure() && inputs.type == 'vagrant_virtualbox'
      with:
        name: vm-screenshots-${{ inputs.type }}-${{ inputs.variant }}-${{ inputs.arch }}
        path: /tmp/vm-screenshots/*.png
        if-no-files-found: ignore

    - name: Locate image file, generate checksum
      shell: bash
      run: |
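
The `for vm in $(...)` loop in the capture step word-splits its input, so a VM whose `vm_name` contained a space would be split into several bogus names; names without spaces, the usual case, are unaffected. A minimal sketch of a whitespace-safe variant, assuming `VBoxManage list runningvms` prints one `"name" {uuid}` pair per line. `parse_runningvms` and `capture_vm_screens` are hypothetical helper names, not part of the action:

```bash
#!/usr/bin/env bash
# Sketch: whitespace-safe variant of the "Capture VM screen on failure" loop.
# Assumes `VBoxManage list runningvms` prints one line per VM:  "name" {uuid}
set -euo pipefail

# Read `list runningvms` output on stdin and print one VM name per line.
# Names may contain spaces; lines that do not match the format are skipped.
parse_runningvms() {
  sed -nE 's/^"(.*)" \{[0-9a-fA-F-]+\}$/\1/p'
}

# Hypothetical helper: save a PNG of every running VM into OUTDIR.
# Usage: capture_vm_screens OUTDIR
capture_vm_screens() {
  local outdir=$1 vm
  if ! command -v VBoxManage >/dev/null 2>&1; then
    echo "[Debug] VBoxManage not present - skipping screenshots"
    return 0
  fi
  mkdir -p "$outdir"
  while IFS= read -r vm; do
    echo "[Debug] capturing screen of '${vm}'"
    # '/' is replaced so each VM name maps to a single file name;
    # stdin is detached so the command cannot swallow the remaining VM list.
    sudo VBoxManage controlvm "$vm" screenshotpng "${outdir}/${vm//\//_}.png" </dev/null || true
  done < <(sudo VBoxManage list runningvms | parse_runningvms)
}
```

Called as `capture_vm_screens /tmp/vm-screenshots`, it writes the same files as the step does for ordinary names; reading line by line with `IFS= read -r` is what keeps a spaced name intact.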

BUILD_VAGRANT.md

Lines changed: 51 additions & 1 deletion
@@ -253,6 +253,56 @@ Note: the `vagrant up` may not work on stock GH runner, so the VirtualBox *test*

`hyperv-build.yml` hard-codes `run_test: 'false'` — see [Job layout → hyperv-build.yml](#hyperv-buildyml) above.

## Debugging: watching the VirtualBox VM console

When the `virtualbox-iso` Packer build hangs (most commonly at `Waiting for SSH to become available...`), seeing the VM's screen usually identifies the cause immediately — e.g. the VM parked at the GRUB menu because the `boot_command` keystrokes were typed before the menu was up.

### Automatic: screenshot artifact on failure

On every **failed** `vagrant_virtualbox` build, the shared composite action captures a PNG of each still-running VM (`VBoxManage controlvm <vm> screenshotpng`) and uploads it as the `vm-screenshots-vagrant_virtualbox-<variant>-<arch>` workflow artifact. This needs nothing beyond base VirtualBox and works on any runner. Check the artifact first before reaching for interactive access.
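
The artifact can also be fetched from a terminal with the GitHub CLI. A sketch, assuming `gh` is installed and authenticated for this repository; the run ID, variant and arch are placeholders taken from the failed run, and `screenshot_artifact_name` / `fetch_vm_screenshots` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Sketch: download the on-failure screenshot artifact of one workflow run.
set -euo pipefail

# Artifact name as set in shared-steps: vm-screenshots-<type>-<variant>-<arch>,
# with <type> fixed to vagrant_virtualbox. Usage: screenshot_artifact_name VARIANT ARCH
screenshot_artifact_name() {
  printf 'vm-screenshots-vagrant_virtualbox-%s-%s\n' "$1" "$2"
}

# Usage: fetch_vm_screenshots RUN_ID VARIANT ARCH [DEST_DIR]
fetch_vm_screenshots() {
  local run_id=$1 variant=$2 arch=$3 dest=${4:-./vm-screenshots}
  gh run download "$run_id" --name "$(screenshot_artifact_name "$variant" "$arch")" --dir "$dest"
}
```

`fetch_vm_screenshots <run-id> <variant> <arch>` would place the PNGs under `./vm-screenshots`.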

A one-off screenshot can also be taken manually on the build host while the VM is running:

```bash
sudo VBoxManage list runningvms
sudo VBoxManage controlvm "<vm-name>" screenshotpng /tmp/screen.png
```
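
A single screenshot shows one instant. To tell whether the `boot_command` keystrokes arrived before the GRUB menu was drawn, a numbered series taken every few seconds during boot can help. A minimal sketch, assuming it runs as a user allowed to control the VMs (the CI itself uses `sudo`); `watch_vm_screen` and the `VBOXMANAGE` override are hypothetical names introduced here so the loop can be exercised without a real VM:

```bash
#!/usr/bin/env bash
# Sketch: take a numbered series of screenshots of one VM.
# VBOXMANAGE is a hypothetical override (defaults to VBoxManage).
set -euo pipefail

# Usage: watch_vm_screen VM INTERVAL_SECONDS OUTDIR COUNT
watch_vm_screen() {
  local vm=$1 interval=$2 outdir=$3 count=$4 i
  local vbox=${VBOXMANAGE:-VBoxManage}
  mkdir -p "$outdir"
  for ((i = 1; i <= count; i++)); do
    # Zero-padded index keeps the frames in capture order when listed.
    if ! "$vbox" controlvm "$vm" screenshotpng "$(printf '%s/%s-%03d.png' "$outdir" "${vm//\//_}" "$i")"; then
      echo "[Debug] screenshot of '${vm}' failed - stopping after $((i - 1)) frames"
      return 0
    fi
    sleep "$interval"
  done
}
```

For example, `watch_vm_screen "<vm-name>" 2 /tmp/boot-series 30` would take up to 30 frames two seconds apart, stopping early once `controlvm` fails (typically because the VM is no longer running).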

### Interactive: VRDP (requires the Oracle Extension Pack)

Packer starts each VM with VRDE enabled and prints the console address in the build log:

```
==> virtualbox-iso.almalinux_…: view the screen of the VM, connect via VRDP without a password to
==> virtualbox-iso.almalinux_…: rdp://127.0.0.1:5932
```

That endpoint only works if the **Oracle VirtualBox Extension Pack** is installed on the build host — base VirtualBox has no RDP server, and with the pack missing the VM starts normally but nothing ever listens on the advertised port (the only symptom is a silent `Connection refused`).

> **Licensing.** The Extension Pack is under Oracle's PUEL license: free for **personal and educational use only**. Organizational / CI use requires a commercial Oracle license — which is why the workflows do **not** install it, and why it must stay out of org-level automation. Installing it manually on your own debug runner for occasional personal use is a different situation; make your own call.

Install (version must match the VirtualBox version exactly):

```bash
VER=$(VBoxManage --version | sed 's/r.*//') # e.g. 7.1.18
PACK="Oracle_VirtualBox_Extension_Pack-${VER}.vbox-extpack"
curl -fsSLO "https://download.virtualbox.org/virtualbox/${VER}/${PACK}"
echo y | sudo VBoxManage extpack install --replace "${PACK}"
sudo VBoxManage list extpacks # must show 'Usable: true'
```
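
The `sed 's/r.*//'` step assumes Oracle's own packages, whose `--version` output is `<X.Y.Z>r<revision>`. Distribution builds (for example Ubuntu's) insert a tag such as `_Ubuntu` before the `r`, which would turn into a download URL that does not exist. A sketch that keeps only the numeric `X.Y.Z` prefix; `vbox_base_version` and `extpack_name` are hypothetical helpers, and the pre-7.1 file name prefix `Oracle_VM_VirtualBox_Extension_Pack-` is an assumption to check against the download directory before relying on it:

```bash
#!/usr/bin/env bash
# Sketch: derive the Extension Pack file name from `VBoxManage --version`.
set -euo pipefail

# Print the leading X.Y.Z of a version string, dropping "r<revision>" and
# any distribution tag (e.g. "_Ubuntu") that precedes it.
vbox_base_version() {
  printf '%s\n' "$1" | grep -oE '^[0-9]+\.[0-9]+\.[0-9]+'
}

# Print the .vbox-extpack file name for a base version.
# Assumption: 7.1+ uses "Oracle_VirtualBox_Extension_Pack-", older releases
# "Oracle_VM_VirtualBox_Extension_Pack-"; verify on download.virtualbox.org.
extpack_name() {
  local ver=$1 major minor
  IFS=. read -r major minor _ <<<"$ver"
  if (( major > 7 || (major == 7 && minor >= 1) )); then
    printf 'Oracle_VirtualBox_Extension_Pack-%s.vbox-extpack\n' "$ver"
  else
    printf 'Oracle_VM_VirtualBox_Extension_Pack-%s.vbox-extpack\n' "$ver"
  fi
}
```

With these, the first two lines of the recipe would become `VER=$(vbox_base_version "$(VBoxManage --version)")` and `PACK=$(extpack_name "$VER")`; the download and install lines stay the same.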

Notes:

- VRDE loads at **VM start** — a VM already running when the pack was installed will not start listening; only the next build picks it up.
- The VRDP server binds to the build host's loopback (`127.0.0.1:59xx`, port assigned per build). From a workstation, tunnel it over SSH and connect with any RDP client (no credentials — auth type is `null`):

  ```bash
  ssh -N -L 5932:127.0.0.1:5932 <user>@<build-host>
  # then RDP to localhost:5932
  ```

- Find the current port on the host with `sudo VBoxManage showvminfo "<vm-name>" | grep VRDE`.
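
Since the port is assigned per build, it can also be read from the Packer log line quoted above instead of from `showvminfo`. A minimal sketch, assuming the build log (or a saved copy of it) is available as text; `rdp_port_from_log` and `tunnel_cmd` are hypothetical helpers, and `<user>@<build-host>` stays a placeholder:

```bash
#!/usr/bin/env bash
# Sketch: find the VRDP port Packer advertised and print a matching SSH tunnel.
set -euo pipefail

# Read a Packer build log on stdin; print the port of the last
# rdp://127.0.0.1:<port> URL it contains (non-zero exit if there is none).
rdp_port_from_log() {
  grep -oE 'rdp://127\.0\.0\.1:[0-9]+' | tail -n 1 | sed 's/.*://'
}

# Usage: tunnel_cmd PORT USER_AT_HOST
# Prints an ssh command forwarding the same local port to the host's loopback.
tunnel_cmd() {
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$1" "$1" "$2"
}
```

For example, `tunnel_cmd "$(rdp_port_from_log < packer.log)" '<user>@<build-host>'`, where `packer.log` is a hypothetical saved log; the last match is taken in case the log mentions more than one VM start.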
## S3 upload layout
Mirrors the cloud-image workflows:
@@ -298,7 +348,7 @@ All uploaded objects are tagged `public=yes`. Downstream publishing to Vagrant C

1. **Vagrant job sits in `Queued` forever** — the label on the manual runner doesn't match `matrix_sh` exactly. Double-check the `--labels` argument to `./config.sh` and that `--no-default-labels` was passed.
2. **`vagrant up` fails with SSH timeout (libvirt/VirtualBox/VMware)** — known flake; re-run the job.
-3. The `packer build` for **VirtualBox** sits on **`Waiting for SSH to become available...`**, and then fails in an hour because of **`Timeout waiting for SSH.`**. That's a known issue. Re-run the job.
+3. The `packer build` for **VirtualBox** sits on **`Waiting for SSH to become available...`**, and then fails in an hour because of **`Timeout waiting for SSH.`**. That's a known issue. Re-run the job. To see what the VM was actually doing, check the `vm-screenshots-…` artifact of the failed run, or watch the console live — see [Debugging: watching the VirtualBox VM console](#debugging-watching-the-virtualbox-vm-console). A common root cause is the `boot_command` keystrokes landing before the GRUB menu is up on a slow host (fixed by the larger `boot_wait` default; can be raised further with `-var boot_wait=45s`).
4. **VirtualBox or VMware build fails with "VT-x is disabled"** — KVM is still loaded. The workflow tries to unload `kvm_intel` / `kvm_amd` automatically; if another process is holding the modules open the unload is a no-op. Stop any nested VMs on the host and re-run.
5. **VMware Install step fails with `No such file or directory` on the bundle tarball** — the VMware Workstation bundle isn't staged at `/actions-runner/_work/cloud-images/VMware-Workstation-Full-<ws_version>-<ws_build>.x86_64.bundle.tar`. Download it once (matching the `ws_version` / `ws_build` values at the top of the `Install VMware` step in [`.github/actions/shared-steps/action.yml`](.github/actions/shared-steps/action.yml)) and place it at that path on the runner.
6. **`hyperv-build.yml` can't find the Packer template** — double-check the `version_major` input; Hyper-V builds only the variants listed in the table above.
