feat(ci): capture VM screenshots on failed vagrant_virtualbox builds
When the virtualbox-iso packer build fails or hangs (e.g. boot_command
keystrokes missing the GRUB menu on a slow host), the only way to see
what the VM was doing was to attach to its console interactively -
which needs VRDP and therefore the Oracle Extension Pack, whose PUEL
license does not cover organizational CI use.
Add two steps to shared-steps after the packer build, both gated on
`failure() && inputs.type == 'vagrant_virtualbox'`:
- Capture VM screen on failure: iterate `VBoxManage list runningvms`
and save a `controlvm <vm> screenshotpng` PNG per running VM to
/tmp/vm-screenshots/. `screenshotpng` is base VirtualBox - no
Extension Pack required - and so is licensing-clean for CI.
Skips quietly if VBoxManage is absent.
- Store VM screenshots as artifact: upload the PNGs as
`vm-screenshots-<type>-<variant>-<arch>` (unique per matrix leg),
with if-no-files-found: ignore so failures with no live VM add no
noise.
A failed run now leaves a workflow artifact showing the exact screen
the VM was stuck on - the same information an interactive RDP session
would have shown, without any interactive access to the runner.
BUILD_VAGRANT.md gains a "Debugging: watching the VirtualBox VM
console" section covering both paths: the new on-failure screenshot
artifact (base VirtualBox, licensing-clean) and interactive VRDP
(requires the Oracle Extension Pack - with the PUEL licensing caveat,
the version-matched install recipe, the VM-start timing gotcha, and
the SSH-tunnel + RDP recipe). Troubleshooting item 3 (VirtualBox
"Waiting for SSH" hang) now points at the artifact and the new
section, and names the GRUB boot_command timing race as the common
root cause.
When the `virtualbox-iso` Packer build hangs (most commonly at `Waiting for SSH to become available...`), seeing the VM's screen usually identifies the cause immediately — e.g. the VM parked at the GRUB menu because the `boot_command` keystrokes were typed before the menu was up.
### Automatic: screenshot artifact on failure
On every **failed** `vagrant_virtualbox` build, the shared composite action captures a PNG of each still-running VM (`VBoxManage controlvm <vm> screenshotpng`) and uploads it as the `vm-screenshots-vagrant_virtualbox-<variant>-<arch>` workflow artifact. This needs nothing beyond base VirtualBox and works on any runner. Check the artifact before reaching for interactive access.
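The capture step can be approximated by the following sketch. It is not the action's actual code: the function name `capture_vm_screens` is hypothetical, and the parsing assumes each line of `VBoxManage list runningvms` has the form `"vm-name" {uuid}`.

```bash
#!/usr/bin/env bash
# Sketch only: capture_vm_screens is an illustrative name, not the action's step.
capture_vm_screens() {
  local out_dir="${1:-/tmp/vm-screenshots}"

  # Skip quietly when VirtualBox is not installed on this runner.
  command -v VBoxManage >/dev/null 2>&1 || return 0

  mkdir -p "$out_dir"

  # Assumed line format of `list runningvms`: "vm-name" {uuid}
  VBoxManage list runningvms | while IFS= read -r line; do
    vm="${line#\"}"     # drop the leading quote
    vm="${vm%%\"*}"     # drop everything from the closing quote onwards
    [ -n "$vm" ] || continue
    # A failed screenshot must not mask the original build failure.
    VBoxManage controlvm "$vm" screenshotpng "$out_dir/$vm.png" </dev/null || true
  done
}

capture_vm_screens /tmp/vm-screenshots
```

Running it on a host with no VirtualBox installed does nothing and exits successfully, which mirrors the "skips quietly" behaviour described in the commit message.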
A one-off screenshot can also be taken manually on the build host while the VM is running:
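A minimal sketch of such a manual capture, assuming the VM name is already known (for example from `VBoxManage list runningvms`). The helper name `vm_screenshot` and the default output path are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch only: vm_screenshot and the default path are illustrative names.
vm_screenshot() {
  local vm="$1" out="${2:-/tmp/vm-screen.png}"

  if ! command -v VBoxManage >/dev/null 2>&1; then
    echo "VBoxManage not found" >&2
    return 127
  fi

  # screenshotpng is part of base VirtualBox; no Extension Pack is needed.
  VBoxManage controlvm "$vm" screenshotpng "$out" && echo "saved $out"
}
```

Usage: `vm_screenshot "<vm-name>" /tmp/stuck.png`, then copy the PNG off the build host to view it.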
That endpoint only works if the **Oracle VirtualBox Extension Pack** is installed on the build host — base VirtualBox has no RDP server, and with the pack missing the VM starts normally but nothing ever listens on the advertised port (the only symptom is a silent `Connection refused`).
> **Licensing.** The Extension Pack is under Oracle's PUEL license: free for **personal and educational use only**. Organizational / CI use requires a commercial Oracle license, which is why the workflows do **not** install it, and why it must stay out of org-level automation. Installing it manually on your own debug runner for occasional personal use is a different situation; make your own call.
Install (version must match the VirtualBox version exactly):
```bash
VER=$(VBoxManage --version | sed 's/r.*//') # e.g. 7.1.18
# PACK must be set to the path of the version-matched .vbox-extpack file
echo y | sudo VBoxManage extpack install --replace "${PACK}"
sudo VBoxManage list extpacks # must show 'Usable: true'
```
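The version match and the `Usable: true` check can be scripted instead of read by eye. This is a sketch under two assumptions: the helper names are hypothetical, and `VBoxManage list extpacks` prints `Version:` and `Usable:` fields at the start of a line.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative; the output format is assumed.

# VirtualBox version without the revision suffix (7.1.18r<rev> -> 7.1.18).
vbox_version() {
  VBoxManage --version | sed 's/r.*//'
}

# Version of the first installed extension pack (assumed "Version: x.y.z" line).
extpack_version() {
  VBoxManage list extpacks | awk '/^Version:/ { print $2; exit }'
}

# Succeeds only if a pack reports itself as usable (assumed "Usable: true" line).
extpack_usable() {
  VBoxManage list extpacks | grep -Eq '^Usable:[[:space:]]+true'
}

check_extpack() {
  local vbox ext
  vbox="$(vbox_version)"
  ext="$(extpack_version)"
  if [ -n "$ext" ] && [ "$vbox" = "$ext" ] && extpack_usable; then
    echo "ok: extension pack $ext matches VirtualBox $vbox"
  else
    echo "mismatch or unusable: VirtualBox=$vbox extpack=${ext:-none}" >&2
    return 1
  fi
}

if command -v VBoxManage >/dev/null 2>&1; then
  check_extpack
fi
```

If the listing needs root on your host, as the install recipe's use of `sudo` suggests, run the script itself under `sudo`.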
Notes:
- VRDE loads at **VM start** — a VM already running when the pack was installed will not start listening; only the next build picks it up.
- The VRDP server binds to the build host's loopback (`127.0.0.1:59xx`, port assigned per build). From a workstation, tunnel it over SSH and connect with any RDP client (no credentials — auth type is `null`):
```bash
ssh -N -L 5932:127.0.0.1:5932 <user>@<build-host>
# then RDP to localhost:5932
```
- Find the current port on the host with `sudo VBoxManage showvminfo "<vm-name>" | grep VRDE`.
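Because a missing Extension Pack shows up only as a refused connection, it can help to confirm on the build host that something is listening before opening the tunnel. The sketch below assumes bash (it relies on bash's `/dev/tcp` redirection); the helper names are hypothetical and port `5932` is just the example value used above.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative; requires bash for /dev/tcp.

# Succeeds if a TCP connection to host:port can be opened.
port_listening() {
  local port="$1" host="${2:-127.0.0.1}"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Prints the SSH tunnel command for a given port and user@host target.
tunnel_command() {
  local port="$1" target="$2"
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$port" "$port" "$target"
}

PORT="${PORT:-5932}"
if port_listening "$PORT"; then
  echo "VRDP is listening; from your workstation run:"
  tunnel_command "$PORT" "<user>@<build-host>"
else
  echo "nothing is listening on 127.0.0.1:${PORT}"
fi
```

A "nothing is listening" result while the VM is running points back at the Extension Pack or the VM-start timing note above rather than at the tunnel.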
## S3 upload layout
Mirrors the cloud-image workflows:
@@ -298,7 +348,7 @@ All uploaded objects are tagged `public=yes`. Downstream publishing to Vagrant C
1. **Vagrant job sits in `Queued` forever**: the label on the manual runner doesn't match `matrix_sh` exactly. Double-check the `--labels` argument to `./config.sh` and that `--no-default-labels` was passed.
2. **`vagrant up` fails with SSH timeout (libvirt/VirtualBox/VMware)** — known flake; re-run the job.
3. The `packer build` for **VirtualBox** sits on **`Waiting for SSH to become available...`**, and then fails in an hour because of **`Timeout waiting for SSH.`**. That's a known issue. Re-run the job. To see what the VM was actually doing, check the `vm-screenshots-…` artifact of the failed run, or watch the console live — see [Debugging: watching the VirtualBox VM console](#debugging-watching-the-virtualbox-vm-console). A common root cause is the `boot_command` keystrokes landing before the GRUB menu is up on a slow host (fixed by the larger `boot_wait` default; can be raised further with `-var boot_wait=45s`).
4. **VirtualBox or VMware build fails with "VT-x is disabled"**: KVM is still loaded. The workflow tries to unload `kvm_intel` / `kvm_amd` automatically; if another process is holding the modules open, the unload is a no-op. Stop any nested VMs on the host and re-run.
5. **VMware Install step fails with `No such file or directory` on the bundle tarball** — the VMware Workstation bundle isn't staged at `/actions-runner/_work/cloud-images/VMware-Workstation-Full-<ws_version>-<ws_build>.x86_64.bundle.tar`. Download it once (matching the `ws_version` / `ws_build` values at the top of the `Install VMware` step in [`.github/actions/shared-steps/action.yml`](.github/actions/shared-steps/action.yml)) and place it at that path on the runner.
6. **`hyperv-build.yml` can't find the Packer template** — double-check the `version_major` input; Hyper-V builds only the variants listed in the table above.
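For item 4, a quick way to see whether the KVM modules are still loaded is to read `/proc/modules` on the runner. This is a sketch with a hypothetical helper name. Treating a non-zero use count as "held open" is an assumption, since a dependent module also raises the count.

```bash
#!/usr/bin/env bash
# Sketch only: kvm_module_usage is an illustrative name.

# Prints "<module> <use-count>" for each loaded KVM module.
# /proc/modules columns: name size use-count dependents state address
kvm_module_usage() {
  local modules_file="${1:-/proc/modules}"
  awk '$1 == "kvm" || $1 == "kvm_intel" || $1 == "kvm_amd" { print $1, $3 }' "$modules_file"
}

if [ -r /proc/modules ]; then
  kvm_module_usage
fi
```

Empty output means no KVM module is loaded. A line such as `kvm_intel 2` suggests something is still using it, so stopping nested VMs before re-running is the next step.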