Description
Example victim run (there have been others): https://github.com/oxidecomputer/propolis/pull/646/checks?check_run_id=21668964010
All the tests in this run passed, but the VMM handle for the migration_smoke test's source VM was leaked, causing the overall run to fail.
The test logs show that Propolis returned a 404 when the runner tried to stop this VM. This is expected from a migration source, since the VM controller gets torn down once a migration succeeds:
phd-runner: [VM CLEANUP - EVENT] error stopping VM to move it to Destroyed
error = Error Response: status: 404 Not Found; headers: {"content-type": "application/json", "x-request-id": "9ed33d45-46a7-4994-9d91-254dea28b142", "content-length": "84", "date": "Fri, 16 Feb 2024 20:24:02 GMT"}; value: Error { error_code: None, message: "Not Found", request_id: "9ed33d45-46a7-4994-9d91-254dea28b142" }
file = phd-tests/framework/src/test_vm/mod.rs
line = 917
path = phd_tests::migrate::smoke_test
target = phd_framework::test_vm
vm = migration_smoke
vm_id = c51ddb74-4366-4c94-8aa4-740ec0bc72b3
This is not a production-impacting problem because in a real control plane, Propolis runs in a zone, and migrating out of a Propolis will (or should) direct the control plane to destroy the zone, which will clean up all remaining VMM resources. Still, this is an annoying flake and we should fix it.
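One possible shape for a fix, as a rough sketch only: the cleanup path could treat a 404 from a VM that has already migrated out as "controller already gone" and go straight to reclaiming the VMM handle, rather than logging an error and stopping there. Every name below (`StopOutcome`, `CleanupAction`, `plan_cleanup`, the `migrated_out` flag) is hypothetical and not taken from the PHD framework or the Propolis API. How the handle would actually be released is deliberately left out.

```rust
/// Hypothetical result of asking Propolis to stop a VM during cleanup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopOutcome {
    /// The stop request was accepted.
    Stopped,
    /// The server answered 404 Not Found.
    NotFound,
    /// Any other failure, carrying the HTTP status code.
    Failed(u16),
}

/// Hypothetical next step for the test framework's cleanup code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CleanupAction {
    /// Wait for the VM to reach Destroyed as usual.
    WaitForDestroyed,
    /// The controller is already gone; release the VMM handle directly.
    ReclaimVmmHandle,
    /// Surface the failure, carrying the HTTP status code.
    ReportError(u16),
}

/// Decide what cleanup should do next.
///
/// `migrated_out` is an assumed flag meaning "this VM was the source of a
/// migration that succeeded". A 404 is only treated as benign in that case.
pub fn plan_cleanup(outcome: StopOutcome, migrated_out: bool) -> CleanupAction {
    match (outcome, migrated_out) {
        (StopOutcome::Stopped, _) => CleanupAction::WaitForDestroyed,
        (StopOutcome::NotFound, true) => CleanupAction::ReclaimVmmHandle,
        (StopOutcome::NotFound, false) => CleanupAction::ReportError(404),
        (StopOutcome::Failed(code), _) => CleanupAction::ReportError(code),
    }
}
```

Under these assumptions, the case in the log above corresponds to `plan_cleanup(StopOutcome::NotFound, true)`, which selects `ReclaimVmmHandle` instead of reporting an error. A 404 from a VM that never migrated is still reported as a failure.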