
CI flake: migrating out of a VM doesn't always ensure its VMM resources are cleaned up #648

Open
@gjcolombo

Description

Example victim run (there have been others): https://github.com/oxidecomputer/propolis/pull/646/checks?check_run_id=21668964010

All the tests in this run passed, but the VMM handle for the migration_smoke test's source VM was leaked, which caused the overall run to fail.
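Here the run failed on a leftover-VMM check rather than on a test failure. A minimal sketch of that kind of check is below, assuming the runner records the names of the VMMs it creates and compares them against the VMMs still present once all tests finish. The functions `leaked_vmms` and `run_verdict` and the example VMM names are hypothetical and are not PHD's actual implementation.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns the VMM names that the runner created and
/// that still exist after all tests have finished (sorted, deduplicated).
pub fn leaked_vmms(created: &BTreeSet<String>, still_present: &[String]) -> Vec<String> {
    still_present
        .iter()
        // Ignore VMMs this runner didn't create.
        .filter(|name| created.contains(*name))
        .cloned()
        // Collecting through a BTreeSet deduplicates and sorts the names.
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect()
}

/// Hypothetical verdict: a leaked VMM fails the run even if every test passed.
pub fn run_verdict(all_tests_passed: bool, leaked: &[String]) -> Result<(), String> {
    if !all_tests_passed {
        return Err("one or more tests failed".to_string());
    }
    if !leaked.is_empty() {
        return Err(format!(
            "all tests passed, but VMM handles were leaked: {}",
            leaked.join(", ")
        ));
    }
    Ok(())
}
```

With the source VM's VMM still present after the run, `run_verdict(true, &leaked)` returns an error, which matches what this run reported.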

The test logs show that Propolis returned a 404 when PHD tried to stop this VM. This is expected from a migration source, since the VM controller is torn down once a migration succeeds:

phd-runner: [VM CLEANUP - EVENT] error stopping VM to move it to Destroyed
    error = Error Response: status: 404 Not Found; headers: {"content-type": "application/json", "x-request-id": "9ed33d45-46a7-4994-9d91-254dea28b142", "content-length": "84", "date": "Fri, 16 Feb 2024 20:24:02 GMT"}; value: Error { error_code: None, message: "Not Found", request_id: "9ed33d45-46a7-4994-9d91-254dea28b142" }
    file = phd-tests/framework/src/test_vm/mod.rs
    line = 917
    path = phd_tests::migrate::smoke_test
    target = phd_framework::test_vm
    vm = migration_smoke
    vm_id = c51ddb74-4366-4c94-8aa4-740ec0bc72b3

This is not a production-impacting problem: in a real control plane, Propolis runs in a zone, and a successful migration out of a Propolis instance will (or should) prompt the control plane to destroy that zone, which cleans up all remaining VMM resources. Still, this is an annoying flake and we should fix it.
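One possible shape for a fix on the PHD side is to treat a 404 from the stop request as "the VM controller is already gone" rather than as evidence that the VMM is gone, and to always destroy the kernel VMM afterwards. The sketch below assumes a hypothetical `VmHandle` trait with `stop_via_api` and `destroy_kernel_vmm` methods; neither is a real PHD or Propolis API, and `destroy_kernel_vmm` is assumed to be idempotent (it succeeds if the VMM no longer exists).

```rust
/// Hypothetical result of asking a Propolis server to stop its VM.
#[derive(Debug, PartialEq)]
pub enum StopError {
    /// The server returned 404, e.g. because it was a successful migration
    /// source and its VM controller has already been torn down.
    NotFound,
    /// Any other failure.
    Other(String),
}

/// Hypothetical cleanup interface; not the real PHD framework API.
pub trait VmHandle {
    /// Ask the Propolis server to stop the VM.
    fn stop_via_api(&mut self) -> Result<(), StopError>;
    /// Destroy the kernel VMM directly. Assumed idempotent: returns Ok(())
    /// if the VMM is already gone.
    fn destroy_kernel_vmm(&mut self) -> Result<(), String>;
}

/// Stop the VM if the server still knows about it, then make sure the
/// kernel VMM is destroyed no matter what the stop request returned.
pub fn cleanup<V: VmHandle>(vm: &mut V) -> Result<(), String> {
    match vm.stop_via_api() {
        Ok(()) => {}
        // Expected for a migration source: the controller is gone, but the
        // VMM may still exist, so fall through to the explicit destroy.
        Err(StopError::NotFound) => {}
        // Log other failures, but still try to reclaim the VMM below.
        Err(StopError::Other(e)) => eprintln!("stop failed: {e}; destroying VMM anyway"),
    }
    vm.destroy_kernel_vmm()
}
```

The key design choice is that the explicit destroy runs on every path, so a 404 from the API can no longer be mistaken for successful cleanup.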

Metadata

Assignees

No one assigned

Labels

bug: Something that isn't working.
testing: Related to testing and/or the PHD test framework.
