Crucible backend needs a mechanism to be reconfigured before restarting after failed LM #230

Open
@gjcolombo

Hypothesized repro steps:

  1. Launch a VM that connects to a set of Crucible downstairs with Crucible generation 1.
  2. Start a migration target that will connect to the same downstairs with Crucible generation 2.
  3. Inject a failure into migration immediately after the target activates. Note that this can happen even after #155 ("Migration: creating a migration-target Crucible upstairs prevents (or may prevent) the source from pausing") is fixed, as long as there is any way for the target to fail to start after Crucible activates.

Expected: The source will wait to be moved back to the 'Running' state. Before that happens, the control plane will set the source's Crucible generation to 3 and direct it to reactivate.
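To illustrate why the source needs generation 3 rather than its original generation 1, here is a minimal sketch of the generation bookkeeping. It assumes that a set of downstairs only accepts activation from a generation strictly greater than the highest one it has already seen; that is an assumption drawn from the expected behavior described above, not a statement of Crucible's actual implementation. `Downstairs`, `activate`, and `ActivateError` are hypothetical names, not Crucible APIs.

```rust
/// Hypothetical model of a set of downstairs; not Crucible's real API.
/// It only remembers the highest generation that has activated so far.
#[derive(Debug, Default)]
struct Downstairs {
    highest_generation: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum ActivateError {
    StaleGeneration { requested: u64, highest_seen: u64 },
}

impl Downstairs {
    /// Assumption: activation succeeds only with a generation strictly
    /// greater than any generation seen so far.
    fn activate(&mut self, generation: u64) -> Result<(), ActivateError> {
        if generation <= self.highest_generation {
            return Err(ActivateError::StaleGeneration {
                requested: generation,
                highest_seen: self.highest_generation,
            });
        }
        self.highest_generation = generation;
        Ok(())
    }
}

fn main() {
    let mut downstairs = Downstairs::default();

    // Step 1: the source VM activates with generation 1.
    assert!(downstairs.activate(1).is_ok());
    // Step 2: the migration target activates with generation 2.
    assert!(downstairs.activate(2).is_ok());

    // Step 3: migration fails. Under the assumption above, the source
    // cannot simply reactivate with its old generation.
    assert!(downstairs.activate(1).is_err());

    // Once the control plane hands the source generation 3, it can.
    assert!(downstairs.activate(3).is_ok());
}
```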

Current state: There are two problems to solve here:

  • Nexus has no way to update the Crucible generation number of an ensured instance.
  • The migration state machine automatically resumes the source after migration fails; it should wait until it's told to do so (and should allow configuration changes to be interposed between migration failure and source restart).
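One possible shape for a fix, sketched as a toy state machine: after a failed migration the source parks in an explicit paused state, accepts a new Crucible generation while parked, and only returns to running when told to. Every name here (`SourceState`, `Source`, `set_crucible_generation`, `resume`) is hypothetical and chosen for illustration; none of it reflects Propolis's actual state machine or the Nexus API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceState {
    Running,
    Migrating,
    /// Migration failed; the source stays paused until explicitly resumed.
    MigrationFailed,
}

#[derive(Debug, PartialEq, Eq)]
enum TransitionError {
    InvalidState(SourceState),
}

#[derive(Debug)]
struct Source {
    state: SourceState,
    crucible_generation: u64,
}

impl Source {
    fn new(crucible_generation: u64) -> Self {
        Source { state: SourceState::Running, crucible_generation }
    }

    /// Moves from `from` to `to`, rejecting the request in any other state.
    fn transition(&mut self, from: SourceState, to: SourceState) -> Result<(), TransitionError> {
        if self.state != from {
            return Err(TransitionError::InvalidState(self.state));
        }
        self.state = to;
        Ok(())
    }

    fn start_migration(&mut self) -> Result<(), TransitionError> {
        self.transition(SourceState::Running, SourceState::Migrating)
    }

    /// On failure the source does NOT resume automatically.
    fn migration_failed(&mut self) -> Result<(), TransitionError> {
        self.transition(SourceState::Migrating, SourceState::MigrationFailed)
    }

    /// Assumption: reconfiguration is only accepted between migration
    /// failure and the explicit resume.
    fn set_crucible_generation(&mut self, generation: u64) -> Result<(), TransitionError> {
        if self.state != SourceState::MigrationFailed {
            return Err(TransitionError::InvalidState(self.state));
        }
        self.crucible_generation = generation;
        Ok(())
    }

    /// Explicit request (for example, from the control plane) to run again.
    fn resume(&mut self) -> Result<(), TransitionError> {
        self.transition(SourceState::MigrationFailed, SourceState::Running)
    }
}

fn main() {
    let mut source = Source::new(1);
    source.start_migration().unwrap();
    source.migration_failed().unwrap();

    // The source is parked, not running, so configuration can change.
    assert_eq!(source.state, SourceState::MigrationFailed);
    source.set_crucible_generation(3).unwrap();

    source.resume().unwrap();
    assert_eq!(source.state, SourceState::Running);
    assert_eq!(source.crucible_generation, 3);
}
```

The point of the intermediate `MigrationFailed` state is that it gives the control plane a window in which to interpose changes; whether the real fix uses a dedicated state or some other mechanism is left open.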

Labels

migration: Issues related to live migration.
storage: Related to storage devices/backends.