Description
To properly test the live migration work we are doing, we will want to run live migrations as part of our automated testing.
I'm not yet sure what work is required here, but I think we are approaching the point where this functionality would help build confidence in several pieces of in-progress work, such as the timing data work I'm doing and the upstack control plane work @gjcolombo is landing soon. It would also support a fair amount of medium-term work, such as #324.
I suspect we will need to add support to PHD for the inter-machine migration case, and to make some changes to buildomat (I don't yet have enough context to understand the lift required there). We might also want to add support for different types of guest workloads and images, though that might be further down the line.
In such testing, I would personally like to make use of the instrumentation we've added or plan to add for observing performance, including:
- #347 (Add coarse instrumentation for measuring migration protocol times), for measuring the overall performance of the migration protocol
- #349 (want mechanism for tracking guest vCPU pause time), for tracking guest pause time
- #337 (implement live migration support for guest timing data), which includes a D script for observing adjustments made to timing data
We could use this instrumentation to track performance regressions, which will become important once we start optimizing migration times.
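As a rough illustration of what a regression check over this data might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the metric names, the 10% tolerance, the choice of comparing medians, and the sample numbers are all made up, and nothing here reflects an existing PHD or buildomat interface.

```python
from statistics import median


def find_regressions(baseline_ms, current_ms, tolerance=0.10):
    """Compare per-metric timing samples (in milliseconds) between two runs.

    Returns a dict mapping each regressed metric to a
    (baseline_median, current_median) tuple. A metric counts as regressed
    when its current median exceeds the baseline median by more than
    `tolerance` (a fraction; 0.10 means 10%).
    """
    regressions = {}
    for metric, base_samples in baseline_ms.items():
        cur_samples = current_ms.get(metric)
        # Skip metrics that have no samples on one side of the comparison.
        if not base_samples or not cur_samples:
            continue
        base = median(base_samples)
        cur = median(cur_samples)
        if cur > base * (1 + tolerance):
            regressions[metric] = (base, cur)
    return regressions


# Hypothetical metric names and made-up sample values, for illustration only.
baseline = {
    "migration_total_ms": [1200, 1250, 1180],
    "vcpu_pause_ms": [40, 42, 41],
}
current = {
    "migration_total_ms": [1210, 1260, 1190],
    "vcpu_pause_ms": [55, 58, 54],
}

regressions = find_regressions(baseline, current)
for metric, (base, cur) in regressions.items():
    print(f"{metric}: median {base} ms -> {cur} ms")
```

Comparing medians rather than single runs is one way to reduce noise from a shared CI machine; a real check would probably need more samples and a tolerance tuned per metric.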