-
Notifications
You must be signed in to change notification settings - Fork 292
CP-307922: Implement outbound SXM for SMAPIv3 #6457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
CP-307922: Implement outbound SXM for SMAPIv3 #6457
Conversation
73f4008
to
590d4a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things.
590d4a5
to
cc7cb6a
Compare
7a9c44f
to
d8d54be
Compare
let proxy_srv = Fecomms.open_unix_domain_sock_server path in | ||
try | ||
let uri = | ||
Printf.sprintf "/services/SM/nbdproxy/import/%s/%s/%s/%s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use the URI module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but it is a bit awkward with the Http
module which itself is a wrapper around the Uri module Uri
, so I think it's fine to leave the Uri
handling to the Http.Url
module
d8d54be
to
34f9785
Compare
() | ||
|> Uri.to_string | ||
in | ||
let _ : Thread.t = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means the current thread has little control over the execution of the split thread. How is it ensured that the thread at the very least times out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, do you have any suggestions... Use some shared variable for communcations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either, we expect the stat to fail at some point while doing the waiting, a timeout could be added to the wait function if there's no progress made over a long period of time, but we've known migration that take 20 hours, so adding a timeout might interrupt long migrations.
The wait loop should also have sort of sleep before calling the next iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reckon we should leave this to the Data.stat
call since it is responsible for detecting anything wrong with the migration. If things go wrong, like the thread exiting abnormally, then Data.stat
should/will return a value saying that the mirror has failed because the mirror has indeed failed, which will result in a Migration_mirror_failure
being raised
Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
These two functions are the new SMAPIv3 functions that will enable mirror and query of the mirror status. So implement them in xapi-storage-script. The SMAPIv1 counterparts remain unimplemented. Signed-off-by: Vincent Liu <[email protected]>
The similar VDI functionality is uncurrently unused for SMAPIv3 migration so just add a dummy implementation. Signed-off-by: Vincent Liu <[email protected]>
This is the main commit that implements the MIRROR interface in storage_smapiv3_migrate. The exact detail of how SMAPIv3 mirror is done is left in the SXM documentation, but core of it is to provide all the necessary infrastructure to able to call the `Data.mirror` SMAPIv3 call that will mirror a VDI to another. Signed-off-by: Vincent Liu <[email protected]>
dummy_vdi and parent_vdi are not created by storage_smapiv3_migrate.receive_start2, so do not attempt to destroy them in storage_smapiv3_migrate.receive_cancel2. Signed-off-by: Vincent Liu <[email protected]>
This is to mimic the behaviour on SMAPIv1. The update_snapshot_info function that runs at the end of migration will check for content_id, and this is needed to make it happy. Signed-off-by: Vincent Liu <[email protected]>
Whilst it is not the default behaviour on XS 8 to attach a VDI through NBD, SXM inbound into a SMAPIv1 SR needs to have nbd enabled for mirroring purposes. As tapdisk will return usable nbd parameters to xapi, they can be included in the return value of attach. Most current users of this return value will keep using blktap2 kernel device and this nbd information is only used during SXM. Signed-off-by: Vincent Liu <[email protected]>
As this nbd proxy is used for importing data, call it `import_nbd_proxy` to distinguish with the `export_nbd_proxy` that will be introduced later on. Signed-off-by: Vincent Liu <[email protected]>
This is a bit of a layering violation as storage_mux should not care about the version of SMAPI the SR is, nor should it be responsible for calling hook functions. But as there is no way for xapi-storage-script to invoke code in xapi (which would also be a layering violation if it was possible), and smapiv1_wrapper has special state tracking logic for determining whether the hook should be called. Leave the hook here for now. Note the pre_deactivate_hook is not called as currently that remains a noop for SMAPIv3. And as we do not support VM shutdown during outbound SXM for SMAPIv3 anyway, leave a hack in the storage_mux for now until we have a plan on how to support that. Signed-off-by: Vincent Liu <[email protected]>
The attach and activate of the VDI being live migrated is there so that the SXM can keep working even if the VM on which the VDI is activated shutsdown. This is possible on SMAPIv1 as tapdisk does not distinguish between different domain paramters. But that is not the case for SMAPIv3. For now just avoid activating the VDI on dom0 since the VM is already activated on the live_vm. This does mean that SXM will stop working if the VM is shut down during storage migration. We will leave that case in the future. Signed-off-by: Vincent Liu <[email protected]>
34f9785
to
3242fbc
Compare
There is a mirror_checker/tapdisk_watchdog for SMAPIv1 that periodically checks the status of the mirror and sends an update if it detects a failure. Implement something similar for SMAPIv3 mirror, although this check happens for a shorter period of time compared to the SMAPIv1 tapdisk_watchdog because the `Data.stat` call will stop working once the VM is paused, and currently we have no easy way to terminate this mirror checker just before the VM is paused (in xenopsd). So only do this check whilst the mirror syncing is in progress, i.e. when we are copying over the existing disk content. Signed-off-by: Vincent Liu <[email protected]>
Previously the tapdisk watchdog in SMAPIv1 mirroring was cancelled in the `post_deactivate_hook`, but at that point the VDI has already been deactivated, and hence the mirror would have been terminated. Additionally, the last time the stats is retrieved is in `pre_deactivate_hook`, so do this cancelling after the last stats retrival. Note that SMAPIv3 mirror does not have a watchdog due to the limitations of the mirror job auto cancel after guest pause, so instead the mirror checking is only done whilst the mirror syncing (i.e. copying existing disk content) is in progress. Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
This is a continuation of #6439 in the effort of implementing outbound SXM for SMAPIv3. We have reached the climatic point of this and can actually now implement the logic to do outbound SXM for SMAPIv3 SRs!
This is a rather large PR and I expect it to take some time to be reviewed and merged, so I am opening this early to gather some feedback. Since #6439 is not yet merged and this PR depends on that one, I am marking this one as a draft, while reviewing please ignore the first three commit in this PR and look at #6439 first instead. I will update this one again when #6439 is merged
There is also a couple of docs PR at the end documenting the design and approach taken in doing SMAPIv3 migration.
In terms of testing plan, the important thing is to make sure for now that this is not regressing the SMAPIv1 migration. For that I will be using the SXM functional tests suite. I will also be using more tests to actually test the SMAPIv3 SXM feature.