Skip to content

CP-307922: Implement outbound SXM for SMAPIv3 #6457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

Vincent-lau
Copy link
Contributor

@Vincent-lau Vincent-lau commented May 7, 2025

This is a continuation of #6439 in the effort of implementing outbound SXM for SMAPIv3. We have reached the climatic point of this and can actually now implement the logic to do outbound SXM for SMAPIv3 SRs!

This is a rather large PR and I expect it to take some time to be reviewed and merged, so I am opening this early to gather some feedback. Since #6439 is not yet merged and this PR depends on that one, I am marking this one as a draft, while reviewing please ignore the first three commit in this PR and look at #6439 first instead. I will update this one again when #6439 is merged

There is also a couple of docs PR at the end documenting the design and approach taken in doing SMAPIv3 migration.

In terms of testing plan, the important thing is to make sure for now that this is not regressing the SMAPIv1 migration. For that I will be using the SXM functional tests suite. I will also be using more tests to actually test the SMAPIv3 SXM feature.

@Vincent-lau Vincent-lau marked this pull request as draft May 7, 2025 12:35
@Vincent-lau Vincent-lau force-pushed the private/shul2/sxm-mux6 branch 3 times, most recently from 73f4008 to 590d4a5 Compare May 7, 2025 13:13
Copy link
Contributor

@GabrielBuica GabrielBuica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things.

@Vincent-lau Vincent-lau force-pushed the private/shul2/sxm-mux6 branch from 590d4a5 to cc7cb6a Compare May 7, 2025 13:17
@Vincent-lau Vincent-lau changed the title Implement outbound SXM for SMAPIv3 CP-307922: Implement outbound SXM for SMAPIv3 May 7, 2025
@Vincent-lau Vincent-lau force-pushed the private/shul2/sxm-mux6 branch 3 times, most recently from 7a9c44f to d8d54be Compare May 9, 2025 13:34
let proxy_srv = Fecomms.open_unix_domain_sock_server path in
try
let uri =
Printf.sprintf "/services/SM/nbdproxy/import/%s/%s/%s/%s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use the URI module?

Copy link
Contributor Author

@Vincent-lau Vincent-lau May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but it is a bit awkward with the Http module which itself is a wrapper around the Uri module Uri, so I think it's fine to leave the Uri handling to the Http.Url module

@Vincent-lau Vincent-lau force-pushed the private/shul2/sxm-mux6 branch from d8d54be to 34f9785 Compare May 12, 2025 11:31
@Vincent-lau Vincent-lau marked this pull request as ready for review May 12, 2025 11:31
()
|> Uri.to_string
in
let _ : Thread.t =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the current thread has little control over the execution of the split thread. How is it ensured that the thread at the very least times out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, do you have any suggestions... Use some shared variable for communcations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either, we expect the stat to fail at some point while doing the waiting, a timeout could be added to the wait function if there's no progress made over a long period of time, but we've known migration that take 20 hours, so adding a timeout might interrupt long migrations.

The wait loop should also have sort of sleep before calling the next iteration

Copy link
Contributor Author

@Vincent-lau Vincent-lau May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon we should leave this to the Data.stat call since it is responsible for detecting anything wrong with the migration. If things go wrong, like the thread exiting abnormally, then Data.stat should/will return a value saying that the mirror has failed because the mirror has indeed failed, which will result in a Migration_mirror_failure being raised

Signed-off-by: Vincent Liu <[email protected]>
These two functions are the new SMAPIv3 functions that will enable
mirror and query of the mirror status. So implement them in
xapi-storage-script. The SMAPIv1 counterparts remain unimplemented.

Signed-off-by: Vincent Liu <[email protected]>
The similar VDI functionality is uncurrently unused for SMAPIv3
migration so just add a dummy implementation.

Signed-off-by: Vincent Liu <[email protected]>
This is the main commit that implements the MIRROR interface in
storage_smapiv3_migrate. The exact detail of how SMAPIv3 mirror is done
is left in the SXM documentation, but core of it is to provide all the
necessary infrastructure to able to call the `Data.mirror` SMAPIv3 call
that will mirror a VDI to another.

Signed-off-by: Vincent Liu <[email protected]>
dummy_vdi and parent_vdi are not created by
storage_smapiv3_migrate.receive_start2, so do not attempt to destroy
them in storage_smapiv3_migrate.receive_cancel2.

Signed-off-by: Vincent Liu <[email protected]>
This is to mimic the behaviour on SMAPIv1. The update_snapshot_info
function that runs at the end of migration will check for content_id,
and this is needed to make it happy.

Signed-off-by: Vincent Liu <[email protected]>
Whilst it is not the default behaviour on XS 8 to attach a VDI through
NBD, SXM inbound into a SMAPIv1 SR needs to have nbd enabled for
mirroring purposes. As tapdisk will return usable nbd parameters to
xapi, they can be included in the return value of attach. Most current
users of this return value will keep using blktap2 kernel device and
this nbd information is only used during SXM.

Signed-off-by: Vincent Liu <[email protected]>
As this nbd proxy is used for importing data, call it `import_nbd_proxy`
to distinguish with the `export_nbd_proxy` that will be introduced later
on.

Signed-off-by: Vincent Liu <[email protected]>
This is a bit of a layering violation as storage_mux should not care
about the version of SMAPI the SR is, nor should it be responsible for
calling hook functions. But as there is no way for xapi-storage-script
to invoke code in xapi (which would also be a layering violation if it
was possible), and smapiv1_wrapper has special state tracking logic for
determining whether the hook should be called. Leave the hook here for
now.

Note the pre_deactivate_hook is not called as currently that remains a
noop for SMAPIv3. And as we do not support VM shutdown during outbound
SXM for SMAPIv3 anyway, leave a hack in the storage_mux for now until we
have a plan on how to support that.

Signed-off-by: Vincent Liu <[email protected]>
The attach and activate of the VDI being live migrated is there so that
the SXM can keep working even if the VM on which the VDI is activated
shutsdown. This is possible on SMAPIv1 as tapdisk does not distinguish
between different domain paramters. But that is not the case for
SMAPIv3.

For now just avoid activating the VDI on dom0 since the VM is already
activated on the live_vm. This does mean that SXM will stop working if
the VM is shut down during storage migration. We will leave that case in
the future.

Signed-off-by: Vincent Liu <[email protected]>
@Vincent-lau Vincent-lau force-pushed the private/shul2/sxm-mux6 branch from 34f9785 to 3242fbc Compare May 13, 2025 09:03
There is a mirror_checker/tapdisk_watchdog for SMAPIv1 that periodically
checks the status of the mirror and sends an update if it detects a
failure.

Implement something similar for SMAPIv3 mirror, although this
check happens for a shorter period of time compared to the SMAPIv1
tapdisk_watchdog because the `Data.stat` call will stop working once the
VM is paused, and currently we have no easy way to terminate this mirror
checker just before the VM is paused (in xenopsd). So only do this check
whilst the mirror syncing is in progress, i.e. when we are copying over
the existing disk content.

Signed-off-by: Vincent Liu <[email protected]>
Previously the tapdisk watchdog in SMAPIv1 mirroring was cancelled in
the `post_deactivate_hook`, but at that point the VDI has already been
deactivated, and hence the mirror would have been terminated.
Additionally, the last time the stats is retrieved is in
`pre_deactivate_hook`, so do this cancelling after the last stats
retrival.

Note that SMAPIv3 mirror does not have a watchdog due to the limitations
of the mirror job auto cancel after guest pause, so instead the mirror
checking is only done whilst the mirror syncing (i.e. copying existing
disk content) is in progress.

Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants