XCP-NG / XenServer live migration of unattached volumes #6833

Open
wants to merge 1 commit into
base: main

dpassante
Contributor

Live migrate unattached volumes without going through secondary storage.

Description

This PR includes:

  • A XenAPI Plugin for live migrating unattached disks across storage pools
  • An optional change on how detached volumes are migrated across xcp-ng/xenserver storage pools

When a volume is attached to a virtual machine on a remote storage pool, or when volumes are moved from one storage pool to another, the VDI is first copied to secondary storage and then from secondary storage to the destination storage pool. This can take a very long time for large volumes.

With this PR, volumes can be live migrated (Storage XenMotion) by attaching them to a temporary transport VM.
A small transport VM (without an OS) is created and the volume is attached to it. The volume can then be moved by live migrating the transport VM to the destination cluster.

A new global setting, xen.live.migrate.unattached.volumes, has been added to indicate whether to activate the plugin.

The module itself can be used standalone as below:

xe host-call-plugin host-uuid=931581a4-a73f-4842-a29c-ddd3d33344c3 \
    plugin=migrate-unattached-disk fn=migrate_vdi \
    args:local_vdi_uuid=0cfe7d80-1486-419d-8142-51e0e57eba9a \
    args:remote_host=host01.mydomain.net \
    args:remote_username=root \
    args:remote_password=s3cur3d \
    args:remote_sr_uuid=1c5fb764-6e70-e6f1-0be4-024a352e3a57 \
    args:network_uuid=0dd16d73-df9d-0452-15b3-80be6f150042 \
    args:dest_host_uuid=693ca0ee-517b-4710-9a7e-ce70df3a11d7 \
    args:protocol='https'
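
For scripted use, the same call could be assembled programmatically. The following is a minimal sketch and not part of this PR: it only builds the argument list from the parameter names shown in the command above. The helper name `build_migrate_vdi_argv` is hypothetical, and the sketch assumes the `xe` CLI is available wherever the command is eventually run.

```python
import shlex


def build_migrate_vdi_argv(host_uuid, plugin_args):
    """Build the argv list for the xe host-call-plugin invocation.

    plugin_args maps plugin argument names (for example local_vdi_uuid)
    to their values; each one is passed as args:<name>=<value>.
    """
    argv = [
        "xe",
        "host-call-plugin",
        "host-uuid=" + host_uuid,
        "plugin=migrate-unattached-disk",
        "fn=migrate_vdi",
    ]
    # Sort the names so the generated command is deterministic.
    for name in sorted(plugin_args):
        argv.append("args:{}={}".format(name, plugin_args[name]))
    return argv


if __name__ == "__main__":
    argv = build_migrate_vdi_argv(
        "931581a4-a73f-4842-a29c-ddd3d33344c3",
        {
            "local_vdi_uuid": "0cfe7d80-1486-419d-8142-51e0e57eba9a",
            "remote_host": "host01.mydomain.net",
            "remote_username": "root",
            "remote_password": "s3cur3d",
            "remote_sr_uuid": "1c5fb764-6e70-e6f1-0be4-024a352e3a57",
            "network_uuid": "0dd16d73-df9d-0452-15b3-80be6f150042",
            "dest_host_uuid": "693ca0ee-517b-4710-9a7e-ce70df3a11d7",
            "protocol": "https",
        },
    )
    # Print a shell-safe rendering instead of executing anything.
    print(" ".join(shlex.quote(part) for part in argv))
```

Passing the list to `subprocess.run(argv, check=True)` rather than a single string avoids shell quoting problems. Keep in mind that a password given on the command line is generally visible in the process list of the machine running the command.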

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

The feature has been running in production for several months on CloudStack 4.13.1/4.16.1 + XenServer 7.1.
The module originally relied on the xenserver-transfer-vm package, which was removed from XCP-ng 8.2.1, so it has just been rewritten to work on XCP-ng/XenServer 8.2.1. It was manually tested on CloudStack 4.16.1 / XCP-ng 8.2.1.
I currently don't have an environment available to test with the main branch.

cc @ArthurHlt

Live migrate unattached volumes without going through secondary storage.

Co-authored-by: Arthur Halet <[email protected]>
@codecov

codecov bot commented Oct 18, 2022

Codecov Report

Merging #6833 (be0f145) into 4.18 (2ca164a) will decrease coverage by 0.02%.
Report is 434 commits behind head on 4.18.
The diff coverage is 3.27%.

@@             Coverage Diff              @@
##               4.18    #6833      +/-   ##
============================================
- Coverage     10.81%   10.80%   -0.02%     
- Complexity     7083     7084       +1     
============================================
  Files          2485     2487       +2     
  Lines        245346   245475     +129     
  Branches      38313    38325      +12     
============================================
- Hits          26525    26513      -12     
- Misses       215556   215699     +143     
+ Partials       3265     3263       -2     
| Files Changed | Coverage Δ |
|---|---|
| ...ver610MigrateWithStorageReceiveCommandWrapper.java | 78.12% <0.00%> (-5.21%) ⬇️ |
| ...storage/motion/XenServerStorageMotionStrategy.java | 0.00% <0.00%> (ø) |
| ...itrixPrepareForMigrationStorageCommandWrapper.java | 4.44% <4.44%> (ø) |
| .../CitrixCleanForMigrationStorageCommandWrapper.java | 15.38% <15.38%> (ø) |
| ...Server610MigrateWithStorageSendCommandWrapper.java | 56.25% <50.00%> (-1.82%) ⬇️ |

... and 4 files with indirect coverage changes

@sonarqubecloud

SonarCloud Quality Gate failed.

  • 1 Bug (rating D)
  • 0 Vulnerabilities (rating A)
  • 0 Security Hotspots (rating A)
  • 17 Code Smells (rating A)

  • 3.3% Coverage
  • 0.0% Duplication


package com.cloud.agent.api;

public class CleanForMigrationStorageCommandAnswer extends Answer {
Contributor

Suggested change
- public class CleanForMigrationStorageCommandAnswer extends Answer {
+ public class CleanForMigrationStorageAnswer extends Answer {

@DaanHoogland
Contributor

@dpassante would you see an opportunity to add a marvin/integration test?
@blueorangutan package

@blueorangutan

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖️ el7 ✖️ el8 ✖️ debian ✖️ suse15. SL-JID 4499

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 4505

@DaanHoogland
Contributor

@blueorangutan test centos7 xenserver74

@blueorangutan

@DaanHoogland unsupported parameters provided. Supported mgmt server os are: centos7, centos6, suse15, alma8, ubuntu18, ubuntu22, ubuntu20, rocky8. Supported hypervisors are: kvm-centos6, kvm-centos7, kvm-rocky8, kvm-alma8, kvm-ubuntu18, kvm-ubuntu20, kvm-ubuntu22, kvm-suse15, vmware-55u3, vmware-60u2, vmware-65u2, vmware-67u3, vmware-70u1, vmware-70u2, vmware-70u3, xenserver-65sp1, xenserver-71, xenserver-74, xcpng74, xcpng76, xcpng80, xcpng81, xcpng82
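
The request above was rejected only because `xenserver74` is missing the hyphen used in the supported name `xenserver-74`. A small pre-check along the following lines could catch such typos before posting the command. This is a hypothetical helper, not how blueorangutan validates its input; the list is copied from the bot's reply, and the function name `closest_hypervisor` is an assumption made for illustration.

```python
import difflib

# Hypervisor names exactly as listed in the bot's reply.
SUPPORTED_HYPERVISORS = [
    "kvm-centos6", "kvm-centos7", "kvm-rocky8", "kvm-alma8",
    "kvm-ubuntu18", "kvm-ubuntu20", "kvm-ubuntu22", "kvm-suse15",
    "vmware-55u3", "vmware-60u2", "vmware-65u2", "vmware-67u3",
    "vmware-70u1", "vmware-70u2", "vmware-70u3",
    "xenserver-65sp1", "xenserver-71", "xenserver-74",
    "xcpng74", "xcpng76", "xcpng80", "xcpng81", "xcpng82",
]


def closest_hypervisor(name):
    """Return name if it is supported, else the closest supported name or None."""
    if name in SUPPORTED_HYPERVISORS:
        return name
    # get_close_matches ranks candidates by similarity, best match first.
    matches = difflib.get_close_matches(name, SUPPORTED_HYPERVISORS, n=1)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(closest_hypervisor("xenserver74"))
```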

@DaanHoogland
Contributor

@blueorangutan test centos7 xenserver-74

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + xenserver-74) has been kicked to run smoke tests

@dpassante
Contributor Author

@dpassante would you see an opportunity to add a marvin/integration test? @blueorangutan package

@DaanHoogland We can try but we would need a lead on how to do it. Do you have an example of integration testing on something similar?

@blueorangutan

@dpassante a Jenkins job has been kicked to build packages. It will be bundled with

@DaanHoogland We can try but we would need a lead on how to do it. Do you have an example of integration testing on something similar? SystemVM template(s). I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 4508

@DaanHoogland
Contributor

@dpassante would you see an opportunity to add a marvin/integration test?

@DaanHoogland We can try but we would need a lead on how to do it. Do you have an example of integration testing on something similar?

Examples can be found in test/integration/smoke or in test/integration/component. Grep for xen to find more specific examples; any of them will give at least some clues.
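
As a rough equivalent of the grep suggested above, the following sketch lists the Python test files under the two directories that mention "xen". It is only an illustration: the function name `find_tests_mentioning` is hypothetical, and the relative paths assume the script is run from the root of a CloudStack checkout.

```python
import os


def find_tests_mentioning(roots, needle="xen"):
    """Return sorted paths of .py files under roots whose text contains needle.

    The comparison is case-insensitive; roots that do not exist are skipped
    because os.walk yields nothing for them.
    """
    needle = needle.lower()
    matches = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(".py"):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    if needle in handle.read().lower():
                        matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for path in find_tests_mentioning(
        ["test/integration/smoke", "test/integration/component"]
    ):
        print(path)
```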

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-5177)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42629 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6833-t5177-kvm-centos7.zip
Smoke tests completed. 104 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

Trillian test result (tid-5175)
Environment: xenserver-74 (x2), Advanced Networking with Mgmt server 7
Total time taken: 52969 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6833-t5175-xenserver-74.zip
Smoke tests completed. 60 look OK, 14 have errors, 7 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_deploy_vm_start_failure | Error | 172.09 | test_deploy_vm.py |
| test_deploy_vm_volume_creation_failure | Error | 122.10 | test_deploy_vm.py |
| test_vm_ha | Error | 99.22 | test_vm_ha.py |
| test_vm_sync | Error | 194.54 | test_vm_sync.py |
| test_07_project_resources_account_delete | Error | 5.53 | test_projects.py |
| test_08_cleanup_after_project_delete | Error | 5.91 | test_projects.py |
| ContextSuite context=TestProjectResources>:teardown | Error | 8.05 | test_projects.py |
| ContextSuite context=TestProjectSuspendActivate>:teardown | Error | 123.40 | test_projects.py |
| ContextSuite context=TestDeployVmWithAffinityGroup>:teardown | Error | 160.77 | test_affinity_groups_projects.py |
| test_UpdateConfigParamWithScope | Error | 0.07 | test_global_settings.py |
| test_create_pvlan_network | Error | 0.03 | test_pvlan.py |
| test_03_user_role_dont_see_annotations | Failure | 1.76 | test_host_annotations.py |
| test_03_RVR_Network_check_router_state | Failure | 183.58 | test_routers_network_ops.py |
| test_02_vpc_privategw_static_routes | Failure | 361.48 | test_privategw_acl.py |
| test_03_vpc_privategw_restart_vpc_cleanup | Failure | 368.66 | test_privategw_acl.py |
| test_04_rvpc_privategw_static_routes | Failure | 616.98 | test_privategw_acl.py |
| test_01_scale_vm | Error | 0.03 | test_scale_vm.py |
| test_07_resize_fail | Failure | 36.03 | test_volumes.py |
| test_11_migrate_volume_and_change_offering | Error | 8.41 | test_volumes.py |
| test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Failure | 486.77 | test_vpc_redundant.py |
| test_02_redundant_VPC_default_routes | Failure | 485.23 | test_vpc_redundant.py |
| test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 353.73 | test_vpc_redundant.py |
| test_04_rvpc_network_garbage_collector_nics | Failure | 309.12 | test_vpc_redundant.py |
| test_05_rvpc_multi_tiers | Failure | 443.35 | test_vpc_redundant.py |
| test_05_rvpc_multi_tiers | Error | 491.04 | test_vpc_redundant.py |
| test_02_cancel_host_maintenace_with_migration_jobs | Failure | 237.93 | test_host_maintenance.py |
| all_test_vm_deployment_planner | Skipped | --- | test_vm_deployment_planner.py |
| all_test_multipleips_per_nic | Skipped | --- | test_multipleips_per_nic.py |
| all_test_router_dns | Skipped | --- | test_router_dns.py |
| all_test_nested_virtualization | Skipped | --- | test_nested_virtualization.py |
| all_test_network | Skipped | --- | test_network.py |
| all_test_network_acl | Skipped | --- | test_network_acl.py |
| all_test_nic | Skipped | --- | test_nic.py |

@DaanHoogland
Contributor

Package build and smoke tests overlapped. Retrying.
@blueorangutan test centos7 xenserver-74

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + xenserver-74) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-5180)
Environment: xenserver-74 (x2), Advanced Networking with Mgmt server 7
Total time taken: 44712 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6833-t5180-xenserver-74.zip
Smoke tests completed. 103 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_13_migrate_volume_and_change_offering | Error | 8.51 | test_volumes.py |

@DaanHoogland
Contributor

@dpassante can you look at the failed test?

test_13_migrate_volume_and_change_offering Error 8.51 test_volumes.py

@dpassante
Contributor Author

@dpassante can you look at the failed test?

test_13_migrate_volume_and_change_offering Error 8.51 test_volumes.py

@DaanHoogland I could be wrong but I don't feel like the [LICENCE_RESTRICTION, Storage_motion] error is related to my PR.
Isn't SXM support required to pass this test?

@DaanHoogland
Contributor

test_13_migrate_volume_and_change_offering Error 8.51 test_volumes.py

@DaanHoogland I could be wrong but I don't feel like the [LICENCE_RESTRICTION, Storage_motion] error is related to my PR. Isn't SXM support required to pass this test?

You are right, I thought this one was failing twice, but I was not looking ..., just to make sure
@blueorangutan test centos7 xenserver-74

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + xenserver-74) has been kicked to run smoke tests

Contributor

@JoaoJandre JoaoJandre left a comment

Apart from the leftover comment, LGTM


MigrateWithStorageReceiveCommand receiveCmd = new MigrateWithStorageReceiveCommand(to, volumeToStorageUuid);
MigrateWithStorageReceiveAnswer receiveAnswer = (MigrateWithStorageReceiveAnswer)agentMgr.send(destHost.getId(), receiveCmd);
// s_logger.error("Migration with storage of vm " + vm + " to host " + destHost + " failed.");
Contributor

Suggested change
- // s_logger.error("Migration with storage of vm " + vm + " to host " + destHost + " failed.");

@github-actions

github-actions bot commented Aug 2, 2023

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland DaanHoogland changed the base branch from main to 4.18 August 3, 2023 12:31
@DaanHoogland
Contributor

@dpassante, this could still go into 4.18.1. Do you want to? cc @weizhouapache

@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rohityadavcloud a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 6708

@DaanHoogland
Contributor

@blueorangutan test rocky8 xenserver-74

@blueorangutan

@DaanHoogland a [SF] Trillian-Jenkins test job (rocky8 mgmt + xenserver-74) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-7354)

@blueorangutan

[SF] Trillian Build Failed (tid-7362)


return new CleanForMigrationStorageCommandAnswer(command);
}
}
Contributor

@dpassante, the lint check fails because of this: there is no newline (EOL) at the end of the file.
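
The condition the lint check complains about can be reproduced locally before pushing. The sketch below is an assumption about what such a check amounts to, not the project's actual lint rule; the function name `ends_with_newline` is hypothetical, and treating an empty file as acceptable is a choice made here for simplicity.

```python
import os
import sys


def ends_with_newline(path):
    """Return True if the file at path is empty or its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            # Assumption: an empty file is not reported as a violation.
            return True
        # Step back one byte from the end and read the final byte.
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


if __name__ == "__main__":
    # Usage: python check_eol.py FILE [FILE ...]
    missing = [p for p in sys.argv[1:] if not ends_with_newline(p)]
    for path in missing:
        print("no newline at end of file: " + path)
    sys.exit(1 if missing else 0)
```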

@@ -0,0 +1,295 @@
#!/usr/bin/env python

Contributor

license missing

@blueorangutan

[SF] Trillian Build Failed (tid-7364)

@shwstppr
Contributor

shwstppr commented Oct 3, 2023

@dpassante can you please address the outstanding comments and build failures?

@DaanHoogland
Contributor

@dpassante are you still interested in this for 4.19? I'll move it to unplanned otherwise.

@JoaoJandre
Contributor

@dpassante could you please address the outstanding comments and build failures?

@JoaoJandre
Contributor

As this hasn't been touched by the author since October 2022, I'll be moving the milestone to unplanned.
cc @DaanHoogland.

@JoaoJandre JoaoJandre modified the milestones: 4.18.2.0, unplanned Jan 30, 2024
@DaanHoogland DaanHoogland changed the base branch from 4.18 to main April 19, 2024 09:15

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
