New feature: Reconcile commands (CopyCommand, MigrateCommand, MigrateVolumeCommand) #10514
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #10514 +/- ##
============================================
- Coverage 16.30% 16.28% -0.03%
- Complexity 13449 13483 +34
============================================
Files 5676 5695 +19
Lines 499208 501081 +1873
Branches 60374 60657 +283
============================================
+ Hits 81414 81612 +198
- Misses 408722 410376 +1654
- Partials 9072 9093 +21
Flags with carried forward coverage won't be shown.
5dcdcb8 to f9bba6e
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12678
@blueorangutan test matrix
@weizhouapache a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2) has been kicked to run smoke tests
[SF] Trillian Build Failed (tid-12593)
[SF] Trillian Build Failed (tid-12594)
[SF] Trillian test result (tid-12592)
[SF] Trillian test result (tid-12595)
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12766
@blueorangutan test matrix
- add column resource_id and resource_type
- set resource_id and resource_type based on the srcData
- skip Migrating VMs with reconcile commands
- clean up volumes attached to Migrating vms without reconcile commands
- skip Migrating volumes with reconcile commands
- skip Migrating volumes attached to vms with reconcile commands
- cleanup volumes in Migrating/Creating with last_id
- add vmName to ReconcileMigrateAnswer
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 12892
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 12894
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12895
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12914
[SF] Trillian test result (tid-12844)
LGTM based on manual testing. I tested the following scenarios:
Migrate volume from one storage to another during connection failures
Migrate volume from one storage to another while agent crashes
Migrate volume from one storage to another while agent restarts
Migrate volume from one storage to another while agent times out
Migrate volume from one storage to another while management server restarts
Migrate VM to another host during connection failures
Migrate VM to another host while agent crashes
Migrate VM to another host while agent restarts
Migrate VM to another host while agent times out
Migrate VM to another host while management server restarts
Migrate VM with volumes during connection failures
Migrate VM with volumes while agent crashes
Migrate VM with volumes while agent restarts
Migrate VM with volumes while agent times out
Migrate VM with volumes while management server restarts
with NFS, Ceph and local storage.
code LGTM, very good detailing in all the related docs @weizhouapache thanks.
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12993
@blueorangutan test
@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-12928)
Description
This PR aims to improve how some agent commands and their answers are processed.
Current process
Many CloudStack operations require communication between the management server and the CloudStack agent.
The normal process is:
management server --> sends commands to agents --> agents process the commands -->
agents send the answers to management server --> management server processes the answers
Each operation might involve one or more of the round trips above.
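The round trip above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `Command`, `Answer`, `Agent` and `sendAndHandle` are hypothetical stand-ins, not the real CloudStack classes, and the network hop is collapsed into a direct method call. An empty `Optional` models the case where no answer arrives (for example a connection failure or an agent crash, as in the scenarios tested in the review), which leaves the management server unable to tell whether the operation finished.

```java
import java.util.Optional;

public class RoundTripSketch {
    // Hypothetical stand-in for a command sent by the management server.
    static final class Command {
        final String name;
        Command(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the answer an agent sends back.
    static final class Answer {
        final String commandName;
        final boolean success;
        Answer(String commandName, boolean success) {
            this.commandName = commandName;
            this.success = success;
        }
    }

    // Agent side: processes a command. An empty result means no answer
    // reached the management server (assumption made for illustration).
    interface Agent {
        Optional<Answer> process(Command cmd);
    }

    // Management-server side: send the command, then process the answer.
    static String sendAndHandle(Agent agent, Command cmd) {
        Optional<Answer> answer = agent.process(cmd);
        if (!answer.isPresent()) {
            // Without an answer the real outcome on the host is unknown.
            return cmd.name + ": no answer, outcome unknown";
        }
        return cmd.name + (answer.get().success ? ": succeeded" : ": failed");
    }

    public static void main(String[] args) {
        Agent healthy = cmd -> Optional.of(new Answer(cmd.name, true));
        Agent disconnected = cmd -> Optional.empty();
        System.out.println(sendAndHandle(healthy, new Command("MigrateCommand")));
        System.out.println(sendAndHandle(disconnected, new Command("MigrateCommand")));
    }
}
```

The "outcome unknown" branch is the kind of gap this PR targets; how reconcile commands actually close it is described in the linked design doc, not in this sketch.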
Issues in some scenarios
Normally this process works fine. However, problems arise in some scenarios.
Consider the following examples:
Operations to address
This FR focuses on the following operations
The backend processes can be found at
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337678693#AsyncAgentCommandReconciliation-4.1BackendcommandsofVMandvolumemigrations
Main changes
Design doc: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Async+Agent+Command+Reconciliation
Global settings
New terminology: Reconcile commands
How it works
For reconcile commands, during stop/start of mgmt server and agent
Improvement on management server when waiting for the answer of reconcile commands
Improvement on VM migration with/without volumes
Fixes after Volume migration
Improvement on Agent
Test results
It has been tested by dev on NFS and PowerFlex.
Refer to https://cwiki.apache.org/confluence/display/CLOUDSTACK/Async+Agent+Command+Reconciliation#AsyncAgentCommandReconciliation-4.3Summaryoftestresults