Support to enable/disable VM High Availability manager #10118

Merged

Conversation

sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Dec 17, 2024

Description

This PR adds support to enable/disable VM High Availability manager.

  • New config 'vm.ha.enabled' with Zone scope is added, to enable/disable the VM High Availability manager. This is enabled by default (for backward compatibility). When enabled, VM HA WorkItems (for VM Stop, Restart, Migration, Destroy) can be created and the scheduled items are executed. When disabled, new VM HA WorkItems are not allowed; items that are already scheduled are retried up to the maximum configured at 'vm.ha.migration.max.retries' (and executed if HA is re-enabled during the retry attempts), and are then purged after 'time.between.failures' by the cleanup thread that runs regularly at 'time.between.cleanup'.
  • New config 'vm.ha.alerts.enabled' with Zone scope is added, to enable/disable alerts for the VM HA operations. This is enabled by default.
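
The work item lifecycle described in the first bullet can be sketched as a small simulation. This is a toy model, not the code in HighAvailabilityManagerImpl: the class and method names (`WorkItem`, `HaManagerSketch`, `schedule`, `run_once`, `cleanup`) are hypothetical, and the three constructor arguments only stand in for 'vm.ha.enabled', 'vm.ha.migration.max.retries' and 'time.between.failures'.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkItem:
    vm: str
    kind: str                          # "Stop", "Restart", "Migration" or "Destroy"
    attempts: int = 0
    state: str = "scheduled"           # scheduled -> done, or scheduled -> failed
    failed_at: Optional[float] = None  # time at which retries were exhausted


class HaManagerSketch:
    """Hypothetical model of the enable/disable behaviour described in the PR."""

    def __init__(self, ha_enabled: bool, max_retries: int, time_between_failures: float):
        self.ha_enabled = ha_enabled                        # stands in for vm.ha.enabled
        self.max_retries = max_retries                      # stands in for vm.ha.migration.max.retries
        self.time_between_failures = time_between_failures  # stands in for time.between.failures
        self.items: List[WorkItem] = []

    def schedule(self, vm: str, kind: str) -> WorkItem:
        # New work items are rejected while HA is disabled.
        if not self.ha_enabled:
            raise RuntimeError("VM HA is disabled; new work items are not allowed")
        item = WorkItem(vm, kind)
        self.items.append(item)
        return item

    def run_once(self, now: float) -> None:
        # One pass over the scheduled items.
        for item in self.items:
            if item.state != "scheduled":
                continue
            if self.ha_enabled:
                item.state = "done"        # executed normally (or after re-enabling)
            else:
                item.attempts += 1         # retried without being executed
                if item.attempts >= self.max_retries:
                    item.state = "failed"
                    item.failed_at = now

    def cleanup(self, now: float) -> None:
        # Purge items whose retries ran out at least time_between_failures ago.
        self.items = [
            i for i in self.items
            if not (i.state == "failed" and now - i.failed_at >= self.time_between_failures)
        ]
```

In this sketch, an item scheduled before HA is disabled either runs once HA is re-enabled or is marked failed after `max_retries` passes and is removed by a later `cleanup` call.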

Both of these config settings can be defined at the zone or global level.
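
As a rough illustration of zone/global resolution, the sketch below looks up a setting for a zone and falls back to the global value, then to the default. The precedence order (zone value first) is an assumption about how zone-scoped settings are usually resolved, and `resolve`, `DEFAULTS` and the dictionaries are hypothetical names, not CloudStack APIs.

```python
# Defaults as stated in the PR description: both settings are enabled by default.
DEFAULTS = {"vm.ha.enabled": True, "vm.ha.alerts.enabled": True}


def resolve(key, zone_id, global_values, zone_values):
    """Return the effective value of `key` for `zone_id`.

    Assumed precedence: zone-level value, then global value, then default.
    """
    zone_overrides = zone_values.get(zone_id, {})
    if key in zone_overrides:
        return zone_overrides[key]
    return global_values.get(key, DEFAULTS[key])


# Example: HA is disabled globally but re-enabled for one zone.
global_values = {"vm.ha.enabled": False}
zone_values = {"zone-a": {"vm.ha.enabled": True}}

effective_zone_a = resolve("vm.ha.enabled", "zone-a", global_values, zone_values)
effective_zone_b = resolve("vm.ha.enabled", "zone-b", global_values, zone_values)
alerts_zone_b = resolve("vm.ha.alerts.enabled", "zone-b", global_values, zone_values)
```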

Doc PR: apache/cloudstack-documentation#464

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

New Settings =>

[Image: VmHaGlobalSettings]

Sample Alert when 'vm.ha.enabled' is false =>

[Image: VmHaAlert]

How Has This Been Tested?

Manually tested the VM HA related operations during host maintenance and during host down/alert states, with the new config 'vm.ha.enabled' both enabled and disabled.

…erts

- Adds new config 'vm.ha.enabled' with Zone scope, to enable/disable the VM High Availability manager. This is enabled by default (for backward compatibility).
  When enabled, VM HA WorkItems (for VM Stop, Restart, Migration, Destroy) can be created and the scheduled items are executed.
  When disabled, new VM HA WorkItems are not allowed; items that are already scheduled are retried up to the maximum configured at 'vm.ha.migration.max.retries' (and executed if HA is re-enabled during the retry attempts), and are then purged after 'time.between.failures' by the cleanup thread that runs regularly at 'time.between.cleanup'.
- Adds new config 'vm.ha.alerts.enabled' with Zone scope, to enable/disable alerts for the VM HA operations. This is enabled by default.
@sureshanaparti
Contributor Author

@blueorangutan package

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 53.08642% with 38 lines in your changes missing coverage. Please review.

Project coverage is 16.04%. Comparing base (ac19379) to head (819f72a).
Report is 11 commits behind head on 4.20.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...java/com/cloud/ha/HighAvailabilityManagerImpl.java | 55.26% | 21 Missing and 13 partials ⚠️ |
| ...n/java/com/cloud/resource/ResourceManagerImpl.java | 0.00% | 4 Missing ⚠️ |
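
The reported patch coverage can be cross-checked from the numbers above. The sketch assumes that partially covered lines count as not covered, and it infers the total number of patch lines from the rounded percentage, since the report does not state that total directly.

```python
# Figures taken from the Codecov report above.
missing = 21 + 4          # fully missed lines across the two files
partials = 13             # partially covered lines
uncovered = missing + partials

reported_pct = 53.08642   # patch coverage as reported

# Inferred, not reported: total patch lines implied by the percentage.
total = round(uncovered / (1 - reported_pct / 100))
covered = total - uncovered

patch_pct = 100 * covered / total
print(uncovered, total, covered, round(patch_pct, 5))
```

Under these assumptions the 38 uncovered lines correspond to 43 covered lines out of 81 in the patch.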
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #10118   +/-   ##
=========================================
  Coverage     16.03%   16.04%           
- Complexity    12814    12829   +15     
=========================================
  Files          5637     5637           
  Lines        493507   493574   +67     
  Branches      59831    59848   +17     
=========================================
+ Hits          79131    79188   +57     
+ Misses       405600   405592    -8     
- Partials       8776     8794   +18     
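
The project coverage figures in the diff follow from the hit, miss and partial counts. A minimal check, assuming coverage is computed as hits divided by all tracked lines (hits + misses + partials); the `coverage` helper is a hypothetical name used only here:

```python
def coverage(hits, misses, partials):
    """Return (total tracked lines, coverage percentage rounded to 2 places)."""
    lines = hits + misses + partials
    return lines, round(100 * hits / lines, 2)


# Base (4.20) and head (#10118) columns from the coverage diff above.
base_lines, base_pct = coverage(hits=79131, misses=405600, partials=8776)
head_lines, head_pct = coverage(hits=79188, misses=405592, partials=8794)

print(base_lines, base_pct)   # 493507 16.03
print(head_lines, head_pct)   # 493574 16.04
```

The line totals match the "Lines" row of the diff, and the rounded percentages match the "Coverage" row.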
| Flag | Coverage Δ |
|---|---|
| uitests | 4.02% <ø> (ø) |
| unittests | 16.88% <53.08%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


Member

@weizhouapache weizhouapache left a comment

code lgtm

@sureshanaparti
Contributor Author

@blueorangutan package


@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11839

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11838

@sureshanaparti
Contributor Author

@blueorangutan package

@apache apache deleted a comment from blueorangutan Dec 19, 2024
@apache apache deleted a comment from blueorangutan Dec 19, 2024
@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11853

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11956)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 50877 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10118-t11956-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Contributor

@kiranchavala kiranchavala left a comment

LGTM, tested manually and the feature works fine.

| Test Case | Execution Result |
|---|---|
| Verify that high availability tasks for VMs work as usual when 'vm.ha.enabled' is enabled | Pass |
| Verify no alerts are sent to operators when scheduling of VM HA operations fails when 'vm.ha.alerts.enabled' is disabled | Pass |
| Exception message should be shown when 'vm.ha.enabled' is disabled and the host is put into maintenance mode | Pass |
| The entries in op_ha_work should get purged by cleanup threads | Pass |
| Verify that certain tasks are not allowed when 'vm.ha.enabled' is disabled | Pass |
| Verify alerts are sent to operators when scheduling of VM HA operations fails when 'vm.ha.alerts.enabled' is enabled | Pass |
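
The alert-related test cases above can be expressed as a small check against a toy model. This is only a sketch: `schedule_ha_work`, its parameters and the alert message text are hypothetical and do not reflect the actual method names or messages in the PR.

```python
def schedule_ha_work(vm, ha_enabled, alerts_enabled, send_alert):
    """Try to schedule HA work for `vm`; return True if it was scheduled.

    Assumed behaviour: when HA is disabled, scheduling fails, and an alert
    is sent only if alerts are enabled.
    """
    if ha_enabled:
        return True
    if alerts_enabled:
        # Hypothetical message text, for illustration only.
        send_alert(f"VM HA is disabled, cannot schedule HA work for {vm}")
    return False


# Collect alerts in a list instead of sending them anywhere.
alerts = []
ok_enabled = schedule_ha_work("vm-1", True, True, alerts.append)
ok_disabled_with_alerts = schedule_ha_work("vm-2", False, True, alerts.append)
ok_disabled_no_alerts = schedule_ha_work("vm-3", False, False, alerts.append)
```

Only the second call adds an entry to `alerts`, which mirrors the two alert test cases: one alert when alerts are enabled and scheduling fails, none when alerts are disabled.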

@sureshanaparti sureshanaparti marked this pull request as ready for review December 26, 2024 08:14
@rohityadavcloud rohityadavcloud merged commit 330ed25 into apache:4.20 Dec 26, 2024
25 checks passed
@rohityadavcloud rohityadavcloud deleted the cs-vm-high-availablity-on-off branch December 26, 2024 12:15
DaanHoogland added a commit that referenced this pull request Dec 30, 2024
* 4.20:
  VR: fix site-2-site VPN if split connections is enabled (#10067)
  UI: fix cannot open 'Edit tags' modal for static routes (#10065)
  Update ownership selection component to be language independent (#10052)
  Support to enable/disable VM High Availability manager and related alerts (#10118)
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jan 10, 2025
…erts (apache#10118)
@Pearl1594 Pearl1594 moved this to Done in ACS 4.20.1 Mar 17, 2025