
Enhance timeout settings and validation for HANA cluster operations #232

Merged
devanshjainms merged 2 commits into Azure:development-feb-2026 from devanshjainms:scale-out-backup
Apr 10, 2026


@devanshjainms
Contributor

This pull request makes several improvements and refactorings to the HANA DB high-availability Ansible roles, focusing on more robust and streamlined network partition testing, improved error handling, and minor logic corrections. The most important changes are grouped below:

Network Partition Test Refactoring and Robustness:

  • The network partition test (secondary-block-network.yml) is refactored to block all network communication on secondary site nodes using iptables -P DROP, applied in parallel to all secondary nodes, instead of adding/removing individual firewall rules per IP. This simplifies the logic and ensures a more reliable partition.
  • The playbook now includes a rescue step that resets iptables policies to ACCEPT on secondary site nodes in case of test failure, ensuring the cluster is not left in a partitioned state.
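
A minimal sketch of the partition-and-rescue pattern described above, not the actual contents of secondary-block-network.yml. The variable `secondary_site_nodes`, the choice of chains (INPUT, OUTPUT, FORWARD), and the `async` value are assumptions made for illustration; the PR description only states that iptables -P DROP is applied in parallel and that a rescue step resets the policies to ACCEPT.

```yaml
# Sketch only: `secondary_site_nodes` is a hypothetical list of inventory hostnames.
- name: Partition the secondary site and recover on failure
  block:
    - name: Set default iptables policies to DROP on every secondary node
      ansible.builtin.shell: |
        iptables -P INPUT DROP
        iptables -P OUTPUT DROP
        iptables -P FORWARD DROP
      delegate_to: "{{ item }}"
      loop: "{{ secondary_site_nodes }}"
      become: true
      # async with poll: 0 starts the command on each node without waiting,
      # so the nodes are partitioned at roughly the same time.
      async: 30
      poll: 0
      changed_when: true

    # Cluster validation tasks that may fail would go here (omitted).

  rescue:
    - name: Reset default iptables policies to ACCEPT on secondary nodes
      ansible.builtin.shell: |
        iptables -P INPUT ACCEPT
        iptables -P OUTPUT ACCEPT
        iptables -P FORWARD ACCEPT
      delegate_to: "{{ item }}"
      loop: "{{ secondary_site_nodes }}"
      become: true
      # A partitioned node may be unreachable until fencing reboots it.
      ignore_unreachable: true
```

Because the fire-and-forget task itself does not fail, the rescue section in this sketch is triggered by later tasks inside the block, such as a validation step that times out.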

Logic and Validation Improvements:

  • The test validation logic for worker node operations is updated to ensure the target worker node is actually present in the expected site after the operation, improving test accuracy.
  • A condition is added to only execute the primary node kill test when the saphanasr_provider is set to SAPHanaSR-angi, preventing unintended execution in other configurations.
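
The two checks above could look roughly like the following. Only `saphanasr_provider` and the value SAPHanaSR-angi come from the PR description; the included file name and the variables `target_worker_node` and `expected_site_nodes` are hypothetical placeholders.

```yaml
# Sketch only: file and variable names other than `saphanasr_provider` are assumed.
- name: Run the primary node kill test only for the SAPHanaSR-angi provider
  ansible.builtin.include_tasks: primary-node-kill-steps.yml  # hypothetical file
  when: saphanasr_provider == "SAPHanaSR-angi"

- name: Verify the target worker node is present in the expected site
  ansible.builtin.assert:
    that:
      - target_worker_node in expected_site_nodes
    fail_msg: "{{ target_worker_node }} was not found in the expected site after the operation"
```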

Timeout and Error Handling:

  • The default timeout value for test waits is increased from 60 to 90 seconds, allowing more time for cluster state changes and reducing spurious test failures in slow environments.
  • The use of Jinja templating is removed from changed_when and failed_when expressions for better readability and reliability.
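
As an illustration of both points, the sketch below sets `default_timeout` to 90 and writes `changed_when` and `failed_when` as bare expressions. Ansible already evaluates these conditionals as Jinja expressions, so wrapping them in `{{ }}` is unnecessary. The play layout, the `crm_mon -1` status query, and the `cluster_status` register name are assumptions for the example and are not taken from the PR.

```yaml
# Sketch only: in the repository, default_timeout is defined in src/vars/input-api.yaml.
- hosts: localhost
  gather_facts: false
  vars:
    default_timeout: 90  # previously 60
  tasks:
    - name: Query cluster status once
      ansible.builtin.command: crm_mon -1
      register: cluster_status
      # Bare expressions, instead of "{{ cluster_status.rc != 0 }}"
      changed_when: false
      failed_when: cluster_status.rc != 0

    - name: Wait for the cluster state to settle
      ansible.builtin.pause:
        seconds: "{{ default_timeout }}"
```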

Test Reporting:

  • Test case reporting is updated to reflect the new network partitioning approach, indicating that the firewall was blocked on the secondary site and removed by fencing (node reboot), rather than by explicit firewall rule removal.

These changes collectively improve the reliability, maintainability, and clarity of the HANA DB HA test automation.


Copilot AI left a comment


Pull request overview

This PR improves robustness of HANA DB HA Ansible test scenarios by increasing wait timeouts, refining validation logic for scale-out worker operations, and refactoring network partition behavior to use site-wide iptables default DROP policies (with failure recovery).

Changes:

  • Increase default_timeout from 60s to 90s for HA test waits.
  • Refactor the scale-out secondary network partition test to block all traffic on the secondary site (and adjust reporting).
  • Tighten execution conditions/validation: gate primary-node kill to SAPHanaSR-angi and improve scale-out worker post-checks; simplify changed_when/failed_when expressions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/vars/input-api.yaml | Increases default wait timeout used across HA test tasks. |
| src/roles/ha_db_hana/tasks/secondary-block-network.yml | Refactors secondary-site network partition to iptables -P DROP with async delegated application and updated reporting/rescue flow. |
| src/roles/ha_db_hana/tasks/primary-node-kill.yml | Restricts primary node kill test execution to the SAPHanaSR-angi provider only. |
| src/roles/ha_db_hana/tasks/includes/scaleout-worker-operation.yml | Adjusts cluster validation criteria and removes unnecessary Jinja wrapping in changed_when/failed_when. |

Comment thread src/roles/ha_db_hana/tasks/secondary-block-network.yml Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@dhruvmicrosoft dhruvmicrosoft left a comment


LG!

@devanshjainms devanshjainms merged commit eef976b into Azure:development-feb-2026 Apr 10, 2026
5 checks passed
@devanshjainms devanshjainms deleted the scale-out-backup branch April 10, 2026 21:12

3 participants