@bryan-cox
Contributor

@bryan-cox bryan-cox commented Oct 24, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds support for configuring availability zones on Azure load balancers, enabling zone-redundant configurations for high availability.

A zone-redundant Azure load balancer remains available across multiple availability zones within a region. With this change, users can specify availability zones (1, 2, 3) on a load balancer; the zones are then set on its frontend IP configurations.

Key changes:

  • Added AvailabilityZones field to LoadBalancerSpec API
  • Implemented service layer to set zones on frontend IP configurations
  • Added webhook validation to enforce Azure's zone immutability requirement
  • Included comprehensive documentation with examples and migration guidance
  • Added unit tests and E2E tests

Which issue(s) this PR fixes:
Fixes #5709

Special notes for your reviewer:

This implementation follows Azure's zone redundancy model:

  • For internal load balancers: zones are set directly on frontend IP configurations
  • For public load balancers: zones should be set on associated public IP addresses (documented)
  • Zones are immutable after creation per Azure platform requirements
  • Webhook validation prevents invalid zone modifications
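The immutability rule above can be sketched as a small update-time check. This is an illustrative assumption about the shape of such a check, not the PR's actual webhook code: the function name `validateZonesImmutable` is hypothetical, and whether the real validation compares zones independent of order is not stated in this PR description.

```go
package main

import (
	"fmt"
	"sort"
)

// sameZones reports whether two zone lists contain the same values.
// Assumption for this sketch: order does not matter, and nil equals empty.
func sameZones(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	// Sort copies so the callers' slices are left untouched.
	ac := append([]string(nil), a...)
	bc := append([]string(nil), b...)
	sort.Strings(ac)
	sort.Strings(bc)
	for i := range ac {
		if ac[i] != bc[i] {
			return false
		}
	}
	return true
}

// validateZonesImmutable rejects an update that changes the zones of an
// existing load balancer, mirroring the platform's immutability requirement.
func validateZonesImmutable(oldZones, newZones []string) error {
	if !sameZones(oldZones, newZones) {
		return fmt.Errorf("availabilityZones is immutable: was %v, got %v", oldZones, newZones)
	}
	return nil
}

func main() {
	// Unchanged zones pass; a changed list is rejected.
	fmt.Println(validateZonesImmutable([]string{"1", "2", "3"}, []string{"1", "2", "3"}))
	fmt.Println(validateZonesImmutable([]string{"1", "2", "3"}, []string{"1", "2"}))
}
```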

The E2E test is optional and creates a cluster with zone-redundant load balancers to verify the feature works end-to-end in Azure.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Add support for zone-redundant load balancers. Users can now configure availability zones on load balancers (APIServerLB, NodeOutboundLB, ControlPlaneOutboundLB) for high availability across multiple zones.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/feature Categorizes issue or PR as related to a new feature. labels Oct 24, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 24, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jont828 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 24, 2025
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 7572b39 to 67685f0 Compare October 24, 2025 17:04
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 24, 2025
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 67685f0 to 2e5d373 Compare October 24, 2025 17:10
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 24, 2025
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 2e5d373 to e29bc2e Compare October 24, 2025 17:13
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Oct 24, 2025
@codecov

codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 51.35135% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.68%. Comparing base (2130510) to head (daacbf5).
⚠️ Report is 17 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| api/v1beta1/azurecluster_webhook.go | 33.33% | 15 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5944      +/-   ##
==========================================
+ Coverage   44.54%   44.68%   +0.13%     
==========================================
  Files         279      279              
  Lines       25140    25281     +141     
==========================================
+ Hits        11199    11296      +97     
- Misses      13128    13159      +31     
- Partials      813      826      +13     
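
As a quick check of the figures in the diff above, project coverage is Hits divided by Lines (where Lines = Hits + Misses + Partials). The sketch below recomputes both percentages from the reported counts; the helper name `coveragePercent` is hypothetical. The two-decimal values in the report appear to be truncated rather than rounded.

```go
package main

import "fmt"

// coveragePercent returns hits as a percentage of total lines.
// It returns 0 for an empty file set to avoid dividing by zero.
func coveragePercent(hits, lines int) float64 {
	if lines == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(lines)
}

func main() {
	// Counts taken from the coverage diff: base (main) and head (#5944).
	base := coveragePercent(11199, 25140)
	head := coveragePercent(11296, 25281)
	fmt.Printf("base=%.4f%% head=%.4f%% delta=%+.4f\n", base, head, head-base)
}
```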

@jackfrancis
Contributor

@bryan-cox can you add this new functionality to the existing E2E scenario for a private cluster, which ships with an internal LB? E.g.:

$ git diff templates/flavors/private/patches/private-lb.yaml
diff --git a/templates/flavors/private/patches/private-lb.yaml b/templates/flavors/private/patches/private-lb.yaml
index 76e1539df..a2933e299 100644
--- a/templates/flavors/private/patches/private-lb.yaml
+++ b/templates/flavors/private/patches/private-lb.yaml
@@ -7,6 +7,10 @@ spec:
     apiServerLB:
       name: ${CLUSTER_NAME}-internal-lb
       type: Internal
+      availabilityZones:
+        - "1"
+        - "2"
+        - "3"
     nodeOutboundLB:
       frontendIPsCount: 1
     controlPlaneOutboundLB:

After you apply the changes above to the template partial, render the updated templates with kustomize by running make generate-flavors from the git root directory.

cc @nojnhuh @mboersma

@jackfrancis
Contributor

/test pull-cluster-api-provider-azure-e2e-optional

@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from e29bc2e to 3b77777 Compare October 27, 2025 20:20
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@bryan-cox
Contributor Author

/retest

@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 3b77777 to 6fa7de7 Compare October 28, 2025 10:34
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 36f5c8f to e69a17e Compare October 31, 2025 12:55
@bryan-cox
Contributor Author

Getting the PR back to a stable state before attempting to address #5944 (comment) again.

@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 26, 2025
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 461ebb0 to 9ca4c6c Compare November 26, 2025 19:07
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 26, 2025
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from a37ab02 to b5bbbad Compare November 26, 2025 20:58
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 26, 2025
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

Add comprehensive documentation for zone-redundant load balancer feature:

- Explain Azure zone redundancy concepts for load balancers
- Provide configuration examples for all load balancer types:
  - Internal load balancers (API server)
  - Public load balancers
  - Node outbound load balancers
  - Control plane outbound load balancers
- Include complete highly available cluster example
- Document important considerations:
  - Immutability of zones after creation
  - Region support requirements
  - Standard SKU requirement
  - Backend pool placement best practices
- Provide migration guidance for existing clusters
- Add troubleshooting section
- Document best practices
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from b5bbbad to aee70a4 Compare December 11, 2025 18:46
@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Dec 11, 2025
- Add AvailabilityZones field to LoadBalancerSpec API
- Implement zone support in service layer for frontend IP configs
- Add webhook validation for zone immutability
- Update generated CRD manifests
- Add zone redundancy to private cluster flavor
- Add unit tests for zone configuration on frontend IPs
- Add E2E test for zone-redundant LB verification
- Add apiserver-ilb-zones flavor for E2E testing
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from aee70a4 to daacbf5 Compare December 11, 2025 18:47
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Dec 11, 2025
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@k8s-ci-robot
Contributor

@bryan-cox: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-azure-capi-e2e | 67685f0 | link | false | /test pull-cluster-api-provider-azure-capi-e2e |
| pull-cluster-api-provider-azure-apiversion-upgrade | daacbf5 | link | true | /test pull-cluster-api-provider-azure-apiversion-upgrade |
| pull-cluster-api-provider-azure-e2e-optional | daacbf5 | link | false | /test pull-cluster-api-provider-azure-e2e-optional |

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Load balancers are not zone redundant and can't be configured as such

3 participants