
[WIP] Default spot and custom-vnet e2e templates to Azure Linux 4 #6330

Open
mboersma wants to merge 2 commits into kubernetes-sigs:main from mboersma:azurelinux4-e2e-default


@mboersma (Contributor) commented Jun 1, 2026

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR begins making Azure Linux 4 the default operating system for CAPZ e2e tests. It does so by switching the control plane and Linux worker nodes of two CI cluster-template flavors to the capi-azurelinux-4-gen2 community gallery image:

  • prow-spot (OPTIONAL)
  • prow-custom-vnet (OPTIONAL)

This is inspired by #6312. Rather than adding a single optional Azure Linux 4 spec, however, we exercise Azure Linux 4 across multiple existing specs. The AZL4 deltas are factored into reusable patches under templates/test/ci/patches/azl4-*.yaml, modeled on the existing Azure Linux 3 overlay. These deltas are the compute gallery image, removal of the marketplace image reference, the ca-certificates/iptables bootstrap, and the cloud-provider-azure caCertDir setting.

prow-azure-cni-v1 was initially included but has since been reverted. Azure CNI v1's host-NIC reconfiguration appears to be incompatible with Azure Linux 4: it causes the control plane to lose external reachability ~10 minutes after boot. Both spot and custom-vnet use Calico, matching the already-proven optional test from #6312, and are expected to work.

Scope is intentionally limited to flavors that have no ci-version/azl3 child overlays, so the change is self-contained and low-risk. The basic "Creating a highly available cluster [REQUIRED]" spec and the optional Azure Linux 3 spec are left on their existing images, end-user-facing templates remain Ubuntu-based, and Windows worker nodes are unaffected.

Follow-up PRs can extend Azure Linux 4 to the machine-pool, ci-version, and conformance flavors, which require a leaf-overlay approach to avoid disrupting their child templates.
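The scope rule above can be made concrete with a minimal Go sketch of the selection criteria as described in this PR. The `flavor` type, the `defaultsToAZL4` helper, and the per-flavor values are hypothetical and illustrative only. They are not taken from the CAPZ code or templates, where the switch is made with kustomize patches rather than Go logic.

```go
package main

import "fmt"

// flavor is a hypothetical summary of a CI cluster-template flavor,
// holding only the two properties this PR uses to decide scope.
type flavor struct {
	name             string
	hasChildOverlays bool // true if ci-version/azl3-style overlays build on this flavor
	usesAzureCNIv1   bool // true if the flavor's CNI is Azure CNI v1
}

// defaultsToAZL4 is a hypothetical helper expressing the rule described above:
// only flavors without child overlays are switched, and Azure CNI v1 is
// excluded because of its observed incompatibility with Azure Linux 4.
func defaultsToAZL4(f flavor) bool {
	return !f.hasChildOverlays && !f.usesAzureCNIv1
}

func main() {
	// Illustrative values based on the PR description, not read from the repo.
	flavors := []flavor{
		{name: "prow-spot"},
		{name: "prow-custom-vnet"},
		{name: "prow-azure-cni-v1", usesAzureCNIv1: true},
		{name: "machine-pool", hasChildOverlays: true},
	}
	for _, f := range flavors {
		fmt.Printf("%s -> Azure Linux 4: %v\n", f.name, defaultsToAZL4(f))
	}
}
```

With these inputs, only prow-spot and prow-custom-vnet come out true, which matches the two flavors switched here.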

Which issue(s) this PR fixes:

Special notes for your reviewer:

AZL4_VERSION is set in the e2e BeforeEach (keyed to the Kubernetes version, matching the existing AZL3_VERSION convention). These templates depend on the capi-azurelinux-4-gen2 image being published in the community gallery.
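As an illustration of the BeforeEach wiring, the following Go sketch derives AZL4_VERSION from a Kubernetes version string and exports it. It assumes that gallery image versions are the Kubernetes version without its leading "v" (for example, v1.33.1 becomes 1.33.1). That mapping, the `azl4VersionFor` helper, and reading the version from a KUBERNETES_VERSION environment variable are all assumptions for this example, not the actual e2e code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// azl4VersionFor is a hypothetical helper that maps a Kubernetes version such
// as "v1.33.1" to a gallery image version. ASSUMPTION: image versions are the
// Kubernetes version without the leading "v"; the real e2e code may differ.
func azl4VersionFor(k8sVersion string) string {
	return strings.TrimPrefix(k8sVersion, "v")
}

func main() {
	// In the e2e suite this would run in BeforeEach; KUBERNETES_VERSION is an
	// assumed variable name used here for illustration.
	k8sVersion := os.Getenv("KUBERNETES_VERSION")
	if k8sVersion == "" {
		k8sVersion = "v1.33.1" // hypothetical fallback for the example
	}
	// Export the derived version so template substitution can pick it up.
	if err := os.Setenv("AZL4_VERSION", azl4VersionFor(k8sVersion)); err != nil {
		panic(err)
	}
	fmt.Println("AZL4_VERSION =", os.Getenv("AZL4_VERSION"))
}
```

Under this assumption, each Kubernetes version the suite tests needs a matching capi-azurelinux-4-gen2 image version in the community gallery.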

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Default spot and custom-vnet e2e templates to Azure Linux 4

@k8s-ci-robot added the release-note and kind/cleanup labels Jun 1, 2026
@k8s-ci-robot requested a review from Jont828 June 1, 2026 20:55
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mboersma for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot requested a review from jsturtevant June 1, 2026 20:55
@k8s-ci-robot added the size/XL and cncf-cla: yes labels Jun 1, 2026
@mboersma changed the title from "Default azure-cni-v1, spot, and custom-vnet e2e templates to Azure Linux 4" to "[WIP] Default azure-cni-v1, spot, and custom-vnet e2e templates to Azure Linux 4" Jun 1, 2026
@k8s-ci-robot added the do-not-merge/work-in-progress label Jun 1, 2026
@codecov (Bot) commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.95%. Comparing base (f5bc974) to head (ba043a5).
⚠️ Report is 42 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6330      +/-   ##
==========================================
+ Coverage   43.85%   43.95%   +0.10%     
==========================================
  Files         291      288       -3     
  Lines       25344    25285      -59     
==========================================
  Hits        11114    11114              
+ Misses      13457    13398      -59     
  Partials      773      773              


@linux-foundation-easycla (Bot) commented Jun 8, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

@k8s-ci-robot added the size/L label and removed the size/XL label Jun 8, 2026
@mboersma changed the title from "[WIP] Default azure-cni-v1, spot, and custom-vnet e2e templates to Azure Linux 4" to "[WIP] Default spot and custom-vnet e2e templates to Azure Linux 4" Jun 8, 2026
@k8s-ci-robot added the cncf-cla: no label and removed the cncf-cla: yes label Jun 8, 2026
Azure CNI v1 + Azure Linux 4 causes the control plane to become
unreachable externally: the node boots and kubeadm init succeeds
locally, but the node drops off the network ~10 minutes after boot,
preventing the cluster from reaching ControlPlaneInitialized. The
same AzL4 setup scripts work fine with Calico-based flavors (spot,
custom-vnet), so this appears to be an incompatibility between Azure
CNI v1's host-NIC reconfiguration and Azure Linux 4.

Keep azure-cni-v1 on the default Ubuntu CI image for now; only the
Calico-based spot and custom-vnet flavors default to Azure Linux 4.
@mboersma force-pushed the azurelinux4-e2e-default branch from 1a46f8e to ba043a5 Jun 8, 2026 17:21
@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label Jun 8, 2026
@mboersma (Contributor, Author) commented Jun 8, 2026

/retest

@mboersma (Contributor, Author) commented Jun 9, 2026

/retest

@mboersma (Contributor, Author)

/retest


Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
  • kind/cleanup (Categorizes issue or PR as related to cleaning up code, process, or technical debt.)
  • release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Projects

Status: Todo


2 participants