
Workload cluster upgrade e2e tests flake on new check for Cluster's Available condition #6068

@nojnhuh

Description


Which jobs are flaky:

https://storage.googleapis.com/k8s-triage/index.html?pr=1&text=Remote%20connection%20probe%20failed&job=azure.*workload-upgrade

e.g. https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/periodic-cluster-api-provider-azure-e2e-workload-upgrade-1-31-1-32-main/2014150140197081088

{Timed out after 300.000s.
Failed to verify Cluster Available condition for k8s-upgrade-and-conformance-rmrxf7/k8s-upgrade-and-conformance-1fx44p
The function passed to Eventually failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.5/framework/cluster_helpers.go:457 with:
The Available condition on the Cluster should be set to true; message: * RemoteConnectionProbe: Remote connection probe failed, probe last succeeded at 2026-01-22T02:16:41Z
Expected
    <v1.ConditionStatus>: False
to equal
    <v1.ConditionStatus>: True failed [FAILED] Timed out after 300.000s.
Failed to verify Cluster Available condition for k8s-upgrade-and-conformance-rmrxf7/k8s-upgrade-and-conformance-1fx44p
The function passed to Eventually failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.5/framework/cluster_helpers.go:457 with:
The Available condition on the Cluster should be set to true; message: * RemoteConnectionProbe: Remote connection probe failed, probe last succeeded at 2026-01-22T02:16:41Z
Expected
    <v1.ConditionStatus>: False
to equal
    <v1.ConditionStatus>: True
In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.5/framework/cluster_helpers.go:463 @ 01/22/26 02:22:33.833
}

Which tests are flaky:

Testgrid link:

Reason for failure (if possible):

This check was added to the CAPI e2e test framework in kubernetes-sigs/cluster-api#12111.

It's not yet clear whether there's a real problem in CAPZ or whether something just needs to be tuned for the tests. The --remote-connection-grace-period command-line argument to capi-controller-manager is the only relevant setting I think we can control; the timeouts in the test itself are hardcoded.
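
For context, the check added in kubernetes-sigs/cluster-api#12111 boils down to polling the Cluster object on the management cluster until its Available condition reports True. The following is a minimal sketch of that kind of check, not the framework's actual code from cluster_helpers.go: the kubeconfig path, namespace/name, poll interval, and the assumption that the Available condition is read from the cluster.x-k8s.io/v1beta2 status.conditions are all placeholders/assumptions for illustration.

```go
package e2e_test

import (
	"context"
	"fmt"
	"testing"
	"time"

	. "github.com/onsi/gomega"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// TestClusterAvailable polls the workload Cluster object on the management
// cluster and fails if its Available condition does not become True within
// the timeout. This mirrors the shape of the framework check, not its exact code.
func TestClusterAvailable(t *testing.T) {
	g := NewWithT(t)
	ctx := context.Background()

	// Hypothetical setup: a controller-runtime client against the management
	// cluster's kubeconfig (path is a placeholder).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/mgmt-kubeconfig")
	g.Expect(err).NotTo(HaveOccurred())
	mgmtClient, err := client.New(cfg, client.Options{})
	g.Expect(err).NotTo(HaveOccurred())

	// Placeholder identifiers for the Cluster created by the test.
	key := client.ObjectKey{
		Namespace: "k8s-upgrade-and-conformance-rmrxf7",
		Name:      "k8s-upgrade-and-conformance-1fx44p",
	}

	// The 300s timeout matches the framework's hardcoded value seen in the
	// failure above; the 10s poll interval is arbitrary.
	g.Eventually(func(g Gomega) {
		// Read the Cluster as unstructured; assumes the v1beta2 API, where the
		// Available condition is surfaced in status.conditions.
		u := &unstructured.Unstructured{}
		u.SetGroupVersionKind(schema.GroupVersionKind{
			Group: "cluster.x-k8s.io", Version: "v1beta2", Kind: "Cluster",
		})
		g.Expect(mgmtClient.Get(ctx, key, u)).To(Succeed())

		conditions, _, err := unstructured.NestedSlice(u.Object, "status", "conditions")
		g.Expect(err).NotTo(HaveOccurred())

		found := false
		for _, c := range conditions {
			cond, ok := c.(map[string]interface{})
			if !ok {
				continue
			}
			if cond["type"] == "Available" {
				found = true
				g.Expect(cond["status"]).To(Equal("True"),
					fmt.Sprintf("Available is not True: %v", cond["message"]))
			}
		}
		g.Expect(found).To(BeTrue(), "Available condition not found on Cluster")
	}, 5*time.Minute, 10*time.Second).Should(Succeed())
}
```

When the RemoteConnectionProbe message in the failure above is reported, this assertion keeps failing until the probe succeeds again or the 300s window runs out, which is why a transiently unreachable workload API server during the upgrade shows up as this flake.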

Anything else we need to know:

  • links to go.k8s.io/triage appreciated
  • links to specific failures in spyglass appreciated

/kind flake

