API version upgrade tests sometimes fail with "resourceVersions never became stable" #6058

@nojnhuh

Which jobs are flaky:

https://storage.googleapis.com/k8s-triage/index.html?pr=1&text=resourceVersions%20never%20became%20stable%5B%5Cs%5CS%5D*%3F%5E%5C%2B%5Cs*nil%2C&job=azure

e.g. https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/6048/pull-cluster-api-provider-azure-apiversion-upgrade/2014000463510245376

{Timed out after 70.276s.
resourceVersions never became stable
The function passed to Eventually failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.4/framework/resourceversion_helpers.go:51 with:
Expected object to be comparable, diff:   map[string]string(
- 	{
- 		"AzureCluster/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c":                         "5480",
- 		"AzureClusterIdentity/clusterctl-upgrade/cluster-identity-ci":                                "4266",
- 		"AzureMachine/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-control-plane-pcdpp":     "6399",
- 		"AzureMachine/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-md-0-kqv48-7vzj7":        "7089",
- 		"AzureMachinePool/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-mp-0":                "7651",
- 		"AzureMachinePoolMachine/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-mp-0-0":       "7574",
- 		"AzureMachineTemplate/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-control-plane":   "4085",
- 		"AzureMachineTemplate/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-md-0":            "4117",
- 		"AzureMachineTemplate/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-md-win":          "4202",
- 		"Cluster/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c":                              "10678",
- 		"ClusterResourceSet/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-calico-windows":    "6920",
- 		"ClusterResourceSet/clusterctl-upgrade/containerd-logger-clusterctl-upgrade-workload-8plj4c": "6466",
- 		"ClusterResourceSet/clusterctl-upgrade/csi-proxy":                                            "6464",
- 		"ClusterResourceSetBinding/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c":            "6918",
- 		"ConfigMap/clusterctl-upgrade/cni-clusterctl-upgrade-workload-8plj4c-calico-windows":         "4392",
- 		"ConfigMap/clusterctl-upgrade/containerd-logger-clusterctl-upgrade-workload-8plj4c":          "4412",
- 		...
- 	},
+ 	nil,
  )
 failed [FAILED] Timed out after 70.276s.
resourceVersions never became stable
The function passed to Eventually failed at /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.4/framework/resourceversion_helpers.go:51 with:
Expected object to be comparable, diff:   map[string]string(
- 	{
- 		"AzureCluster/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c":                         "5480",
- 		"AzureClusterIdentity/clusterctl-upgrade/cluster-identity-ci":                                "4266",
- 		"AzureMachine/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-control-plane-pcdpp":     "6399",
- 		"AzureMachine/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-md-0-kqv48-7vzj7":        "7089",
- 		"AzureMachinePool/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-mp-0":                "7651",
- 		"AzureMachinePoolMachine/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-mp-0-0":       "7574",
- 		"AzureMachineTemplate/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-control-plane":   "4085",
- 		"AzureMachineTemplate/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-md-0":            "4117",
- 		"AzureMachineTemplate/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-md-win":          "4202",
- 		"Cluster/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c":                              "10678",
- 		"ClusterResourceSet/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c-calico-windows":    "6920",
- 		"ClusterResourceSet/clusterctl-upgrade/containerd-logger-clusterctl-upgrade-workload-8plj4c": "6466",
- 		"ClusterResourceSet/clusterctl-upgrade/csi-proxy":                                            "6464",
- 		"ClusterResourceSetBinding/clusterctl-upgrade/clusterctl-upgrade-workload-8plj4c":            "6918",
- 		"ConfigMap/clusterctl-upgrade/cni-clusterctl-upgrade-workload-8plj4c-calico-windows":         "4392",
- 		"ConfigMap/clusterctl-upgrade/containerd-logger-clusterctl-upgrade-workload-8plj4c":          "4412",
- 		...
- 	},
+ 	nil,
  )
In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.11.4/framework/resourceversion_helpers.go:52 @ 01/21/26 16:29:16.47
}

Which tests are flaky:

Testgrid link:

Reason for failure (if possible):

The comparison against nil suggests that something is going wrong when the CAPI framework gathers its view of the current resourceVersions of the relevant objects: https://github.com/kubernetes-sigs/cluster-api/blob/5226d3ff5782d06396ba1740a46bb59d6de2a8ec/test/framework/resourceversion_helpers.go#L38

We should hit this test hard again once we merge #6009, since one change was made to this to fix an upstream CAPI flake (kubernetes-sigs/cluster-api#12334), although that flake's symptoms don't look similar to this one.

Anything else we need to know:

  • links to go.k8s.io/triage appreciated
  • links to specific failures in spyglass appreciated

/kind flake

[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]
