Description
What steps did you take and what happened?
I encountered an issue when upgrading a Cluster to use a new ClusterClass after migrating from CAPI v1.6 → v1.8. The upgrade fails because old variables persist in `spec.topology.variables`, even though they should have been removed by Server-Side Apply (SSA).
Steps to Reproduce
- Create an initial ClusterClass (`oldClass`)
  - This ClusterClass contains a required variable: `oldVariable`.
- Create a Cluster using `oldClass`
  - The cluster’s `spec.topology.variables` includes `oldVariable`.
- Create a new ClusterClass (`newClass`)
  - `newClass` has a required variable `newVariable` instead of `oldVariable`.
  - Only the name differs; the format and value remain the same.
- Upgrade the cluster to use `newClass` via SSA patch
  - Expected result: `oldVariable` should be removed, and `newVariable` should be added.
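For reference, the upgrade step above amounts to a server-side apply of the Cluster's topology. A minimal sketch of the applied manifest (cluster name, namespace, Kubernetes version, and variable value are illustrative, not taken from the affected environment):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster        # illustrative name
  namespace: default
spec:
  topology:
    class: newClass       # switched from oldClass
    version: v1.28.0      # illustrative Kubernetes version
    variables:
    - name: newVariable   # oldVariable is intentionally omitted
      value: "example"
```

Applied with `kubectl apply --server-side -f cluster.yaml`, SSA should prune `oldVariable` because the applier no longer lists it.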
Scenarios Tested
✅ Scenario 1: Running on CAPI v1.6.x
Deployment Versions:
```
NAMESPACE                           NAME                      VERSION
capi-kubeadm-bootstrap-system       bootstrap-kubeadm         v1.6.0
capi-kubeadm-control-plane-system   control-plane-kubeadm     v1.6.0
capi-system                         cluster-api               v1.6.0
capo-system                         infrastructure-openstack  v0.9.0
```
Result:
- The upgrade works correctly.
- After upgrading the ClusterClass, only `newVariable` exists, and `oldVariable` is removed.
✅ Scenario 2: Running on CAPI v1.8.x
Deployment Versions:
```
NAMESPACE                           NAME                      VERSION
capi-kubeadm-bootstrap-system       bootstrap-kubeadm         v1.8.4
capi-kubeadm-control-plane-system   control-plane-kubeadm     v1.8.4
capi-system                         cluster-api               v1.8.4
capo-system                         infrastructure-openstack  v0.11.2
```
Result:
- The upgrade works correctly.
- After upgrading the ClusterClass, only `newVariable` exists, and `oldVariable` is removed.
❌ Scenario 3: Upgrading from CAPI v1.6 → v1.8 and then upgrading ClusterClass
- Deploy Cluster using CAPI v1.6.0 (Scenario 1 setup).
- Upgrade CAPI to v1.8.4 (Scenario 2 setup).
- Attempt to upgrade the Cluster to use `newClass` (SSA patch).
  - 🔴 The upgrade fails because `oldVariable` still exists, despite not being defined in `newClass`.
What did you expect to happen?
- When upgrading a cluster to use a new ClusterClass, SSA should correctly remove old variables that are no longer part of the new ClusterClass.
- This behavior should remain consistent across CAPI versions.
Cluster API version
The cluster-api versions are listed in the scenarios above (v1.6.0 and v1.8.4).
Kubernetes version
I tested with these versions:
- 1.25.x
- 1.28.x
Anything else you would like to add?
What I Found
`managedFields` behaves differently between versions.
- In CAPI v1.6.x (older versions), `spec.topology.variables` was tracked as a single field:

  ```yaml
  f:variables: {}
  ```

- In CAPI v1.8.x (newer versions), each variable in `spec.topology.variables` is individually tracked:

  ```yaml
  f:variables:
    k:{"name":"apiServerTLSCipherSuites"}:
      .: {}
      f:name: {}
      f:value: {}
  ```
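The two states above can be compared directly on the live object with `kubectl get cluster <name> -o yaml --show-managed-fields`. A sketch of an abbreviated v1.6-style entry (the manager name is an assumption; verify the actual owner on your object):

```yaml
managedFields:
- manager: capi-topology          # assumed field owner; check your object
  operation: Apply
  apiVersion: cluster.x-k8s.io/v1beta1
  fieldsType: FieldsV1
  fieldsV1:
    f:spec:
      f:topology:
        f:variables: {}           # atomic claim: the whole list owned as one field
```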
Hypothesis:
- In CAPI v1.6, SSA does not track individual variables, so removing a variable implicitly removes it from `spec.topology.variables`.
- In CAPI v1.8, SSA tracks each variable separately, preventing removal if ownership conflicts exist.
- This change breaks upgrades when transitioning from v1.6 to v1.8.
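To make the hypothesis concrete, here is a minimal sketch (purely illustrative, not captured from the affected cluster) of the kind of stale per-variable key that would keep `oldVariable` alive: if the applying manager does not own this key after the v1.6 → v1.8 migration, an apply that omits the variable cannot remove it:

```yaml
# Hypothetical leftover ownership state after the migration:
fieldsV1:
  f:spec:
    f:topology:
      f:variables:
        k:{"name":"oldVariable"}:   # stale key: oldVariable survives the new apply
          .: {}
          f:name: {}
          f:value: {}
```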
Possible Causes
- Changes in SSA handling of `spec.topology.variables` between CAPI v1.6 → v1.8.
- Stricter managedFields tracking in newer versions.
- Potential ownership conflicts preventing removal of fields.
Label(s) to be applied
/kind bug