fix(kubelet): strip version-gated fields from kubelet-config for K8s < 1.35 #1084
lexfrei wants to merge 1 commit into clastix:master
Conversation
fix(kubelet): strip version-gated fields from kubelet-config for K8s < 1.35

SetDefaults_KubeletConfiguration() from Kubernetes 1.35 libraries sets fields gated behind the KubeletCrashLoopBackOffMax and KubeletEnsureSecretPulledImages feature gates:

- crashLoopBackOff.maxContainerRestartPeriod
- imagePullCredentialsVerificationPolicy

Kubelets running versions prior to 1.35 reject these fields during configuration validation because the corresponding feature gates are not enabled, preventing worker nodes from joining the tenant cluster.

Pass the target Kubernetes version to getKubeletConfigmapContent and clear these fields when the version is below 1.35. Return an error if the version string cannot be parsed, consistent with generateKubeletConfigMapName in the same file.

Fixes: clastix#1062

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
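The clearing described in the commit message could be sketched in Go roughly as follows. This is a minimal, hypothetical model: the trimmed-down `KubeletConfiguration` struct, the `stripVersionGatedFields` helper, and `parseMinor` are stand-ins of my own, not the actual code inside `getKubeletConfigmapContent` (the real config type lives in `k8s.io/kubelet/config/v1beta1`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// KubeletConfiguration models only the two version-gated fields
// discussed in this PR (hypothetical trimmed-down struct).
type KubeletConfiguration struct {
	MaxContainerRestartPeriod              string
	ImagePullCredentialsVerificationPolicy string
}

// parseMinor extracts the minor version from a string like "v1.34.2",
// returning an error when the string cannot be parsed.
func parseMinor(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("cannot parse version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// stripVersionGatedFields clears the fields that kubelets < 1.35
// reject during configuration validation. It assumes a 1.x major
// version, as current Kubernetes releases do.
func stripVersionGatedFields(cfg *KubeletConfiguration, version string) error {
	minor, err := parseMinor(version)
	if err != nil {
		return err
	}
	if minor < 35 {
		cfg.MaxContainerRestartPeriod = ""
		cfg.ImagePullCredentialsVerificationPolicy = ""
	}
	return nil
}

func main() {
	// Illustrative values only; "600s" and "NeverVerify" are examples,
	// not the real 1.35 defaults.
	cfg := KubeletConfiguration{
		MaxContainerRestartPeriod:              "600s",
		ImagePullCredentialsVerificationPolicy: "NeverVerify",
	}
	if err := stripVersionGatedFields(&cfg, "v1.31.4"); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // both fields cleared for a 1.31 tenant
}
```

An unparseable version string surfaces as an error instead of silently keeping the fields, matching the error-handling choice described in the commit message.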
01fe4dc to 8ce52ca
---
I'm sorry for such a disruption. Kamaji is developed to be fast-paced: we don't aim for backward compatibility, nor for offering safe defaults for previous Kubernetes versions. We have a WIP PR for Cluster API to support kubeadm patches to solve these cases, and that's the way I'm thinking of it: Kamaji must be agnostic about the deployed Kubernetes version. Kamaji is a tool; the platform engineers must know in advance what could go wrong when upgrading it. We follow convention over configuration, which is essential for having a consistent configuration across Kamaji versions. Introducing conditions across the code base because the Kubernetes changelog has not been carefully evaluated before updating Kamaji is not viable for the longevity and maintainability of the project.
---
Hey @prometherion, thanks for the context — I understand the concern about maintainability. I took another look at the codebase before responding, and I think there's a case for reconsidering.
There is already a version check in the codebase with exactly this lifecycle:

```go
// TODO(prometherion): drop support of <= v1.27 TCP versions
```

My fix is the same approach: a version check that lives until old versions fall out of support, then gets cleaned up. So the pattern is already established in the project, not something new I'm introducing.

I'm aware that Issue #1062 was opened by @piepmatz independently: this affects anyone running tenant clusters on K8s 1.30–1.34 after a Kamaji upgrade. We've already patched this downstream in Cozystack, so this isn't blocking us. I opened the PR because I thought it'd be useful for the community, but I'm happy to adjust the approach if there's a middle ground that fits better. Maybe the version-gated checks could be consolidated into a helper, or there's a different place in the code you'd prefer. Open to suggestions!
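To make the consolidation idea concrete, the scattered gates could funnel through one small helper. The name `versionBelow` and its signature are hypothetical, my suggestion rather than anything in the Kamaji codebase:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionBelow reports whether a cluster version string like "v1.27.3"
// is below the given major.minor pair, or an error when the string
// cannot be parsed.
func versionBelow(version string, major, minor int) (bool, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("cannot parse version %q", version)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return gotMajor < major || (gotMajor == major && gotMinor < minor), nil
}

func main() {
	below, err := versionBelow("v1.31.4", 1, 35)
	if err != nil {
		panic(err)
	}
	fmt.Println(below) // prints "true": 1.31 is below 1.35
}
```

Each version-gated site would then read as a single call, e.g. `versionBelow(tcp.Spec.Kubernetes.Version, 1, 35)` (field path illustrative), and dropping support for an old version means deleting call sites rather than bespoke comparisons. In practice `k8s.io/apimachinery/pkg/util/version` already offers parsing and comparison, so the helper could also wrap that instead of hand-rolled string splitting.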
---
A few more points I'd like to add.

Why this matters to us specifically: Cozystack is an open-source PaaS platform for building private clouds, and Kamaji is one of its core components for provisioning tenant Kubernetes clusters. Our users create clusters through a catalog UI or a simplified CRD; they never interact with TenantControlPlane or Kamaji directly, and they shouldn't need to know about kubelet configuration internals. When a user requests a K8s 1.31 cluster, the platform should just provision it correctly. Right now, after upgrading Kamaji to edge-26.2.4, every tenant cluster running K8s < 1.35 breaks: worker nodes fail to join. We carry this patch downstream, and we're fine with that for now. But we'd prefer to contribute it upstream so that other Kamaji adopters building platforms don't have to independently rediscover and solve the same problem, as has already happened three times (#1047, #1062, this PR).

Documented support matrix: the versioning docs state that tenant clusters down to K8s 1.30 are supported. If backward compatibility for these versions is not a goal, the documentation should reflect that; otherwise users reasonably expect supported versions to work without manual intervention.

`configurationJSONPatches` coverage gap: as @piepmatz noted in #1062, `configurationJSONPatches` is not exposed through the KamajiControlPlane CRD, so Cluster API users cannot apply the per-cluster workaround at all.

Happy to iterate on the implementation: consolidating version checks into a helper, adjusting the structure, whatever works best for the project.
Does it? I tried
What this PR does
Fixes kubelet-config generation for tenant clusters running Kubernetes versions prior to 1.35.
Since Kamaji is now compiled against Kubernetes 1.35 libraries, `SetDefaults_KubeletConfiguration()` populates two fields that are gated behind feature gates introduced in 1.35:

- `crashLoopBackOff.maxContainerRestartPeriod` (`KubeletCrashLoopBackOffMax`)
- `imagePullCredentialsVerificationPolicy` (`KubeletEnsureSecretPulledImages`)

Kubelets < 1.35 reject these fields during configuration validation because the corresponding feature gates do not exist (or are not enabled), causing worker nodes to fail to join the tenant cluster with:
This fix passes the target Kubernetes version into `getKubeletConfigmapContent` and clears the incompatible fields when the version is below 1.35. The clearing happens before `configurationJSONPatches` are applied, so users can still override these fields explicitly if needed.

Context
We hit this issue in Cozystack after upgrading to Kamaji edge-26.2.4. All tenant clusters running K8s v1.30–v1.34 were affected — worker nodes could not join because kubelet rejected the version-gated fields in the kubelet-config ConfigMap.
While `configurationJSONPatches` (#1052) provides a per-TenantControlPlane workaround, it requires every user to manually patch each cluster, and it is not exposed through the KamajiControlPlane CRD (CAPI provider). This fix resolves the issue at the operator level, requiring no user intervention.

Fixes #1062
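The ordering claim in the description, that the clearing runs before `configurationJSONPatches` so an explicit override still wins, can be illustrated with a toy model. Everything here is a deliberate simplification: the map-based config, the `renderKubeletConfig` function, and the `"NeverVerify"` value are illustrative stand-ins, not Kamaji's actual rendering code or the real 1.35 default.

```go
package main

import "fmt"

// renderKubeletConfig sketches the order of operations (names hypothetical):
//  1. defaults from SetDefaults_KubeletConfiguration (modeled as a map)
//  2. version-gated fields cleared for tenants below 1.35
//  3. user-supplied configurationJSONPatches applied last,
//     so an explicit override survives the clearing.
func renderKubeletConfig(minor int, overrides map[string]string) map[string]string {
	cfg := map[string]string{
		"imagePullCredentialsVerificationPolicy": "NeverVerify", // stand-in for a 1.35 default
		"cgroupDriver":                           "systemd",
	}
	if minor < 35 {
		delete(cfg, "imagePullCredentialsVerificationPolicy") // step 2
	}
	for k, v := range overrides { // step 3: patches win
		cfg[k] = v
	}
	return cfg
}

func main() {
	// A 1.31 tenant: the gated field is dropped...
	fmt.Println(renderKubeletConfig(31, nil))
	// ...unless the user explicitly patches it back in.
	fmt.Println(renderKubeletConfig(31, map[string]string{
		"imagePullCredentialsVerificationPolicy": "NeverVerify",
	}))
}
```

If the order were reversed (patches first, clearing last), the operator would silently discard a user's explicit choice, which is why applying the clearing before the patches is the safer design.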