
fix(kubelet): strip version-gated fields from kubelet-config for K8s < 1.35#1084

Open
lexfrei wants to merge 1 commit into clastix:master from lexfrei:fix/kubelet-config-compat-pre-1.35

Conversation

@lexfrei (Contributor) commented Feb 20, 2026

What this PR does

Fixes kubelet-config generation for tenant clusters running Kubernetes versions prior to 1.35.

Since Kamaji is now compiled against Kubernetes 1.35 libraries, SetDefaults_KubeletConfiguration() populates two fields that are gated behind feature gates introduced in 1.35:

  • crashLoopBackOff.maxContainerRestartPeriod (KubeletCrashLoopBackOffMax)
  • imagePullCredentialsVerificationPolicy (KubeletEnsureSecretPulledImages)

Kubelets < 1.35 reject these fields during configuration validation because the corresponding feature gates do not exist (or are not enabled), causing worker nodes to fail to join the tenant cluster with:

failed to validate kubelet configuration, error: [
  invalid configuration: FeatureGate KubeletCrashLoopBackOffMax not enabled,
    CrashLoopBackOff.MaxContainerRestartPeriod must not be set,
  invalid configuration: imagePullCredentialsVerificationPolicy must not be set
    if KubeletEnsureSecretPulledImages feature gate is not enabled
]

This fix passes the target Kubernetes version into getKubeletConfigmapContent and clears the incompatible fields when the version is below 1.35. The clearing happens before configurationJSONPatches are applied, so users can still override these fields explicitly if needed.
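As a minimal Go sketch of the approach (simplified stand-in types and a hypothetical `stripVersionGatedFields` helper; the real code operates on `kubeletv1beta1.KubeletConfiguration` inside `getKubeletConfigmapContent` and uses Kamaji's own version parsing):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorVersion extracts the minor component from a "v1.34.2"-style
// version string. Hypothetical helper for illustration only.
func minorVersion(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("malformed version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// KubeletConfiguration is a reduced stand-in for the real
// kubeletv1beta1.KubeletConfiguration struct.
type KubeletConfiguration struct {
	MaxContainerRestartPeriod              *int
	ImagePullCredentialsVerificationPolicy string
}

// stripVersionGatedFields clears the fields that kubelets < 1.35
// reject. It runs before configurationJSONPatches are applied, so
// an explicit user override can still set them back.
func stripVersionGatedFields(cfg *KubeletConfiguration, tenantVersion string) error {
	minor, err := minorVersion(tenantVersion)
	if err != nil {
		return err // surface parse failures, as generateKubeletConfigMapName does
	}
	if minor < 35 {
		cfg.MaxContainerRestartPeriod = nil
		cfg.ImagePullCredentialsVerificationPolicy = ""
	}
	return nil
}

func main() {
	period := 600
	cfg := KubeletConfiguration{
		MaxContainerRestartPeriod:              &period,
		ImagePullCredentialsVerificationPolicy: "NeverVerifyPreloadedImages",
	}
	if err := stripVersionGatedFields(&cfg, "v1.31.4"); err != nil {
		panic(err)
	}
	fmt.Println(cfg.MaxContainerRestartPeriod == nil, cfg.ImagePullCredentialsVerificationPolicy == "")
	// prints: true true
}
```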

Context

We hit this issue in Cozystack after upgrading to Kamaji edge-26.2.4. All tenant clusters running K8s v1.30–v1.34 were affected — worker nodes could not join because kubelet rejected the version-gated fields in the kubelet-config ConfigMap.

While configurationJSONPatches (#1052) provides a per-TenantControlPlane workaround, it requires every user to manually patch each cluster, and is not exposed through the KamajiControlPlane CRD (CAPI provider). This fix resolves the issue at the operator level, requiring no user intervention.

Fixes #1062

netlify bot commented Feb 20, 2026

Deploy Preview for kamaji-documentation canceled.

🔨 Latest commit: 8ce52ca
🔍 Latest deploy log: https://app.netlify.com/projects/kamaji-documentation/deploys/69982f4417974000084c8fb6

@lexfrei lexfrei marked this pull request as ready for review on February 20, 2026 09:03
fix(kubelet): strip version-gated fields from kubelet-config for K8s < 1.35

SetDefaults_KubeletConfiguration() from Kubernetes 1.35 libraries sets
fields gated behind KubeletCrashLoopBackOffMax and
KubeletEnsureSecretPulledImages feature gates:
- crashLoopBackOff.maxContainerRestartPeriod
- imagePullCredentialsVerificationPolicy

Kubelets running versions prior to 1.35 reject these fields during
configuration validation because the corresponding feature gates are
not enabled, preventing worker nodes from joining the tenant cluster.

Pass the target Kubernetes version to getKubeletConfigmapContent and
clear these fields when the version is below 1.35. Return an error
if the version string cannot be parsed, consistent with
generateKubeletConfigMapName in the same file.

Fixes: clastix#1062

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei lexfrei force-pushed the fix/kubelet-config-compat-pre-1.35 branch from 01fe4dc to 8ce52ca on February 20, 2026 09:54
@prometherion (Member)

I'm sorry for such a disruption. Kamaji is developed to be fast-paced: we don't aim for backward compatibility, nor for offering safe defaults for previous Kubernetes versions.

We have a WIP PR for Cluster API to support Kubeadm patches to solve these cases, and that's the direction I'm thinking of: Kamaji must be agnostic to the deployed Kubernetes version and avoid the if/else drama when breaking changes are introduced. The test case you provided, although serving the right purpose, is proof that this is going to be unmaintainable in the long term: with upcoming Kubernetes versions, these conditions will pile up over time.

Kamaji is a tool: platform engineers must know in advance what could go wrong when upgrading it, and we want to avoid convention over configuration, since explicit configuration is essential for consistency across Kamaji versions.

Introducing conditions across the code base because the Kubernetes changelog was not carefully evaluated before updating Kamaji is not viable for the longevity and maintainability of the project.

@lexfrei (Contributor, Author) commented Feb 21, 2026

Hey @prometherion, thanks for the context — I understand the concern about maintainability.

I took another look at the codebase before responding, and I think there's a case for reconsidering.

generateKubeletConfigMapName in the same file already follows this exact pattern — it parses the target version and branches on >= 1.24. There's also a TODO for eventually dropping the old path:

// TODO(prometherion): drop support of <= v1.27 TCP versions

My fix is the same approach: a version check that lives until old versions fall out of support, then gets cleaned up. So the pattern is already established in the project, not something new I'm introducing.
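For illustration, the established pattern looks roughly like this (a simplified sketch, not the actual Kamaji implementation; the name split follows the kubeadm convention of `kubelet-config-1.X` before 1.24 and a plain `kubelet-config` afterwards):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kubeletConfigMapName sketches generateKubeletConfigMapName-style
// branching: parse the tenant version, error out if it is malformed,
// and branch on the minor version.
func kubeletConfigMapName(tenantVersion string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(tenantVersion, "v"), ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("unable to parse version %q", tenantVersion)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", err
	}
	// The real code carries a TODO to drop the legacy path once old
	// tenant versions fall out of support; a < 1.35 field check would
	// live and die the same way.
	if minor >= 24 {
		return "kubelet-config", nil
	}
	return fmt.Sprintf("kubelet-config-1.%d", minor), nil
}

func main() {
	a, _ := kubeletConfigMapName("v1.31.0")
	b, _ := kubeletConfigMapName("v1.23.9")
	fmt.Println(a, b)
	// prints: kubelet-config kubelet-config-1.23
}
```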

I'm aware that configurationJSONPatches exists as a workaround, but it shifts the burden to every operator running mixed-version tenant clusters — they'd need to know about version-specific kubelet fields and craft patches manually. The version check in Kamaji handles it transparently.

Issue #1062 was opened by @piepmatz independently — this affects anyone running tenant clusters on K8s 1.30–1.34 after a Kamaji upgrade.

We've already patched this downstream in Cozystack, so this isn't blocking us. I opened the PR because I thought it'd be useful for the community — but I'm happy to adjust the approach if there's a middle ground that fits better. Maybe the version-gated checks could be consolidated into a helper, or there's a different place in the code you'd prefer.

Open to suggestions!

@lexfrei (Contributor, Author) commented Feb 21, 2026

A few more points I'd like to add.

Why this matters to us specifically: Cozystack is an open-source PaaS platform for building private clouds — Kamaji is one of its core components for provisioning tenant Kubernetes clusters. Our users create clusters through a catalog UI or a simplified CRD — they never interact with TenantControlPlane or Kamaji directly, and they shouldn't need to know about kubelet configuration internals.

When a user requests a K8s 1.31 cluster, the platform should just provision it correctly. Right now, after upgrading Kamaji to edge-26.2.4, every tenant cluster running K8s < 1.35 breaks — worker nodes fail to join. The configurationJSONPatches workaround doesn't help here because our users don't have access to patch individual TenantControlPlane resources. We'd have to embed version-detection logic into Helm templates to conditionally inject JSON patches, which moves the complexity from Go code (where it naturally belongs and where the version is already available) into templating (where it's fragile and harder to maintain).

We carry this patch downstream, and we're fine with that for now. But we'd prefer to contribute it upstream so that other Kamaji adopters building platforms don't have to independently rediscover and solve the same problem — as has already happened three times (#1047, #1062, this PR).

Documented support matrix: The versioning docs state that tenant clusters down to K8s 1.30 are supported. If backward compatibility for these versions is not a goal, the documentation should reflect that — otherwise users reasonably expect supported versions to work without manual intervention.

configurationJSONPatches coverage gap: As @piepmatz noted in #1062, configurationJSONPatches (#1052) addresses kubelet config but not kubeadm config. So even with the workaround, the coverage is incomplete.

Happy to iterate on the implementation — consolidating version checks into a helper, adjusting the structure, whatever works best for the project.

@toelke (Contributor) commented Mar 4, 2026

While configurationJSONPatches (#1052) provides a per-TenantControlPlane workaround,

Does it? I tried an "op": "remove" patch for the offending options, but the result of the patch is parsed back into the v1.35 struct, and the defaults are set again immediately.



Successfully merging this pull request may close these issues:

Worker nodes cannot join their clusters with K8s < 1.35