Worker nodes cannot join their clusters with K8s < 1.35 #1062

@piepmatz

Observation

After upgrading Kamaji from edge-25.11.5 to edge-26.1.3, worker nodes in tenant clusters no longer join their control planes.

Reason

edge-25.12.5 added support for K8s 1.35 in #1038.

Starting with K8s 1.35, the following feature gates are enabled by default:

  • KubeletCrashLoopBackOffMax
  • KubeletEnsureSecretPulledImages

This leads to the kubelet-config ConfigMap now containing the corresponding new lines:

  • crashLoopBackOff:
      maxContainerRestartPeriod: 5m0s
  • imagePullCredentialsVerificationPolicy: NeverVerifyPreloadedImages


Kubelet versions < 1.35 still have both feature gates disabled by default, so they reject a config that sets these fields:

Jan 19 13:38:29 nodename kubelet[1067]: E0119 13:38:29.050323    1067 run.go:72] "command failed" err="failed to validate kubelet configuration, error: [invalid configuration: FeatureGate KubeletCrashLoopBackOffMax not enabled, CrashLoopBackOff.MaxContainerRestartPeriod must not be set, invalid configuration: `imagePullCredentialsVerificationPolicy` must not be set if KubeletEnsureSecretPulledImages feature gate is not enabled], path: &TypeMeta{Kind:,APIVersion:,}"

Resolution

There are several options:

  1. Hardcode both feature gates to be disabled in https://github.com/clastix/kamaji/blob/edge-26.1.3/internal/kubeadm/uploadconfig.go#L77 as long as tenant cluster versions < 1.35 are supported. There's a FeatureGates field. Sounds bad overall.
  2. Expose feature gate configuration. For the control plane components the ExtraArgs fields already allow injecting e.g. --feature-gates=KubeletCrashLoopBackOffMax=false, but for the kubelet there's no such thing yet. The current KubeletConfiguration is very minimal. This approach would solve the current issue and seems desirable in general.
  3. I think the core issue is that Kamaji runs kubeadm functionality from the latest k/k minor version regardless of the tenant cluster version and the kubelet versions that version allows. The docs indicate that tenant clusters down to 1.30 are supported, which allows even older kubelets. Considering kubeadm's version skew policy, using different k/k module versions might be the right thing to do, but it is quite cumbersome.
  4. Allow Kamaji users to take care of the contents of the kubeadm-config and kubelet-config ConfigMaps on their own, e.g. by offering an opt-out per kubeadm phase, PhaseUploadConfigKubeadm and PhaseUploadConfigKubelet in this case. Besides configuring kubelet feature gates, users can have arbitrary requirements for the contents of both ConfigMaps. Exposing all those options via the TenantControlPlane CRD seems undesirable. Allowing users to optionally take over responsibility for both ConfigMaps would kill two birds with one stone.
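
To make options 1 and 2 a bit more concrete, here is a minimal sketch of how the gates to disable could be derived from the oldest kubelet a tenant cluster has to support. This is not Kamaji code: `featureGateOverrides`, `minorOf` and `gatesDefaultOnSince` are hypothetical names, and the 1.35 threshold is simply taken from the observation above. The resulting map is what one would presumably feed into the FeatureGates field, or expose to users.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gatesDefaultOnSince maps each kubelet feature gate named in this issue to
// the first minor version that enables it by default (1.35, per the issue).
var gatesDefaultOnSince = map[string]int{
	"KubeletCrashLoopBackOffMax":      35,
	"KubeletEnsureSecretPulledImages": 35,
}

// minorOf extracts the minor number from versions such as "v1.34.2" or "1.35".
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return 0, fmt.Errorf("unsupported version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// featureGateOverrides (hypothetical helper) returns the gates that would
// have to be set to false explicitly so that a kubelet as old as
// oldestKubelet does not receive config fields it rejects.
func featureGateOverrides(oldestKubelet string) (map[string]bool, error) {
	minor, err := minorOf(oldestKubelet)
	if err != nil {
		return nil, err
	}
	overrides := map[string]bool{}
	for gate, since := range gatesDefaultOnSince {
		if minor < since {
			overrides[gate] = false
		}
	}
	return overrides, nil
}

func main() {
	for _, v := range []string{"v1.34.2", "v1.35.0"} {
		overrides, err := featureGateOverrides(v)
		if err != nil {
			panic(err)
		}
		fmt.Println(v, overrides)
	}
}
```

Keeping the thresholds in a table rather than in scattered version checks would make it easier to add the next gate that flips its default, but it still means tracking upstream defaults by hand, which is the main drawback of option 1.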

Regression testing

Even though there's an e2e test, the issue went unnoticed in #1038 because the worker node in that test also runs K8s 1.35, so its kubelet doesn't complain about the new config lines.

There should be an e2e test matrix for each combination of supported tenant control plane version and worker node version.
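
A rough sketch of how such a matrix could be enumerated. It assumes supported minors from 1.30 to 1.35 (based on the docs mentioned above) and a kubelet that may be at most three minor versions older than the control plane and never newer; both should be checked against the actual support and skew policy. `testMatrix` and `pair` are hypothetical names, not part of the existing e2e suite.

```go
package main

import "fmt"

// Assumptions: supported tenant minors span 1.30 to 1.35 (from the issue
// text), and a kubelet may trail the control plane by at most three minors.
const (
	oldestMinor = 30
	newestMinor = 35
	maxSkew     = 3
)

// pair is one (control plane, worker node) combination to exercise.
type pair struct {
	ControlPlane string
	Worker       string
}

// testMatrix (hypothetical helper) enumerates every control plane minor with
// each worker minor that is not newer and within the allowed skew.
func testMatrix(oldest, newest, skew int) []pair {
	var out []pair
	for cp := oldest; cp <= newest; cp++ {
		for w := cp; w >= oldest && cp-w <= skew; w-- {
			out = append(out, pair{
				ControlPlane: fmt.Sprintf("v1.%d", cp),
				Worker:       fmt.Sprintf("v1.%d", w),
			})
		}
	}
	return out
}

func main() {
	for _, p := range testMatrix(oldestMinor, newestMinor, maxSkew) {
		fmt.Printf("control plane %s, worker %s\n", p.ControlPlane, p.Worker)
	}
}
```

With a matrix like this, the combination of a 1.35 control plane and a 1.34 worker would have been part of the run for #1038 and would have caught the rejected kubelet config.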
