Description
Observation
After upgrading Kamaji from edge-25.11.5 to edge-26.1.3, worker nodes in tenant clusters no longer join their control planes.
Reason
edge-25.12.5 added support for K8s 1.35 in #1038.
Starting with K8s 1.35, the following feature gates are enabled by default:
- KubeletCrashLoopBackOffMax
- KubeletEnsureSecretPulledImages
This leads to the kubelet-config ConfigMap now containing the corresponding new lines:
- crashLoopBackOff:
    maxContainerRestartPeriod: 5m0s
- imagePullCredentialsVerificationPolicy: NeverVerifyPreloadedImages
In kubelet versions < 1.35, both feature gates are still disabled by default, so the kubelet rejects a config that sets these fields:
Jan 19 13:38:29 nodename kubelet[1067]: E0119 13:38:29.050323 1067 run.go:72] "command failed" err="failed to validate kubelet configuration, error: [invalid configuration: FeatureGate KubeletCrashLoopBackOffMax not enabled, CrashLoopBackOff.MaxContainerRestartPeriod must not be set, invalid configuration: `imagePullCredentialsVerificationPolicy` must not be set if KubeletEnsureSecretPulledImages feature gate is not enabled], path: &TypeMeta{Kind:,APIVersion:,}"
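To make the failure mode concrete, here is a minimal Go sketch of the rule the kubelet applies, as far as it can be read from the log line above. It is not kubelet code: the `kubeletConfig` type and the `rejectedFields` helper are hypothetical, and the version cut-off assumes default feature gates as described in this issue.

```go
package main

import "fmt"

// kubeletConfig is a hypothetical, simplified stand-in that models only the
// two fields named in the error message.
type kubeletConfig struct {
	MaxContainerRestartPeriod              string // crashLoopBackOff.maxContainerRestartPeriod
	ImagePullCredentialsVerificationPolicy string // imagePullCredentialsVerificationPolicy
}

// gatesEnabledByDefault encodes the assumption from this issue: both gates are
// on by default starting with kubelet 1.35 and off before that.
func gatesEnabledByDefault(kubeletMinor int) bool {
	return kubeletMinor >= 35
}

// rejectedFields returns the config fields that a kubelet of the given minor
// version would refuse, assuming its feature gates are left at their defaults.
func rejectedFields(cfg kubeletConfig, kubeletMinor int) []string {
	if gatesEnabledByDefault(kubeletMinor) {
		return nil
	}
	var rejected []string
	if cfg.MaxContainerRestartPeriod != "" {
		rejected = append(rejected, "crashLoopBackOff.maxContainerRestartPeriod")
	}
	if cfg.ImagePullCredentialsVerificationPolicy != "" {
		rejected = append(rejected, "imagePullCredentialsVerificationPolicy")
	}
	return rejected
}

func main() {
	// The values Kamaji currently writes into the kubelet-config ConfigMap.
	cfg := kubeletConfig{
		MaxContainerRestartPeriod:              "5m0s",
		ImagePullCredentialsVerificationPolicy: "NeverVerifyPreloadedImages",
	}
	for _, minor := range []int{34, 35} {
		fmt.Printf("kubelet 1.%d rejects: %v\n", minor, rejectedFields(cfg, minor))
	}
}
```

The same ConfigMap content is therefore fine for a 1.35 kubelet and fatal for a 1.34 one, which is why the problem only shows up with older worker nodes.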
Resolution
There are several options:
- Hardcode both feature gates to be disabled in https://github.com/clastix/kamaji/blob/edge-26.1.3/internal/kubeadm/uploadconfig.go#L77 as long as tenant cluster versions < 1.35 are supported. There's a FeatureGates field. Sounds bad overall.
- Expose feature gate configuration. For the control plane components the ExtraArgs fields already allow injecting e.g. --feature-gates=KubeletCrashLoopBackOffMax=false, but for the kubelet there's no such thing yet. The current KubeletConfiguration is very minimal. This approach would solve the current issue and seems desirable in general.
- I think the core issue is running kubeadm functionality from the latest k/k minor version regardless of the tenant cluster version and the kubelet versions it allows. The docs indicate that tenant clusters down to 1.30 are supported, allowing ever older kubelets. Considering kubeadm's version skew policy, using different k/k module versions might be the right thing to do, but it is quite cumbersome.
- Allow Kamaji users to take care of the contents of the kubeadm-config and kubelet-config ConfigMaps on their own, e.g. by offering an opt-out per kubeadm phase, PhaseUploadConfigKubeadm and PhaseUploadConfigKubelet in this case. Besides configuring kubelet feature gates, users can have arbitrary requirements for the contents of both ConfigMaps. Exposing all those options via the TenantControlPlane CRD seems undesirable. Allowing users to optionally take over responsibility for both ConfigMaps would kill two birds with one stone.
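As a rough illustration of how the first two options could combine, the Go sketch below builds the map that would go into the FeatureGates field: both gates are disabled by default, and user-supplied overrides win. The `kubeletFeatureGates` helper and the idea of an override map coming from the TenantControlPlane spec are assumptions for illustration, not existing Kamaji API. It also assumes that explicitly disabling the gates keeps the two new fields out of the generated config, which has not been verified here.

```go
package main

import "fmt"

// kubeletFeatureGates returns the feature gate map to place in the generated
// KubeletConfiguration. Hypothetical helper: Kamaji has no such function today.
func kubeletFeatureGates(overrides map[string]bool) map[string]bool {
	// Option 1: keep both gates off while tenant versions < 1.35 are supported.
	gates := map[string]bool{
		"KubeletCrashLoopBackOffMax":      false,
		"KubeletEnsureSecretPulledImages": false,
	}
	// Option 2: let the user override or extend the defaults.
	// Ranging over a nil map is a no-op, so nil overrides are fine.
	for name, enabled := range overrides {
		gates[name] = enabled
	}
	return gates
}

func main() {
	// No user input: both gates stay disabled.
	fmt.Println(kubeletFeatureGates(nil))

	// A user running only 1.35 kubelets opts back in to one gate.
	fmt.Println(kubeletFeatureGates(map[string]bool{
		"KubeletCrashLoopBackOffMax": true,
	}))
}
```

Applying overrides last keeps the safe default for mixed-version tenants while leaving the decision with users who know their kubelet versions.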
Regression testing
Even though there's an e2e test, the issue went unnoticed in #1038 because the worker node in that test also runs K8s 1.35, so its kubelet doesn't complain about the new config lines.
There should be an e2e test matrix for each combination of supported tenant control plane version and worker node version.
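A possible way to enumerate such a matrix is sketched below in Go. The version range 1.30 to 1.35 follows the numbers mentioned in this issue. The skew limit of three minor versions, with the kubelet never newer than the control plane, reflects the upstream Kubernetes version skew policy and is applied here as an assumption about what Kamaji wants to support. The `combo` type and `matrix` function are hypothetical names.

```go
package main

import "fmt"

// combo is one e2e test case: a tenant control plane minor version paired
// with a worker node (kubelet) minor version.
type combo struct {
	ControlPlaneMinor int
	KubeletMinor      int
}

// matrix enumerates all pairs where the kubelet is at most maxSkew minor
// versions older than the control plane and never newer than it.
func matrix(minMinor, maxMinor, maxSkew int) []combo {
	var out []combo
	for cp := minMinor; cp <= maxMinor; cp++ {
		for kl := cp - maxSkew; kl <= cp; kl++ {
			if kl < minMinor {
				continue // kubelet older than the oldest supported version
			}
			out = append(out, combo{ControlPlaneMinor: cp, KubeletMinor: kl})
		}
	}
	return out
}

func main() {
	for _, c := range matrix(30, 35, 3) {
		fmt.Printf("control plane 1.%d, kubelet 1.%d\n", c.ControlPlaneMinor, c.KubeletMinor)
	}
}
```

With these inputs the matrix includes the pair control plane 1.35 and kubelet 1.34, which is the combination that would have caught this regression.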