Description
Version
Karpenter Version: v1.7.1 and v1.7.2
Kubernetes Version: v1.35.0
Expected Behavior
Karpenter-provisioned nodes on AKS 1.35 should bootstrap successfully, register with the cluster, and transition their NodeClaim to ready.
Actual Behavior
The Azure VM is created successfully and reaches ProvisioningState=Succeeded, but the Kubernetes node never registers.
The corresponding NodeClaim gets stuck in this state:
Launched=True
Registered=Unknown
Reason=NodeNotFound
We verified directly on a failing VM that kubelet is started with a flag that kubelet 1.35 no longer accepts:
failed to parse kubelet flag: unknown flag: --pod-infra-container-image
The generated /etc/default/kubelet on the VM contains:
--pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6
Steps to Reproduce the Problem
- Upgrade an AKS cluster to Kubernetes 1.35.0.
- Install Azure Karpenter provider v1.7.1 or v1.7.2.
- Create a NodePool and AKSNodeClass.
- Trigger provisioning of a new Karpenter-managed node.
- Observe that the VM launches, but no Kubernetes Node object is ever created.
Resource Specs and Logs
NodeClaim symptoms
The failing NodeClaim reaches launch but never registers:
Type: Launched
Status: True
Reason: Launched
Type: Registered
Status: Unknown
Reason: NodeNotFound
Message: Node not registered with cluster
VM-side evidence
We inspected the VM directly with az vm run-command invoke.
/etc/default/kubelet included:
KUBELET_FLAGS=... --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6 ...
Recent kubelet log output on the failing VM:
E0314 18:19:08.262960 12399 run.go:72] "command failed" err="failed to parse kubelet flag: unknown flag: --pod-infra-container-image"
systemd[1]: kubelet.service: Failed with result 'exit-code'.
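The manual check above could be scripted. The following is a minimal sketch, not part of the provider: the function name rejectedFlagsIn and the idea of passing in a list of rejected flags are assumptions made for illustration. It scans a KUBELET_FLAGS line, such as the one found in /etc/default/kubelet, for flags known to be rejected.

```go
package main

import "strings"

// rejectedFlagsIn returns every entry of rejected that appears as a flag name
// in a KUBELET_FLAGS line. It handles both "--flag=value" and bare "--flag"
// tokens. Hypothetical helper for illustration only.
func rejectedFlagsIn(line string, rejected []string) []string {
	var found []string
	for _, token := range strings.Fields(line) {
		// Drop the variable name and any surrounding quotes from the token.
		token = strings.TrimPrefix(token, "KUBELET_FLAGS=")
		token = strings.Trim(token, "\"'")
		// Keep only the flag name, i.e. the part before the first "=".
		name, _, _ := strings.Cut(token, "=")
		for _, r := range rejected {
			if name == r {
				found = append(found, r)
			}
		}
	}
	return found
}
```

Calling rejectedFlagsIn with the line from the failing VM and a list containing "--pod-infra-container-image" would return that flag, which matches the kubelet error in the log.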
Source code reference
The provider still injects this flag from the bootstrap defaults here:
pkg/providers/imagefamily/bootstrap/staticvalues.go
"--pod-infra-container-image": "mcr.microsoft.com/oss/kubernetes/pause:3.6",
This reproduces on at least:
- imageFamily: AzureLinux
- imageFamily: Ubuntu
So this does not appear to be AzureLinux-specific; it appears to be an incompatibility between the provider's bootstrap path and kubelet 1.35.
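One possible direction for a fix is to gate the flag on the target Kubernetes version. The sketch below is not the provider's actual code: removedIn, minorOf, kubeletFlagsFor, and renderFlags are hypothetical names, and the cutoff of 1.35 is taken only from the failure observed in this issue. It assumes a 1.x version string.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// removedIn maps a kubelet flag to the first 1.x minor version that rejects it.
// The single entry reflects the failure reported here; it is an assumption,
// not an authoritative list.
var removedIn = map[string]int{
	"--pod-infra-container-image": 35,
}

// minorOf extracts the minor version from strings like "1.35.0" or "v1.35.0".
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("unexpected version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// kubeletFlagsFor returns a copy of defaults without the flags that the given
// kubelet version no longer accepts. The input map is left unmodified.
func kubeletFlagsFor(version string, defaults map[string]string) (map[string]string, error) {
	minor, err := minorOf(version)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string, len(defaults))
	for flag, value := range defaults {
		if cutoff, ok := removedIn[flag]; ok && minor >= cutoff {
			continue // skip flags removed at or before this version
		}
		out[flag] = value
	}
	return out, nil
}

// renderFlags joins flags as "--name=value" pairs in sorted order so the
// output is deterministic.
func renderFlags(flags map[string]string) string {
	keys := make([]string, 0, len(flags))
	for k := range flags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+flags[k])
	}
	return strings.Join(parts, " ")
}
```

With this shape, the static defaults stay in one place and the version check is applied when the kubelet command line is rendered, so clusters older than 1.35 keep the existing behavior.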
Notes
This is distinct from NodeOverlay / CRD issues because:
- the VM does launch successfully
- Azure reports the VM as provisioned and running
- the failure happens during kubelet startup on the guest
Community Note
- Please vote on this issue by adding a 👍 reaction to help prioritize it.
- Please do not leave "+1" or "me too" comments.