AKS 1.35 nodes fail to register because kubelet is started with removed flag --pod-infra-container-image #1516

@kayahk

Description

Version

Karpenter Version: v1.7.1 and v1.7.2
Kubernetes Version: v1.35.0

Expected Behavior

Karpenter-provisioned nodes on AKS 1.35 should bootstrap successfully, register with the cluster, and transition their NodeClaim to ready.

Actual Behavior

The Azure VM is created successfully and reaches ProvisioningState=Succeeded, but the Kubernetes node never registers.

The corresponding NodeClaim gets stuck in this state:

  • Launched=True
  • Registered=Unknown
  • Reason=NodeNotFound

We verified directly on a failing VM that kubelet is started with a flag that kubelet 1.35 no longer accepts:

failed to parse kubelet flag: unknown flag: --pod-infra-container-image

The generated /etc/default/kubelet on the VM contains:

--pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6

Steps to Reproduce the Problem

  1. Upgrade an AKS cluster to Kubernetes 1.35.0.
  2. Install the Azure Karpenter provider v1.7.1 or v1.7.2.
  3. Create a NodePool and AKSNodeClass.
  4. Trigger provisioning of a new Karpenter-managed node.
  5. Observe that the VM launches, but no Kubernetes Node object is ever created.

Resource Specs and Logs

NodeClaim symptoms

The failing NodeClaim reaches launch but never registers:

Type:                  Launched
Status:                True
Reason:                Launched

Type:                  Registered
Status:                Unknown
Reason:                NodeNotFound
Message:               Node not registered with cluster
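For anyone triaging similar reports, the condition pair above can be checked mechanically. The following is a minimal sketch, not provider code: the `condition` struct and `launchedButUnregistered` are hypothetical names that mirror only the fields shown in the output above, and both conditions are assumed to be present on the NodeClaim.

```go
package main

// condition is a hypothetical, trimmed-down view of a NodeClaim status
// condition; it carries only the fields quoted in this report.
type condition struct {
	Type, Status, Reason string
}

// launchedButUnregistered reports whether the conditions match the symptom
// described here: Launched is True while Registered is anything but True.
// A missing Registered condition is not treated as a match.
func launchedButUnregistered(conds []condition) bool {
	launched, unregistered := false, false
	for _, c := range conds {
		switch c.Type {
		case "Launched":
			launched = c.Status == "True"
		case "Registered":
			unregistered = c.Status != "True"
		}
	}
	return launched && unregistered
}
```

Passing the two conditions listed above (Launched=True, Registered=Unknown) yields true; a NodeClaim whose Registered condition is True yields false.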

VM-side evidence

We inspected the VM directly with az vm run-command invoke.

/etc/default/kubelet included:

KUBELET_FLAGS=... --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6 ...

Recent kubelet log output on the failing VM:

E0314 18:19:08.262960   12399 run.go:72] "command failed" err="failed to parse kubelet flag: unknown flag: --pod-infra-container-image"
systemd[1]: kubelet.service: Failed with result 'exit-code'.
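The rejected flag can be pulled out of a log line like the one above with a small pattern match. This is a sketch under the assumption that the message keeps the `unknown flag: --name` shape seen in this log; `unknownKubeletFlag` is a hypothetical helper, not part of kubelet or the provider.

```go
package main

import "regexp"

// unknownFlagRe matches the "unknown flag: --name" fragment that appears in
// the kubelet error quoted in this report.
var unknownFlagRe = regexp.MustCompile(`unknown flag: (--[A-Za-z0-9-]+)`)

// unknownKubeletFlag returns the flag name that kubelet rejected, if the log
// line contains one.
func unknownKubeletFlag(logLine string) (string, bool) {
	m := unknownFlagRe.FindStringSubmatch(logLine)
	if m == nil {
		return "", false
	}
	return m[1], true
}
```

Applied to the kubelet error line above, the helper returns `--pod-infra-container-image`; the systemd line has no such fragment and returns no match.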

Source code reference

The provider still injects this flag from the bootstrap defaults here:

pkg/providers/imagefamily/bootstrap/staticvalues.go

"--pod-infra-container-image": "mcr.microsoft.com/oss/kubernetes/pause:3.6",

This reproduces on at least:

  • imageFamily: AzureLinux
  • imageFamily: Ubuntu

So this does not appear to be specific to AzureLinux; it looks like an incompatibility between the provider's bootstrap path and kubelet 1.35.
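One possible direction for a fix is to gate the static flag on the target Kubernetes version. The sketch below only illustrates the idea and does not reflect how the provider is actually structured: `removedInMinor`, `minorOf`, and `filterKubeletFlags` are hypothetical names, and the 1.35 cutoff is taken from the behavior observed in this report rather than from the provider source.

```go
package main

import (
	"strconv"
	"strings"
)

// removedInMinor maps a kubelet flag to the first 1.x minor version that
// rejects it. The 35 entry is an assumption based on this report.
var removedInMinor = map[string]int{
	"--pod-infra-container-image": 35,
}

// minorOf extracts the minor version from strings such as "v1.35.0" or "1.35".
func minorOf(version string) (int, bool) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return 0, false
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, false
	}
	return minor, true
}

// filterKubeletFlags returns a copy of flags without entries that the given
// Kubernetes version no longer accepts. If the version cannot be parsed, the
// flags are returned unchanged.
func filterKubeletFlags(flags map[string]string, version string) map[string]string {
	out := make(map[string]string, len(flags))
	minor, ok := minorOf(version)
	for name, value := range flags {
		if cutoff, removed := removedInMinor[name]; ok && removed && minor >= cutoff {
			continue
		}
		out[name] = value
	}
	return out
}
```

With this shape, a static default map could keep the pause image entry for clusters below 1.35 while omitting it for 1.35.0 and later, so older node pools would see no change in their generated kubelet flags.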

Notes

This is distinct from NodeOverlay / CRD issues because:

  • the VM does launch successfully
  • Azure reports the VM as provisioned and running
  • the failure happens during kubelet startup on the guest

Community Note

  • Please vote on this issue by adding a 👍 reaction to help prioritize it.
  • Please do not leave "+1" or "me too" comments.
