
NodeOverlay Capacity additions cause NodeClaim failure #1443

@marchermans

Description

Version

Karpenter Version: v1.7.1
Kubernetes Version: v1.33.6

Expected Behavior

Creating a NodeOverlay as follows:

apiVersion: karpenter.sh/v1alpha1
kind: NodeOverlay
metadata:
  annotations:
    meta.helm.sh/release-name: rls-cls-karpenter-nodepools-dev
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2026-02-20T07:41:12Z"
  generation: 5
  labels:
    app.kubernetes.io/managed-by: Helm
  name: default
  resourceVersion: "716056070"
  uid: 457f21e7-e001-4404-99fc-6021fea335b7
spec:
  capacity:
    nextflow.io/fuse: 100
  requirements: []

Creating this NodeOverlay (or making any other capacity change that purely adds a resource type provided by a DaemonSet) should allow the scheduling of the pod as well as the creation of the NodeClaim and its reconciliation into a real VM.

Actual Behavior

Creating a NodeOverlay as follows:

apiVersion: karpenter.sh/v1alpha1
kind: NodeOverlay
metadata:
  annotations:
    meta.helm.sh/release-name: rls-cls-karpenter-nodepools-dev
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2026-02-20T07:41:12Z"
  generation: 5
  labels:
    app.kubernetes.io/managed-by: Helm
  name: default
  resourceVersion: "716056070"
  uid: 457f21e7-e001-4404-99fc-6021fea335b7
spec:
  capacity:
    nextflow.io/fuse: 100
  requirements: []

Creating this NodeOverlay (or making any other capacity change that purely adds a resource type provided by a DaemonSet) only allows the scheduling of the pod and the creation of the NodeClaim. When Karpenter reconciles the NodeClaim into a real VM, it tries to resolve the instance type and finds no matching instance type to create a VM from. This is caused by the following logic in CloudProvider: https://github.com/Azure/karpenter-provider-azure/blob/main/pkg/cloudprovider/cloudprovider.go#L152-L158 The list returned by resolveInstanceTypes is empty because no instance type advertises the overlay-added resource, so Karpenter throws an InsufficientCapacityError (ICE) and deletes the NodeClaim.
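
The failure mode can be sketched with a simplified, hypothetical version of that filter. The `instanceType` struct and `resolveInstanceTypes` function below are illustrative stand-ins (not the provider's actual types or API), assuming the filter drops any SKU that cannot satisfy every requested resource:

```go
package main

import "fmt"

// instanceType is a simplified stand-in for a cloud SKU: the resources
// the provider believes the SKU can advertise.
type instanceType struct {
	name     string
	capacity map[string]int64
}

// resolveInstanceTypes mimics the provider-side filter: any SKU that
// cannot satisfy every requested resource is dropped. A resource injected
// only via a NodeOverlay (e.g. "nextflow.io/fuse") appears in no SKU's
// capacity, so the result is empty and the launch fails with an ICE.
func resolveInstanceTypes(all []instanceType, requests map[string]int64) []instanceType {
	var compatible []instanceType
	for _, it := range all {
		ok := true
		for res, qty := range requests {
			if it.capacity[res] < qty { // missing key reads as 0
				ok = false
				break
			}
		}
		if ok {
			compatible = append(compatible, it)
		}
	}
	return compatible
}

func main() {
	skus := []instanceType{
		{name: "Standard_A4_v2", capacity: map[string]int64{"cpu": 4000, "memory": 8 << 30}},
	}
	// The request includes the overlay-only resource, so no SKU matches.
	got := resolveInstanceTypes(skus, map[string]int64{"cpu": 1589, "nextflow.io/fuse": 1})
	fmt.Println(len(got)) // prints 0
}
```

An empty result here is what surfaces as "insufficient capacity, all requested instance types were unavailable during launch" in the log below.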

Steps to Reproduce the Problem

  1. Create a new AKS (Linux) instance with Karpenter enabled (both the NAP and BYOK variant experience this issue)
  2. Install a DaemonSet which provides a resource. In my testing I tried several different FUSE providers, and all of them exhibited the behavior. For reproduction I suggest using https://github.com/nextflow-io/k8s-fuse-plugin; it is as good as any other for this purpose, but any DaemonSet that provides a virtual resource should suffice.
  3. Create a node pool. Really any NodePool with linux amd64 nodes will do.
  4. Try to deploy a Pod which uses the Fuse resource. You will notice that Karpenter cannot schedule the pod and prints a proper error message.
  5. Deploy the NodeOverlay mentioned above. The pod is now schedulable and will get scheduled by Karpenter. A NodeClaim gets created. Karpenter then immediately runs into the following error: {"level":"ERROR","time":"2026-02-20T13:02:18.403Z","logger":"controller","caller":"lifecycle/launch.go:85","message":"failed launching nodeclaim", "commit":"75f2081", "controller":"nodeclaim.lifecycle", "controllerGroup":"karpenter.sh", "controllerKind":"NodeClaim", "Node Claim":{"name":"karpenter-linux-hr47h"}, "namespace":"", "name":"karpenter-linux-hr47h","reconcileID":"c366e8d5-fbdf-4a88-9494-f3189b0d48a3","error":"insufficient capacity, all requested instance types were unavailable during launch"}
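
For step 4, a minimal pod requesting the virtual resource could look like the following (the pod name, image, and command are illustrative; only the resource request matters):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test
spec:
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
      resources:
        limits:
          nextflow.io/fuse: "1"
```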

Resource Specs and Logs

I am not sure exactly which specs you are looking for here, other than the ones provided above in the reproduction steps. If you need something specific, please let me know and I will provide it.

Log of the Karpenter controller:

Details

{"level":"DEBUG","time":"2026-02-20T13:02:18.232Z","logger":"controller","caller":"provisioning/provisioner.go:326","message":"computing scheduling decision for provisionable pod(s)","commit":"75f2081","controller":"provisioner","namespace":"","name":"","reconcileID":"97198591-85c7-4ec2-9504-3d79cb19173f","pending-pods":1,"deleting-pods":0}
{"level":"DEBUG","time":"2026-02-20T13:02:18.260Z","logger":"controller","caller":"scheduling/scheduler.go:608","message":"40 out of 458 instance types were excluded because they would breach limits","commit":"75f2081","controller":"provisioner","namespace":"","name":"","reconcileID":"97198591-85c7-4ec2-9504-3d79cb19173f","NodePool":{"name":"karpenter-linux"}}
{"level":"INFO","time":"2026-02-20T13:02:18.272Z","logger":"controller","caller":"provisioning/provisioner.go:384","message":"found provisionable pod(s)","commit":"75f2081","controller":"provisioner","namespace":"","name":"","reconcileID":"97198591-85c7-4ec2-9504-3d79cb19173f","Pods":"gitlab-runners/runner-pixn0mn5t-project-2505-concurrent-0-pgynl13x","duration":"42.258357ms"}
{"level":"INFO","time":"2026-02-20T13:02:18.272Z","logger":"controller","caller":"scheduling/scheduler.go:272","message":"computed new nodeclaim(s) to fit pod(s)","commit":"75f2081","controller":"provisioner","namespace":"","name":"","reconcileID":"97198591-85c7-4ec2-9504-3d79cb19173f","nodeclaims":1,"pods":1}
{"level":"INFO","time":"2026-02-20T13:02:18.308Z","logger":"controller","caller":"provisioning/provisioner.go:420","message":"created nodeclaim","commit":"75f2081","controller":"provisioner","namespace":"","name":"","reconcileID":"97198591-85c7-4ec2-9504-3d79cb19173f","NodePool":{"name":"karpenter-linux"},"NodeClaim":{"name":"karpenter-linux-hr47h"},"requests":{"cpu":"1589m","memory":"2332Mi","nextflow.io/fuse":"1","pods":"15"},"instance-types":"Standard_A2m_v2, Standard_A4_v2, Standard_A4m_v2, Standard_A8_v2, Standard_A8m_v2 and 404 other(s)"}
{"level":"DEBUG","time":"2026-02-20T13:02:18.308Z","logger":"controller","caller":"inplaceupdate/controller.go:87","message":"comparing in-place update hashes","commit":"75f2081","controller":"nodeclaim.inplaceupdate","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-linux-hr47h"},"namespace":"","name":"karpenter-linux-hr47h","reconcileID":"0b9ad39d-10b8-497b-8d0f-20e4451204fe","goalHash":"3618194801","actualHash":""}
{"level":"DEBUG","time":"2026-02-20T13:02:18.308Z","logger":"controller","caller":"inplaceupdate/controller.go:135","message":"can't update yet as the claim is not registered","commit":"75f2081","controller":"nodeclaim.inplaceupdate","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-linux-hr47h"},"namespace":"","name":"karpenter-linux-hr47h","reconcileID":"0b9ad39d-10b8-497b-8d0f-20e4451204fe"}
{"level":"ERROR","time":"2026-02-20T13:02:18.403Z","logger":"controller","caller":"lifecycle/launch.go:85","message":"failed launching nodeclaim","commit":"75f2081","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-linux-hr47h"},"namespace":"","name":"karpenter-linux-hr47h","reconcileID":"c366e8d5-fbdf-4a88-9494-f3189b0d48a3","error":"insufficient capacity, all requested instance types were unavailable during launch"}
{"level":"DEBUG","time":"2026-02-20T13:02:18.417Z","logger":"controller","caller":"inplaceupdate/controller.go:87","message":"comparing in-place update hashes","commit":"75f2081","controller":"nodeclaim.inplaceupdate","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-linux-hr47h"},"namespace":"","name":"karpenter-linux-hr47h","reconcileID":"f03c65a3-19c8-424c-a4a2-278491b5404c","goalHash":"3618194801","actualHash":""}
{"level":"INFO","time":"2026-02-20T13:02:19.530Z","logger":"controller","caller":"lifecycle/controller.go:298","message":"annotated nodeclaim","commit":"75f2081","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-linux-hr47h"},"namespace":"","name":"karpenter-linux-hr47h","reconcileID":"ec286947-2f26-4fd8-89a7-2db3453c7a91","karpenter.sh/nodeclaim-termination-timestamp":"2026-02-20T13:02:48Z"}
{"level":"INFO","time":"2026-02-20T13:02:19.576Z","logger":"controller","caller":"lifecycle/controller.go:258","message":"deleted nodeclaim","commit":"75f2081","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"karpenter-linux-hr47h"},"namespace":"","name":"karpenter-linux-hr47h","reconcileID":"ec286947-2f26-4fd8-89a7-2db3453c7a91"}

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
