EC2NodeClass karpenter.k8s.aws/termination finalizer stuck indefinitely after deletion, even with zero NodeClaims #9185

## Version

- Karpenter controller: `v1.8.1` (`public.ecr.aws/karpenter/controller:1.8.1@sha256:41c28a606cbad86869384ff8ae8345203b63f81612b6fcfd2e136197dccc03ef`)
- Helm chart: `karpenter-1.8.1`
- CRDs: `ec2nodeclasses.karpenter.k8s.aws/v1`, `nodepools.karpenter.sh/v1`
- Platform: EKS, `us-east-1`

## Symptom

When an `EC2NodeClass` is deleted (via `kubectl delete`, or implicitly via `helm uninstall` on a chart that owns it), the resource enters a terminating state with `deletionTimestamp` set, but the `karpenter.k8s.aws/termination` finalizer is never released. The resource remains as a tombstone indefinitely.

Observed state on a stuck NodeClass:

```yaml
metadata:
  deletionTimestamp: "2026-05-21T05:27:03Z"
  finalizers:
  - karpenter.k8s.aws/termination
  generation: 3
status:
  conditions:
  - type: Ready
    status: "True"
    reason: Ready
  observedGeneration: <empty>   # Karpenter has stopped reconciling updates
```

Other observations:
- No NodeClaims reference the NodeClass at the time of (or after) deletion.
- No Nodes attributable to this NodeClass exist in the cluster.
- Subnets, security groups, and instance profile selectors are all valid and the NodeClass was `Ready=True` immediately before the delete.
- The NodePool that referenced this NodeClass was deleted normally (no finalizer issues).
- Karpenter controller logs at the time of the delete contain no errors related to this NodeClass; the controller appears to never attempt finalizer release.
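
The first observation can be re-checked with something like the sketch below. It assumes `kubectl` access to the cluster and that NodeClaims expose the referenced NodeClass at `.spec.nodeClassRef.name` (as in the `karpenter.sh/v1` API); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of NodeClaims whose
# spec.nodeClassRef.name equals the given EC2NodeClass name.
# Prints nothing when no NodeClaim references it.
nodeclaims_for_nodeclass() {
  local nodeclass="$1"
  kubectl get nodeclaims \
    -o custom-columns='NAME:.metadata.name,NODECLASS:.spec.nodeClassRef.name' \
    --no-headers |
    awk -v nc="$nodeclass" '$2 == nc { print $1 }'
}
```

For example, `nodeclaims_for_nodeclass llamacloud-helm-unified` should print nothing if the observation holds.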

Once stuck, the only way out is manually patching the finalizer off:

```bash
kubectl patch ec2nodeclass <name> --type=merge -p '{"metadata":{"finalizers":null}}'
```
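
Note that the merge patch above clears every finalizer on the object. A narrower variant, sketched below, uses a JSON patch that removes only the Karpenter finalizer and fails if something else sits in that position. It assumes the finalizer is the first (here, the only) entry in `metadata.finalizers`, as in the observed state; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove only karpenter.k8s.aws/termination from an
# EC2NodeClass. The "test" op makes the patch fail unless index 0 holds
# exactly that finalizer (assumption: it is the first entry in the list).
remove_termination_finalizer() {
  local name="$1"
  kubectl patch ec2nodeclass "$name" --type=json -p '[
    {"op": "test",   "path": "/metadata/finalizers/0", "value": "karpenter.k8s.aws/termination"},
    {"op": "remove", "path": "/metadata/finalizers/0"}
  ]'
}
```

Either form bypasses whatever cleanup the finalizer is meant to guard, so it is only appropriate once nothing references the NodeClass.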

## Reproduction (intermittent, not 100% deterministic)

1. Install a Helm chart that creates an `EC2NodeClass` and a `NodePool` referencing it.
2. Let Karpenter provision a handful of nodes for the NodePool.
3. Run workloads on those nodes briefly.
4. `helm uninstall` the chart. Helm deletes pods → nodes drain → NodeClaims are removed → both the NodePool and the EC2NodeClass are deleted through the API (the equivalent of `kubectl delete`).
5. Observe: NodePool deletes cleanly. EC2NodeClass sits with `deletionTimestamp` set, finalizer present, indefinitely.
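
Step 5 can be checked mechanically. The sketch below waits for the object to disappear and, if it does not within the timeout, prints the fields that show the stuck state. The function name and the default timeout are assumptions, not part of the original report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for an EC2NodeClass to be fully removed. If it is
# still present after the timeout (default 300s), print its
# deletionTimestamp and remaining finalizers and return non-zero.
report_nodeclass_deletion() {
  local name="$1" timeout="${2:-300s}"
  if kubectl wait --for=delete "ec2nodeclass/$name" --timeout="$timeout" >/dev/null 2>&1; then
    echo "deleted: $name"
    return 0
  fi
  echo "still present: $name"
  kubectl get ec2nodeclass "$name" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
  return 1
}
```

Run it right after `helm uninstall`, for example `report_nodeclass_deletion llamacloud-helm-unified 600s`.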

We hit this twice in our staging cluster during a deploy cutover, ~24 hours apart, on two different NodeClass instances (`llamacloud-helm-unified`, `llamacloud-parse-helm-unified`). At the first occurrence we also found an older orphaned NodeClass from April that had been stuck for 43 days under similar circumstances.

## Impact

A stuck tombstoned EC2NodeClass blocks subsequent attempts to manage a NodeClass with the same name. Helm cannot meaningfully update it (Karpenter ignores spec updates on objects pending deletion — `observedGeneration` is empty even though `generation` advances on each helm-upgrade). The NodePool reports `NodeClassReady=False` with reason `NodeClassTerminating`. Karpenter logs `"ignoring nodepool, not ready"` and refuses to provision. Workload pods sit Pending until an operator manually clears the finalizer.
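
A quick way to confirm that a NodePool is blocked for this reason is to read the reason on its `NodeClassReady` condition. This is a minimal sketch assuming `kubectl` access; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the NodePool's NodeClassReady
# condition carries the reason NodeClassTerminating, fail otherwise.
nodepool_blocked_by_terminating_nodeclass() {
  local nodepool="$1" reason
  reason="$(kubectl get nodepool "$nodepool" \
    -o jsonpath='{.status.conditions[?(@.type=="NodeClassReady")].reason}')"
  [ "$reason" = "NodeClassTerminating" ]
}
```

Usage: `nodepool_blocked_by_terminating_nodeclass <nodepool> && echo "blocked by terminating NodeClass"`.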

## Workaround

Annotate the EC2NodeClass and NodePool with `helm.sh/resource-policy: keep` so they survive `helm uninstall`. Updates via `helm upgrade` continue to work; only the delete path is avoided. This sidesteps the bug but is unsatisfying: `kubectl delete` against a healthy NodeClass should still complete cleanly.

## What I think is happening

Speculation, not verified: the finalizer release logic appears to be conditional on some signal that doesn't reliably fire when a NodeClass has produced zero (or has already cleaned up all) NodeClaims. The release path may be triggered by NodeClaim deletion events; if the NodeClaims were drained and removed before the NodeClass was marked for deletion, there's no further event to drive finalizer cleanup. Just a guess — the maintainers will know better.

Happy to provide controller logs from the time of the stuck deletion, or to reproduce with a fresh cluster if useful.
