Surface NodeClaim drift/rollout progress in NodePool status

### Description

**What problem are you trying to solve?**

Today, drift is only observable per-NodeClaim, via the `Drifted` status condition. There is no aggregate, NodePool-level signal that tells you how far a NodePool is through reconciling drift across the nodes it owns.

This makes it hard for external systems to gate on "this NodePool has finished rolling out a change" without listing and aggregating NodeClaims themselves. Concretely, we run Karpenter under Argo CD (GitOps). When a NodePool/EC2NodeClass change triggers drift, we want Argo CD to report the Application as `Progressing` until the drift-driven node replacement is substantially complete, and `Healthy` once it is.

Argo CD's health is evaluated **per resource** with a sandboxed Lua check that has no access to other resources. So a NodePool health check can only read the NodePool's own `status`. A NodeClaim health check could in principle read each NodeClaim's `Drifted` condition, but:

- It would require NodeClaims to appear as children of the NodePool in Argo's resource tree (which, in our multi-cluster setup, they currently do not), and
- It forces all-or-nothing (100%) semantics, with no notion of a tolerance threshold or a settling window.

As a result we maintain a PostSync Job that polls NodeClaims, groups them by `karpenter.sh/nodepool`, computes the percentage no longer `Drifted`, and blocks until each NodePool crosses a threshold (e.g. 90%). This is exactly the kind of aggregation we'd expect Karpenter itself to be able to expose, since it already owns the NodeClaims and tracks their drift state.
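The aggregation our PostSync Job performs can be sketched roughly as below. This is a minimal Go sketch, not the Job's actual code. `NodeClaim` here is a hypothetical, simplified struct rather than Karpenter's API type, listing NodeClaims from the API server is omitted, and the helper names (`AggregateByNodePool`, `PendingNodePools`) are illustrative. Only the `karpenter.sh/nodepool` label and the per-NodeClaim `Drifted` condition come from Karpenter itself.

```go
package main

import "sort"

// NodeClaim is a hypothetical, simplified stand-in for a Karpenter NodeClaim,
// keeping only what this sketch needs: labels and the Drifted condition state.
type NodeClaim struct {
	Name    string
	Labels  map[string]string
	Drifted bool // true when the NodeClaim's Drifted condition has status "True"
}

// nodePoolLabel records which NodePool owns a NodeClaim.
const nodePoolLabel = "karpenter.sh/nodepool"

// RolloutProgress holds per-NodePool counts (hypothetical type).
type RolloutProgress struct {
	Total   int
	Drifted int
}

// PercentNotDrifted returns the share (0-100) of NodeClaims no longer drifted.
// A NodePool with no NodeClaims is treated as fully reconciled.
func (p RolloutProgress) PercentNotDrifted() float64 {
	if p.Total == 0 {
		return 100
	}
	return 100 * float64(p.Total-p.Drifted) / float64(p.Total)
}

// AggregateByNodePool groups NodeClaims by NodePool label and counts drifted ones.
// NodeClaims without the label are skipped.
func AggregateByNodePool(claims []NodeClaim) map[string]RolloutProgress {
	out := make(map[string]RolloutProgress)
	for _, nc := range claims {
		pool, ok := nc.Labels[nodePoolLabel]
		if !ok {
			continue
		}
		p := out[pool]
		p.Total++
		if nc.Drifted {
			p.Drifted++
		}
		out[pool] = p
	}
	return out
}

// PendingNodePools returns, sorted, the NodePools still below the threshold
// percentage; the Job keeps waiting while this list is non-empty.
func PendingNodePools(progress map[string]RolloutProgress, threshold float64) []string {
	var pending []string
	for pool, p := range progress {
		if p.PercentNotDrifted() < threshold {
			pending = append(pending, pool)
		}
	}
	sort.Strings(pending)
	return pending
}
```

With a 90% threshold, a NodePool with 1 of 10 NodeClaims still drifted counts as done, while one with 1 of 2 drifted keeps the Job waiting.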

**Proposal**

Expose drift/rollout progress at the NodePool level, in `NodePool.status`. Any of the following would be sufficient for our use case (in rough order of preference):

1. Counts in `NodePool.status`, e.g. `status.driftedNodeClaims` / `status.nodeClaims` (analogous to the existing `status.nodes` and `status.resources`), so consumers can compute completion percentage directly.
2. A NodePool-level condition such as `Drifted` (`status: "True"` while any owned NodeClaim is drifted, `"False"` once reconciliation is complete), mirroring how the per-NodeClaim `Drifted` condition works.
3. Both — a condition for a simple boolean gate plus counts for threshold-based logic.
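To make options 1 and 2 concrete, here is a hypothetical Go sketch of the proposed status fields and of the gate a consumer could evaluate from the NodePool's own status alone. The `NodeClaims` and `DriftedNodeClaims` fields and the NodePool-level `Drifted` condition are the proposal, not existing Karpenter API, and `Condition`, `ComputeStatus`, `HealthFromCondition` and `HealthFromCounts` are illustrative names. An actual Argo CD custom health check would be written in Lua; Go is used here only to show the logic.

```go
package main

// Condition is a minimal, hypothetical condition shape.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// NodePoolStatus shows only the proposed (hypothetical) fields.
type NodePoolStatus struct {
	NodeClaims        int64
	DriftedNodeClaims int64
	Conditions        []Condition
}

// ComputeStatus derives the proposed fields from the drift state of the
// NodeClaims a NodePool owns (one bool per NodeClaim).
func ComputeStatus(drifted []bool) NodePoolStatus {
	var s NodePoolStatus
	for _, d := range drifted {
		s.NodeClaims++
		if d {
			s.DriftedNodeClaims++
		}
	}
	cond := Condition{Type: "Drifted", Status: "False"}
	if s.DriftedNodeClaims > 0 {
		cond.Status = "True"
	}
	s.Conditions = []Condition{cond}
	return s
}

// HealthFromCondition is the boolean gate (option 2): Progressing while the
// Drifted condition is "True", Healthy otherwise.
func HealthFromCondition(s NodePoolStatus) string {
	for _, c := range s.Conditions {
		if c.Type == "Drifted" && c.Status == "True" {
			return "Progressing"
		}
	}
	return "Healthy"
}

// HealthFromCounts is the threshold gate (option 1): Healthy once at least
// `threshold` percent of NodeClaims are no longer drifted.
func HealthFromCounts(s NodePoolStatus, threshold float64) string {
	if s.NodeClaims == 0 {
		return "Healthy"
	}
	done := 100 * float64(s.NodeClaims-s.DriftedNodeClaims) / float64(s.NodeClaims)
	if done < threshold {
		return "Progressing"
	}
	return "Healthy"
}
```

The condition alone gives all-or-nothing semantics; the counts are what allow a tolerance threshold such as 90%, which is why option 3 covers both kinds of consumer.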

This would let any GitOps/automation system gate on a NodePool's own `status` without re-implementing NodeClaim aggregation, and it would make rollout progress visible in `kubectl get nodepool` and dashboards.

**How important is this feature to you?**

Moderately important. We have a working PostSync Job that does the aggregation today, so we are not blocked, but it adds operational surface (a Job + ServiceAccount + ClusterRole + a maintained image per cluster) purely to compute information Karpenter already has internally. A NodePool-level status field/condition would let us replace that machinery with a standard Argo CD custom health check and would benefit anyone integrating Karpenter rollouts with external orchestration.

* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to help the community and maintainers prioritize this request
* Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
* If you are interested in working on this issue or have submitted a pull request, please leave a comment


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Surface NodeClaim drift/rollout progress in NodePool status #3071

Description

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Surface NodeClaim drift/rollout progress in NodePool status #3071

Description

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions