Description
Current State
Karpenter now supports karpenter.sh/capacity-type: reserved, which required several changes to the scheduler. The most important change is the addition of a ReservedOfferingMode, which dictates whether Karpenter should fall back to spot / on-demand if it can't create a reserved NodeClaim despite available capacity.
Why is this necessary? The core issue is that when Karpenter creates a set of NodeClaims, it doesn't know which instance type the cloudprovider implementation will select. This impacts how reserved capacity is tracked. Since we don't know which offering a NodeClaim will be launched into, we reserve capacity for all compatible reserved offerings. On one hand, this ensures that the cloudprovider doesn't overlaunch into any given capacity reservation, but on the other hand it means we underestimate the number of reserved NodeClaims we can create in a given scheduling simulation.
How is this solved today? The provisioner uses ReservedOfferingModeStrict, which prevents fallback to on-demand or spot when compatible reserved offerings are available but we couldn't make a reservation because they were already reserved by previously created NodeClaims. This will result in some pods being ignored for a given scheduling simulation, with the assumption that they will schedule in subsequent simulations. Without this behavior, Karpenter would fall back, launch an OD or spot instance, and then consolidate into reserved shortly after. The tradeoff made here was to potentially increase launch latency in exchange for reduced node churn.
While this works for provisioning, it is inherently incompatible with drift. Once a node is drifted, Karpenter must provision replacement capacity before proceeding to drain and terminate the node. The replacement capacity must be created in a single scheduling simulation. If we prevented fallback for drift, we could deadlock whenever multiple simulations are required. To avoid deadlocks, drift is allowed to fall back.
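The difference between the two behaviors can be sketched as a single decision function. This is an illustrative sketch under assumptions, not the scheduler's real code: ReservedOfferingModeStrict is named in this issue, while ReservedOfferingModeFallback, chooseCapacityType, errReservedOfferingHeld, and the boolean inputs are hypothetical simplifications.

```go
package main

import (
	"errors"
	"fmt"
)

// ReservedOfferingMode controls what happens when a reservation can't be made.
// Only the Strict name comes from the issue text; Fallback is assumed here.
type ReservedOfferingMode int

const (
	ReservedOfferingModeFallback ReservedOfferingMode = iota
	ReservedOfferingModeStrict
)

// errReservedOfferingHeld signals that the pod should be skipped for this
// simulation and retried in a later one (hypothetical error).
var errReservedOfferingHeld = errors.New("compatible reserved offerings are held by earlier NodeClaims")

// chooseCapacityType decides the capacity type for a new NodeClaim.
// hasCompatibleReserved: compatible reserved offerings with capacity exist.
// couldReserve: the simulation managed to reserve one for this NodeClaim.
func chooseCapacityType(mode ReservedOfferingMode, hasCompatibleReserved, couldReserve bool) (string, error) {
	switch {
	case couldReserve:
		return "reserved", nil
	case hasCompatibleReserved && mode == ReservedOfferingModeStrict:
		// Strict: refuse to fall back, accepting extra launch latency.
		return "", errReservedOfferingHeld
	default:
		// Fallback: launch on-demand or spot now, consolidate later.
		return "on-demand", nil
	}
}

func main() {
	// Provisioning: strict mode defers the pod to a later simulation.
	_, err := chooseCapacityType(ReservedOfferingModeStrict, true, false)
	fmt.Println("provisioning:", err)

	// Drift: replacements must come from one simulation, so fall back.
	ct, _ := chooseCapacityType(ReservedOfferingModeFallback, true, false)
	fmt.Println("drift:", ct)
}
```

In this sketch the strict branch is the only one that can leave a pod unscheduled, which is why it suits provisioning (later simulations will retry) but not drift (there is no later simulation for the replacement capacity).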
Potential Solution
Each of the reserved offering modes has its drawbacks, but they are the best we can do without selecting an instance before creating the NodeClaim. We don't do this since we want to maximize flexibility for two reasons:
- Reduce chances of ICEs (insufficient capacity errors), minimizing node launch latency
- Increase flexibility for spot
A solution which would allow us to remove the reserved offering modes altogether must have the following properties:
- Ensure we don't underestimate the number of reserved NodeClaims we can create
- Maximize the flexibility of any individual NodeClaim