# Capacity Block Support

## Overview

In v1.3.0 Karpenter introduced formal support for on-demand capacity reservations (ODCRs).
However, this did not include a subset of ODCRs: Capacity Blocks.
Capacity Blocks enable users to "reserve highly sought-after GPU instances on a future date to support short duration ML workloads".
This doc focuses on extending Karpenter's existing ODCR feature to support Capacity Blocks.

## Goals

- Karpenter should enable users to select against Capacity Blocks when scheduling workloads
- Karpenter should discover Capacity Blocks through `ec2nodeclass.spec.capacityReservationSelectorTerms`
- Karpenter should gracefully handle Capacity Block expiration

## API Updates

We will add the `karpenter.k8s.aws/capacity-reservation-type` label, which can take on the values `default` and `capacity-block`.
This mirrors the `reservationType` field in the `ec2:DescribeCapacityReservations` response and will enable users to select capacity block nodes via NodePool requirements or node selector terms.

```yaml
# Configure a NodePool to only be compatible with instance types with active
# capacity block reservations
kind: NodePool
apiVersion: karpenter.sh/v1
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/capacity-reservation-type
          operator: In
          values: ['capacity-block']
---
# Configure a pod to only schedule against nodes backed by capacity blocks
kind: Pod
apiVersion: v1
spec:
  nodeSelector:
    karpenter.k8s.aws/capacity-reservation-type: capacity-block
```

Additionally, we will update the NodeClass status to reflect the reservation type and state for a given capacity reservation:

```yaml
kind: EC2NodeClass
apiVersion: karpenter.k8s.aws/v1
status:
  capacityReservations:
    - # ...
      reservationType: Enum (default | capacity-block)
      state: Enum (active | expiring)
```

No changes are required for `ec2nodeclass.spec.capacityReservationSelectorTerms`.

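For illustration, a Capacity Block would be discovered through the same selector terms available today, e.g. by reservation ID or by tags (the ID and tag values below are hypothetical):

```yaml
kind: EC2NodeClass
apiVersion: karpenter.k8s.aws/v1
spec:
  capacityReservationSelectorTerms:
    # Select a specific Capacity Block by its reservation ID (hypothetical ID)
    - id: cr-0123456789abcdef0
    # Or select any reservation carrying this tag, whether default or capacity-block
    - tags:
        karpenter.sh/discovery: my-cluster
```
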
## Launch Behavior

Today, when Karpenter creates a NodeClaim targeting reserved capacity, it ensures the instance is launched into one of the correct reservations by injecting a `karpenter.k8s.aws/capacity-reservation-id` requirement into the NodeClaim.
By injecting this requirement, we ensure Karpenter can maximize the flexibility provided to CreateFleet (minimizing the risk of ReservedCapacityExceeded errors) while also ensuring Karpenter doesn't overlaunch into any given reservation.

```yaml
kind: NodeClaim
apiVersion: karpenter.sh/v1
spec:
  requirements:
    - key: karpenter.k8s.aws/capacity-reservation-id
      operator: In
      values: ['cr-foo', 'cr-bar']
  # ...
```

Given the NodeClaim spec above, Karpenter will create launch templates for both `cr-foo` and `cr-bar`, providing both in the CreateFleet request.
However, this breaks down when we begin to mix default and capacity-block ODCRs (e.g. `cr-foo` is a default capacity reservation, and `cr-bar` is a capacity block).
This is because the `TargetCapacitySpecificationRequest.DefaultTargetCapacityType` field in the CreateFleet request must be set to either `on-demand` or `capacity-block`, preventing us from mixing the two in a single request.
Instead, if a NodeClaim is compatible with both types of ODCRs, we must choose a subset of those ODCRs to include in the CreateFleet request.
We have the following options for prioritization when making this selection:

- Prioritize price (the subset with the "cheapest" offering)
- Prioritize flexibility (the subset with the greatest number of offerings)

Although prioritizing flexibility is desirable to reduce the risk of ReservedCapacityExceeded errors, it won't interact well with consolidation and will result in additional node churn.
For that reason, we should prioritize the set of ODCRs with the "cheapest" offering when generating the CreateFleet request.
If there is a tie between a default and a capacity-block offering, we will prioritize the capacity-block offering.

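The selection logic above can be sketched as follows; `Offering` and `selectReservationSubset` are illustrative names for this doc, not Karpenter's actual types:

```go
package main

import "fmt"

// Offering is a simplified, hypothetical view of a reserved-capacity offering.
type Offering struct {
	ReservationID   string
	ReservationType string // "default" or "capacity-block"
	Price           float64
}

// selectReservationSubset groups offerings by reservation type (CreateFleet
// can only target one type per request), then returns the group containing
// the cheapest offering, breaking ties in favor of capacity-block.
func selectReservationSubset(offerings []Offering) []Offering {
	groups := map[string][]Offering{}
	minPrice := map[string]float64{}
	for _, o := range offerings {
		if p, ok := minPrice[o.ReservationType]; !ok || o.Price < p {
			minPrice[o.ReservationType] = o.Price
		}
		groups[o.ReservationType] = append(groups[o.ReservationType], o)
	}
	best := ""
	// Iterating capacity-block first implements the tie-break: on equal
	// minimum price, the capacity-block group wins.
	for _, t := range []string{"capacity-block", "default"} {
		if p, ok := minPrice[t]; ok && (best == "" || p < minPrice[best]) {
			best = t
		}
	}
	return groups[best]
}

func main() {
	selected := selectReservationSubset([]Offering{
		{ReservationID: "cr-foo", ReservationType: "default", Price: 1.0},
		{ReservationID: "cr-bar", ReservationType: "capacity-block", Price: 1.0},
		{ReservationID: "cr-baz", ReservationType: "capacity-block", Price: 2.0},
	})
	fmt.Println(len(selected), selected[0].ReservationID) // → 2 cr-bar
}
```

Note that the whole winning group is returned, not just its cheapest member, preserving as much flexibility as the single-type constraint allows.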
## Interruption

Although capacity blocks are modeled as ODCRs, their expiration behavior differs.
Any capacity still in use when a default ODCR expires falls back to a standard on-demand instance.
On the other hand, instances in use from a capacity block reservation are terminated ahead of their end date.

From the [EC2 documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html):

> You can use all the instances you reserved until 30 minutes before the end time of the Capacity Block.
> With 30 minutes left in your Capacity Block reservation, we begin terminating any instances that are running in the Capacity Block.
> We use this time to clean up your instances before delivering the Capacity Block to the next customer.
> We emit an event through EventBridge 10 minutes before the termination process begins.
> For more information, see Monitor Capacity Blocks using EventBridge.

Karpenter should gracefully handle this interruption by draining the nodes ahead of termination.
While we could integrate with the EventBridge event referenced above, this introduces complications when rehydrating state after a controller restart.
Instead, we will rely on the fact that interruption occurs at a fixed time relative to the end date of the capacity reservation, which is already discovered via `ec2:DescribeCapacityReservations`.
Matching the time the expiration warning event is emitted, Karpenter will begin to drain the node 10 minutes before EC2 begins reclaiming the capacity (i.e. 40 minutes before the end date).
Once the reclamation period begins, Karpenter will mark the capacity reservation as expiring in the EC2NodeClass's status.