Commit 4ddbd18

KEP-5007: Update the DRA doc for DRADeviceBindingConditions
1 parent bfcdaac commit 4ddbd18

2 files changed, +97 -0 lines changed

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 83 additions & 0 deletions
@@ -626,6 +626,89 @@ spec:
effect: NoExecute
```

### Device Binding Conditions {#device-binding-conditions}

{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
to be ready.

This waiting behavior is implemented in the
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
of the scheduling framework.
During this phase, the scheduler checks whether all required device conditions are
satisfied before proceeding with binding.

This improves scheduling reliability by avoiding premature binding and enables coordination
with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
gates for the scheduler to honor these fields.

- `bindingConditions`: A list of condition types that must be set to `True` in the
  `status.conditions` field of the associated ResourceClaim before the Pod can be bound.
  These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
- `bindingFailureConditions`: A list of condition types that, if set to `True` in the
  `status.conditions` field of the associated ResourceClaim, indicate a failure state.
  If any of these conditions is `True`, the scheduler aborts binding and reschedules the Pod.
- `bindsToNode`: If set to `true`, the scheduler records the selected node name in the
  `status.allocation.nodeSelector` field of the ResourceClaim.
  This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
  inside the ResourceClaim, which external controllers can use to perform node-specific
  operations such as device attachment or preparation.

All condition types listed in `bindingConditions` and `bindingFailureConditions` are evaluated
from the `status.conditions` field of the ResourceClaim.
External controllers are responsible for updating these conditions using standard Kubernetes
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
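
To make the condition semantics above concrete, here is a minimal Python sketch of how an external controller might maintain a list of conditions in memory before writing it back to the ResourceClaim. The helper name `set_condition` and the plain-dictionary representation are assumptions made for illustration; the sketch does not call any Kubernetes client API. It follows the usual convention that `lastTransitionTime` changes only when `status` changes.

```python
from datetime import datetime, timezone


def set_condition(conditions, cond_type, status, reason, message, now=None):
    """Insert or update one condition in a list of condition dicts.

    Hypothetical helper: `conditions` is a plain Python list, not an API object.
    lastTransitionTime is refreshed only when the status value changes.
    """
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    for cond in conditions:
        if cond["type"] == cond_type:
            if cond["status"] != status:
                cond["status"] = status
                cond["lastTransitionTime"] = timestamp
            # reason and message may change without a status transition
            cond["reason"] = reason
            cond["message"] = message
            return conditions

    conditions.append({
        "type": cond_type,
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": timestamp,
    })
    return conditions
```

A controller could call `set_condition(conds, "dra.example.com/is-prepared", "True", "Prepared", "device attached")` once attachment completes, then persist the updated list through whatever client it uses.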

The scheduler waits up to **600 seconds** for all `bindingConditions` to become `True`.
If the timeout is reached or any of the `bindingFailureConditions` is `True`, the scheduler
clears the allocation and reschedules the Pod.
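
The decision described above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the scheduler's actual code: the function name `evaluate_binding`, the string results, and the order of the checks (failure conditions first, then readiness, then timeout) are assumptions.

```python
def evaluate_binding(binding_conditions, failure_conditions, claim_conditions,
                     elapsed_seconds, timeout_seconds=600):
    """Return "bind", "wait", or "reschedule" for one allocated device.

    Hypothetical model: claim_conditions is a list of dicts with at least
    "type" and "status" keys, as reported on the ResourceClaim.
    """
    true_types = {c["type"] for c in claim_conditions if c.get("status") == "True"}

    # Any failure condition that is True aborts binding.
    if any(t in true_types for t in failure_conditions):
        return "reschedule"

    # Binding proceeds only when every binding condition is True.
    if all(t in true_types for t in binding_conditions):
        return "bind"

    # Not ready yet: give up once the timeout (600 seconds by default) is reached.
    if elapsed_seconds >= timeout_seconds:
        return "reschedule"

    return "wait"
```

For example, with `binding_conditions=["dra.example.com/is-prepared"]` and no conditions reported yet, the function returns `"wait"` at 30 seconds and `"reschedule"` at 600 seconds.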


```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  nodeSelector:
    accelerator-type: high-performance
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-1
      attributes:
        vendor:
          string: "example"
        model:
          string: "example-gpu"
      bindsToNode: true
      bindingConditions:
        - dra.example.com/is-prepared
      bindingFailureConditions:
        - dra.example.com/preparing-failed
```
This example ResourceSlice has the following properties:

- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
  so that the scheduler uses only a specific set of eligible nodes.
- The scheduler selects one node from the eligible group (for example, `node-3`) and sets
  the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
- The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
  must be prepared (the `is-prepared` condition has a status of `True`) before binding.
- If the `gpu-1` device preparation fails (the `preparing-failed` condition has a status
  of `True`), the scheduler aborts binding.
- The scheduler waits up to 600 seconds for the device to become ready.
- External controllers can use the node selector in the ResourceClaim to perform
  node-specific setup on the selected node.
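
As a rough illustration of the last point, an external controller could read the selected node name out of the claim's node selector. The sketch below assumes the selector is a standard `NodeSelector` whose terms contain a `matchFields` entry on `metadata.name` with the `In` operator; the exact shape the scheduler writes is an assumption here, and `node_names_from_selector` is a hypothetical helper that operates on plain dictionaries.

```python
def node_names_from_selector(node_selector):
    """Collect node names from matchFields entries on metadata.name.

    Hypothetical helper: node_selector is a dict shaped like a NodeSelector,
    for example the value of status.allocation.nodeSelector on a ResourceClaim.
    """
    names = []
    for term in (node_selector or {}).get("nodeSelectorTerms", []):
        for field in term.get("matchFields", []):
            if field.get("key") == "metadata.name" and field.get("operator") == "In":
                names.extend(field.get("values", []))
    return names


# Example input, assuming the scheduler picked node-3.
selector = {
    "nodeSelectorTerms": [
        {"matchFields": [
            {"key": "metadata.name", "operator": "In", "values": ["node-3"]},
        ]},
    ],
}
selected_nodes = node_names_from_selector(selector)
```

A controller would then run its attachment or preparation step against each name in `selected_nodes` and report the outcome through the claim's conditions.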

## {{% heading "whatsnext" %}}

- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
---
title: DRADeviceBindingConditions
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.34"
---
Enables support for DeviceBindingConditions in the DRA related fields.
This allows for thorough device readiness checks and attachment processes before Bind phase.
