Commit 29aaa69

KEP-5007: Update the DRA doc for DRADeviceBindingConditions
1 parent bfcdaac commit 29aaa69

2 files changed: +97 -0 lines changed

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 83 additions & 0 deletions
@@ -626,6 +626,89 @@ spec:
effect: NoExecute
```

### Device Binding Conditions {#device-binding-conditions}

{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
external resources (such as fabric-attached GPUs or reprogrammable FPGAs) are confirmed
to be ready.

This waiting behavior is implemented in the
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
of the scheduling framework.
During this phase, the scheduler checks whether all required device conditions are
satisfied before proceeding with binding.

This improves scheduling reliability by avoiding premature binding and enables coordination
with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
gates for the scheduler to honor these fields.

- `bindingConditions`: a list of condition keys that must have status `True` before binding.
  These indicate readiness signals such as "device attached" or "initialized".
- `bindingFailureConditions`: a list of failure condition keys. If any of them has status
  `True`, binding is aborted and the Pod is rescheduled.
- `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
  `status.allocation.nodeSelector` field of the ResourceClaim.
  This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
  inside the ResourceClaim, which external controllers can use to perform node-specific
  operations such as device attachment or preparation.

These conditions are evaluated from the `status.conditions` field of the ResourceClaim.
External controllers are responsible for updating these conditions using standard Kubernetes
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).

The scheduler waits up to **600 seconds** for all `bindingConditions` to become `True`.
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
clears the allocation and reschedules the Pod.

#### Example ResourceSlice

```yaml
apiVersion: resource.k8s.io/v1beta2
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  # Matches nodes labeled accelerator-type=high-performance
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: accelerator-type
            operator: In
            values:
              - high-performance
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-1
      attributes:
        vendor:
          string: "example"
        model:
          string: "example-gpu"
      bindsToNode: true
      bindingConditions:
        - dra.example.com/is-prepared
      bindingFailureConditions:
        - dra.example.com/preparing-failed
```

In this example:

- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
  allowing the scheduler to choose from a group of eligible nodes.
- The scheduler selects one node from this group (for example, `node-3`) and sets
  `status.allocation.nodeSelector` of the ResourceClaim to that node name.
- The device `gpu-1` must be prepared before binding
  (`dra.example.com/is-prepared` must have status `True`).
- If preparation fails (`dra.example.com/preparing-failed` has status `True`),
  the scheduler aborts binding.
- The scheduler waits up to 600 seconds for the device to become ready.
- External controllers can use the node selector in the ResourceClaim to perform
  node-specific setup on the selected node.

This feature is useful for asynchronous device preparation workflows,
such as dynamic GPU attachment or FPGA initialization.

## {{% heading "whatsnext" %}}

- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
---
title: DRADeviceBindingConditions
content_type: feature_gate
_build:
  list: never
  render: false

stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.34"
---
Enables support for device binding conditions in DRA-related fields.
This allows device readiness checks and attachment processes to complete before the Bind phase.
