      effect: NoExecute
```

## Device Binding Conditions {#device-binding-conditions}

{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
to be ready.

This waiting behavior is implemented in the
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
of the scheduling framework.
During this phase, the scheduler checks whether all required device conditions are
satisfied before proceeding with binding.

This improves scheduling reliability by avoiding premature binding and enables coordination
with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
gates for the scheduler to honor these fields.

- `bindingConditions`: A list of condition types that must be set to `True` in the
  `status.conditions` field of the associated ResourceClaim before the Pod can be bound.
  These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
- `bindingFailureConditions`: A list of condition types that, if set to `True` in the
  `status.conditions` field of the associated ResourceClaim, indicate a failure state.
  If any of these conditions are `True`, the scheduler aborts binding and reschedules the Pod.
- `bindsToNode`: If set to `true`, the scheduler records the selected node name in the
  `status.allocation.nodeSelector` field of the ResourceClaim.
  This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
  inside the ResourceClaim, which external controllers can use to perform node-specific
  operations such as device attachment or preparation.
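
How the feature gates are enabled depends on how the control plane is deployed. The
following is a minimal sketch, assuming a kubeadm-style cluster where kube-scheduler runs
as a static Pod; the manifest path and surrounding fields are illustrative, and the same
gates typically also need to be enabled on the kube-apiserver so that the new fields are
persisted.

```yaml
# Sketch only: assumes a kubeadm-style static Pod manifest at
# /etc/kubernetes/manifests/kube-scheduler.yaml; adapt to your deployment method.
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler
    - --feature-gates=DRADeviceBindingConditions=true,DRAResourceClaimDeviceStatus=true
```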

All condition types listed in `bindingConditions` and `bindingFailureConditions` are evaluated
from the `status.conditions` field of the ResourceClaim.
External controllers are responsible for updating these conditions using standard Kubernetes
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
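
For illustration, here is a sketch of a condition that an external controller might report
once a device is ready. The condition type matches the `dra.example.com/is-prepared` binding
condition used in the example later in this section; the `reason`, `message`, and timestamp
values are illustrative.

```yaml
# Sketch: a condition written by an external controller, following standard
# Kubernetes condition semantics. Values below are illustrative only.
conditions:
- type: dra.example.com/is-prepared   # listed in the device's bindingConditions
  status: "True"
  reason: DevicePrepared
  message: "Fabric-attached device gpu-1 is attached and ready"
  lastTransitionTime: "2025-06-01T12:00:00Z"
```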

The scheduler waits up to **600 seconds** for all `bindingConditions` to become `True`.
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
clears the allocation and reschedules the Pod.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: accelerator-type
        operator: In
        values:
        - high-performance
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-1
    attributes:
      vendor:
        string: "example"
      model:
        string: "example-gpu"
    bindsToNode: true
    bindingConditions:
    - dra.example.com/is-prepared
    bindingFailureConditions:
    - dra.example.com/preparing-failed
```

This example ResourceSlice has the following properties:

- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
  so that the scheduler considers only a specific set of eligible nodes.
- The scheduler selects one node from that group (for example, `node-3`) and sets
  the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
- The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
  must be prepared (the `is-prepared` condition has a status of `True`) before binding.
- If preparation of the `gpu-1` device fails (the `preparing-failed` condition has a
  status of `True`), the scheduler aborts binding.
- The scheduler waits up to 600 seconds for the device to become ready.
- External controllers can use the node selector in the ResourceClaim to perform
  node-specific setup on the selected node, as sketched below.
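
To make the `bindsToNode` behavior concrete, here is a sketch of what the ResourceClaim's
allocation might contain after the scheduler picks a node. The node name `node-3` follows
the example above, and the exact field values are illustrative.

```yaml
# Sketch: the node selector the scheduler records in the ResourceClaim when
# bindsToNode is true. The node name node-3 is illustrative.
status:
  allocation:
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - node-3
```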

## {{% heading "whatsnext" %}}

- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)