    effect: NoExecute
```

## Device Binding Conditions {#device-binding-conditions}

{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}

Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
to be ready.

This waiting behavior is implemented in the
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
of the scheduling framework.
During this phase, the scheduler checks whether all required device conditions are
satisfied before proceeding with binding.

This improves scheduling reliability by avoiding premature binding and enables coordination
with external device controllers.

To use this feature, device drivers (typically managed by driver owners) must publish the
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
gates for the scheduler to honor these fields.

- `bindingConditions`: a list of condition keys that must have status `True` before binding.
  These indicate readiness signals such as "device attached" or "initialized".
- `bindingFailureConditions`: a list of failure condition keys. If any of these has
  status `True`, binding is aborted and the Pod is rescheduled.
- `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
  `status.allocation.nodeSelector` field of the ResourceClaim.
  This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
  inside the ResourceClaim, which external controllers can use to perform node-specific
  operations such as device attachment or preparation.

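Both feature gates are set on the kube-scheduler. A minimal sketch, assuming the scheduler binary is invoked directly (flag placement varies by deployment, for example in a kubeadm static Pod manifest; any other flags shown here are hypothetical):

```shell
# Enable both gates; the scheduler honors binding conditions only
# when DRADeviceBindingConditions and DRAResourceClaimDeviceStatus
# are both enabled. Uses the standard --feature-gates key=value list.
kube-scheduler \
  --feature-gates=DRADeviceBindingConditions=true,DRAResourceClaimDeviceStatus=true
```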
These conditions are evaluated from the `status.conditions` field of the ResourceClaim.
External controllers are responsible for updating these conditions using standard Kubernetes
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).

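For illustration, an external controller might report readiness like this (a sketch: the condition type mirrors a key published in the device's `bindingConditions`, and the `reason`, `message`, and timestamp values are hypothetical):

```yaml
# Hypothetical ResourceClaim status written by an external device controller.
# The condition type matches a key listed in the device's bindingConditions.
status:
  conditions:
  - type: dra.example.com/is-prepared
    status: "True"
    reason: DeviceAttached
    message: "GPU attached and initialized on the selected node"
    lastTransitionTime: "2025-01-01T00:00:00Z"
```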
The scheduler waits up to **600 seconds** for all `bindingConditions` to become `True`.
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
clears the allocation and reschedules the Pod.

### Example ResourceSlice

```yaml
apiVersion: resource.k8s.io/v1beta2
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: accelerator-type
        operator: In
        values:
        - high-performance
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-1
    attributes:
      vendor:
        string: "example"
      model:
        string: "example-gpu"
    bindsToNode: true
    bindingConditions:
    - dra.example.com/is-prepared
    bindingFailureConditions:
    - dra.example.com/preparing-failed
```
In this example:

- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
  allowing the scheduler to choose from a group of eligible nodes.
- The scheduler selects one node from this group (for example, `node-3`) and sets
  `ResourceClaim.status.allocation.nodeSelector` to that node name.
- The device `gpu-1` must be prepared before binding (`dra.example.com/is-prepared`
  must have status `True`).
- If preparation fails (`dra.example.com/preparing-failed` has status `True`),
  the scheduler aborts binding.
- The scheduler waits up to 600 seconds for the device to become ready.
- External controllers can use the node selector in the ResourceClaim to perform
  node-specific setup on the selected node.

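With `bindsToNode: true`, the resulting allocation might look like the following sketch. The node name `node-3` is a hypothetical value, and the exact selector shape (here a `matchFields` selector on `metadata.name`) is an assumption about how a single node is pinned:

```yaml
# Hypothetical ResourceClaim allocation after the scheduler picks node-3.
# External controllers read this node selector to run node-specific
# preparation before binding completes.
status:
  allocation:
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - node-3
```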
This feature is useful for asynchronous device preparation workflows,
such as dynamic GPU attachment or FPGA initialization.

## {{% heading "whatsnext" %}}

- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)