Roadmap

High-level overview of the main priorities for 2026

Support Dynamic Resource Allocation (DRA) for whole NVIDIA GPUs
Support automatic sub-grouping for Ray, PyTorch, and LWS
Block topology-aware scheduling
Support scheduling placement strategy at the GPU level (currently supported at the node level)
Support the Kubernetes workload/pod-group API
Support max run time per workload (with delayed requeue)
Support max run time per queue (with delayed requeue)
Add metrics for pod and pod-group preemptions
User-level fairness
Support DRA for MIG devices
Support GPU compute sharing constraints
Support DRA for fractional GPU devices
Semi-preemptible workloads
Per-queue resource management for multiple GPU types

Add support for multi-cluster scheduling
Hyperscale improvements
Support consolidation of inference workloads for cluster defragmentation
Graceful rollout of inference workloads (new revision updates using temporary queue over-quota)
Support Hero Job
Support resource reservation and resource backfill
Support global priority scheme