Support Koordinator as one batch scheduler option#2572
kingeasternsun wants to merge 4 commits into ray-project:master
Is this PR ready for review? The PR title is still marked as "draft."
@kingeasternsun can you please add some tests before marking this ready for review? See the other batch scheduler implementations for reference.
Signed-off-by: kingeasternsun <kingeasternsun@gmai.com>
Thanks for your comment. It's still under development and currently lacks test code.
Thanks for your advice, I'll add the tests soon.
Signed-off-by: kingeasternsun <kingeasternsun@gmai.com>
Hey, everything is fine now.
Hi, tests have been added.
ray-operator/controllers/ray/batchscheduler/koordinator/koordinator_gang_groups.go
Outdated
}

for i, workerGroupSepc := range app.Spec.WorkerGroupSpecs {
	gangGroups[1+i] = generateGangGroupName(app, workerGroupSepc.Template.Namespace, workerGroupSepc.GroupName)
nit: gangGroups[i+1]
Thank you for your review! I will fix it as soon as possible.
ray-operator/controllers/ray/batchscheduler/koordinator/koordinator_gang_groups.go
Outdated
	gangGroups[1+i] = generateGangGroupName(app, workerGroupSepc.Template.Namespace, workerGroupSepc.GroupName)
	minMemberMap[workerGroupSepc.GroupName] = wokerGroupReplicas{
		Replicas: *(workerGroupSepc.Replicas),
		MinReplicas: *(workerGroupSepc.MinReplicas),
nit: the brackets are not needed here
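To illustrate the nit: in Go a selector expression such as `g.Replicas` binds more tightly than the unary `*`, so `*(g.Replicas)` and `*g.Replicas` mean the same thing. The sketch below uses a hypothetical `workerGroup` type as a stand-in for the real worker group spec; it is not the PR's code.

```go
package main

import "fmt"

// workerGroup is a hypothetical stand-in for the worker group spec;
// only the pointer-typed fields matter for this illustration.
type workerGroup struct {
	Replicas    *int32
	MinReplicas *int32
}

// bothForms dereferences the same field with and without parentheses.
// The selector g.Replicas is evaluated first in both cases.
func bothForms(g workerGroup) (int32, int32) {
	withParens := *(g.Replicas)
	withoutParens := *g.Replicas
	return withParens, withoutParens
}

func main() {
	replicas, minReplicas := int32(3), int32(1)
	g := workerGroup{Replicas: &replicas, MinReplicas: &minReplicas}
	a, b := bothForms(g)
	fmt.Println(a == b)
}
```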
Thank you for your review! I will fix it as soon as possible.
ray-operator/controllers/ray/batchscheduler/koordinator/koordinator_scheduler.go
Outdated
	},
)

setHeadPodNamespace(rayClusterWithGangScheduling, "ns0")
Why is this call needed? The namespace should be inherited from the RayCluster namespace right?
What you said is absolutely correct. However, to make this module's code more general, I considered that there might be scenarios in the future where the namespace for head pods or worker pods needs to be specifically designated. Therefore, I implemented compatibility here: if the namespace for the head pod or worker pod is empty, it will inherit the namespace of the rayCluster; otherwise, it will be processed according to the specified namespace.
Of course, if you think we don’t need to consider such special cases for now, I completely agree as well.
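For reference, the fallback described above can be sketched roughly as follows. `resolvePodNamespace` is a hypothetical helper name used only for illustration, not the function in this PR, and it takes plain strings rather than the real RayCluster types.

```go
package main

import "fmt"

// resolvePodNamespace is a hypothetical helper illustrating the fallback:
// use the pod template's namespace when it is set, otherwise inherit the
// namespace of the owning RayCluster.
func resolvePodNamespace(templateNamespace, clusterNamespace string) string {
	if templateNamespace != "" {
		return templateNamespace
	}
	return clusterNamespace
}

func main() {
	// Empty template namespace: the cluster namespace is inherited.
	fmt.Println(resolvePodNamespace("", "default"))
	// Explicit template namespace: it takes precedence.
	fmt.Println(resolvePodNamespace("ns0", "default"))
}
```

If pod namespaces are always inherited from the RayCluster, as the review comment suggests, the first branch never fires and the helper could be dropped.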
ray-operator/controllers/ray/batchscheduler/koordinator/koordinator_scheduler_test.go
Outdated
Can you add an example YAML similar to this one https://github.com/ray-project/kuberay/blob/e9f31556c14fae6391fb27a4a96bfbe01f917d46/ray-operator/config/samples/ray-cluster.yunikorn-scheduler.yaml?
Please update main.go Lines 96 to 97 in e9f3155
Please update batch schedulers in helm chart kuberay/helm-chart/kuberay-operator/values.yaml Lines 60 to 87 in e9f3155
Thank you for your review! I will add it as soon as possible.
Thank you for your review! I will fix it as soon as possible.
Thank you for your review! I will fix it as soon as possible.
Signed-off-by: kingeasternsun <kingeasternsun@gmai.com>
Why are these changes needed?
Koordinator is a QoS-based scheduling system for efficient orchestration of microservices, AI, and big data workloads on Kubernetes. It aims to improve the runtime efficiency and reliability of both latency-sensitive workloads and batch jobs, simplify resource-related configuration tuning, and increase pod deployment density to improve resource utilization.
The integration is straightforward: Koordinator supports gang scheduling through annotations, without requiring a PodGroup CR.
Koordinator is compatible with the community's pod-group.scheduling.sigs.k8s.io, pod-group.scheduling.sigs.k8s.io/name, and pod-group.scheduling.sigs.k8s.io/min-available. See https://koordinator.sh/docs/designs/gang-scheduling
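As a rough sketch of what the annotation-based approach could look like, the snippet below builds a metadata map from the two community keys listed above. The helper name `gangMetadata`, the choice to set both keys as annotations, and the value format (group name as a string, minimum member count as a decimal string) are assumptions for illustration; the Koordinator gang scheduling design doc is the authority on the exact keys and values.

```go
package main

import (
	"fmt"
	"strconv"
)

// Keys taken from the PR description. Whether each is applied as an
// annotation or a label is an assumption in this sketch.
const (
	podGroupNameKey         = "pod-group.scheduling.sigs.k8s.io/name"
	podGroupMinAvailableKey = "pod-group.scheduling.sigs.k8s.io/min-available"
)

// gangMetadata is a hypothetical helper that returns the key/value pairs
// a pod would carry to join a gang with the given name and minimum size.
func gangMetadata(groupName string, minAvailable int32) map[string]string {
	return map[string]string{
		podGroupNameKey:         groupName,
		podGroupMinAvailableKey: strconv.Itoa(int(minAvailable)),
	}
}

func main() {
	meta := gangMetadata("raycluster-sample-headgroup", 1)
	fmt.Println(meta[podGroupNameKey], meta[podGroupMinAvailableKey])
}
```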
Related issue number
#2573
Checks