Discussion for Adding a Batch Scheduling Sub-Landscape #3761
Description
As part of the CNCF Batch Working Group (under TAG Runtime), we'd like to discuss adding a sub-landscape focused on Batch Scheduling, similar to the Wasm sub-landscape.
Example Draft
To illustrate what we were hoping to do, we worked up an example Batch Scheduling landscape here:
Please note that this is merely a rough draft of what a Batch Scheduling landscape could look like. We anticipate more projects will be added as we socialize this landscape throughout the community.
If this discussion would work better in a PR, we'd be happy to submit the necessary changes and continue the discussion there.
Rationale
The conversation around Batch Schedulers in the context of cloud and Kubernetes has been a complicated one over the last couple of years. As AI/ML continues to dominate discussions, the desire for solutions in this space has amplified. However, we find that people who want to solve this particular challenge often don't know where to start and don't know that there are existing options available.
As a result, companies often create their own bespoke solutions. At just about every KubeCon, another company announces that it plans to open-source its new Batch Scheduler, often with properties extremely similar to those of existing solutions. We'd much prefer to guide people to join forces on the existing solutions, ideally by contributing to the ongoing conversations in the Kubernetes Batch Working Group (a sister working group to the CNCF group that works on Kubernetes-specific issues) around Kueue and around improving the core of Kubernetes to be more Batch Scheduling-friendly.
We think adding a landscape for Batch Scheduling could help bring awareness to the community that potential solutions already exist and that they have a place to start from.
We don't intend for the landscape to answer every question people have about Batch Scheduling on Kubernetes. Much like the vast CNCF landscape itself, it will be a starting point from which people can do their own due diligence on what will work for them.
We don't relish bringing more complexity to an already overwhelming array of options on the existing landscape (and we really appreciate the improvements and simplifications in the recent update). However, there did not seem to be any meaningful way to describe the current set of Batch Schedulers within the context of the larger landscape. We are open to ideas, of course, which is why we're reaching out for discussion.