Description
Is your feature request related to a problem? Please describe.
The documentation for launching RayJobs shows the steps to first deploy a RayCluster with a given name (demo-slurm-ray), which spins up a RayCluster using the executor of choice. It then creates a RayJob with a different name (e.g. demo-slurm-job). The RayJob code checks whether a RayCluster with the same name as the RayJob exists and, if so, runs the job in that cluster. Otherwise, it creates a new ephemeral cluster specifically for that job.
There doesn't appear to be a way in the code to link a RayJob to an existing cluster under a different name, so users are forced to give their RayJob the same name as their RayCluster. This isn't ideal, because every job submitted to that cluster would then share one name.
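To make the behavior described above concrete, here is a minimal Python sketch of the name-matching rule as I understand it. It is not the library's actual code: `resolve_target` and the `existing_clusters` set are hypothetical stand-ins used only for illustration.

```python
from typing import Set, Tuple


def resolve_target(job_name: str, existing_clusters: Set[str]) -> Tuple[str, bool]:
    """Return (cluster_name, is_ephemeral) for a job.

    Hypothetical illustration: the only way to reuse a cluster is for the
    job name to equal the cluster name.
    """
    if job_name in existing_clusters:
        # A cluster with the job's name exists, so the job runs there.
        return job_name, False
    # No match: a new ephemeral cluster named after the job is created.
    return job_name, True


existing = {"demo-slurm-ray"}
print(resolve_target("demo-slurm-ray", existing))  # reuses the existing cluster
print(resolve_target("demo-slurm-job", existing))  # falls back to an ephemeral cluster
```

With this rule, `demo-slurm-job` can never land on `demo-slurm-ray`, which is the limitation this request is about.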
Describe the solution you'd like
There should be a way to specify the name of an existing RayCluster for a RayJob to run in, if such an option doesn't already exist. For example, a new cluster_name parameter on the RayJob base class could let users specify which cluster to try to connect to if it exists.
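A rough sketch of what this could look like, assuming an optional `cluster_name` field. `RayJobSketch`, `target_cluster`, and the `existing_clusters` argument are hypothetical names for illustration, not the current API, and the fallback behavior when the named cluster is missing is my assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class RayJobSketch:
    """Hypothetical stand-in for the RayJob base class."""

    name: str
    # Proposed parameter: the existing cluster to attach to, if any.
    cluster_name: Optional[str] = None

    def target_cluster(self, existing_clusters: Set[str]) -> Tuple[str, bool]:
        """Return (cluster_name, is_ephemeral) for this job."""
        # Default to the job name so today's behavior is preserved
        # when cluster_name is not set.
        wanted = self.cluster_name or self.name
        if wanted in existing_clusters:
            return wanted, False
        # Assumed fallback: create an ephemeral cluster named after the job.
        return self.name, True


existing = {"demo-slurm-ray"}
job = RayJobSketch(name="demo-slurm-job", cluster_name="demo-slurm-ray")
print(job.target_cluster(existing))
```

Defaulting `cluster_name` to `None` keeps the change backward compatible: jobs that don't set it follow the existing name-matching rule, while jobs that do can keep distinct names on a shared cluster.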
Describe alternatives you've considered
RayJob could take a dependency on the RayCluster class, so that submitting a job becomes a method on RayCluster, but that doesn't feel like a good solution.