Description
We currently support only in-cluster KubernetesJob, but it would be very useful to have a client outside the cluster drive the job. This enables workflows such as:
- A Jupyter notebook that can send messages to meshes on the cluster. Re-running individual cells gives fast iteration while providing access to remote hardware at a larger scale
- User scripts running on local clients outside the cluster, which gives users greater flexibility in what code they can write
- Starting jobs from a separate cluster, for example from a web server, and sending commands into the other cluster
There is one major blocker to getting this working: monarch clients need to communicate directly with every host in the mesh, either by IP address + port or by hostname + port. By default, Kubernetes does not expose static public IP addresses or hostnames outside the cluster. This task is about finding a way to bridge that communication gap.
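To make the blocker concrete, here is a minimal sketch of a reachability probe that an external client could run against the host + port pairs it expects to talk to. It uses only the Python standard library and is not part of monarch; the function name `unreachable_endpoints` and the example address are hypothetical.

```python
import socket
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, int]


def unreachable_endpoints(
    endpoints: Iterable[Endpoint], timeout: float = 1.0
) -> List[Endpoint]:
    """Return the (host, port) pairs that cannot be reached over TCP.

    Hypothetical helper for illustration only. It opens and immediately
    closes a plain TCP connection to each endpoint.
    """
    failed = []
    for host, port in endpoints:
        try:
            # create_connection resolves the hostname and connects;
            # DNS failures, refusals, and timeouts all raise OSError.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append((host, port))
    return failed


if __name__ == "__main__":
    # Placeholder address: a cluster-internal IP is typically not
    # routable from a laptop outside the cluster.
    print(unreachable_endpoints([("10.0.0.12", 26600)]))
```

Run from outside the cluster, every cluster-internal address would be expected to show up in the returned list until some bridging mechanism is in place.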
A second subtask is getting code from the local client onto the remote cluster efficiently. We would also like to sync the user's installed PyPI packages, so users can quickly try things that require new packages. For an initial implementation the existing HostMesh.sync_workspace will be fine, but we'll need to enhance it to handle scale, reliability, and connectivity issues.
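As a rough illustration of the package-sync idea, the sketch below lists the packages installed on the client and computes which pinned requirements a remote host would be missing. It relies only on the standard library's `importlib.metadata`; the helper names `installed_packages` and `missing_requirements` are hypothetical, and how the remote inventory is collected or how the result is installed on the cluster is left open.

```python
import re
from importlib.metadata import distributions
from typing import Dict, List


def installed_packages() -> Dict[str, str]:
    """Map normalized package name -> version for the current environment."""
    pkgs = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            # Normalize names as in PEP 503 so "My_Pkg" and "my-pkg" match.
            pkgs[re.sub(r"[-_.]+", "-", name).lower()] = dist.version
    return pkgs


def missing_requirements(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Return pinned requirements the remote lacks or has at another version."""
    return sorted(
        f"{name}=={version}"
        for name, version in local.items()
        if remote.get(name) != version
    )


if __name__ == "__main__":
    local = installed_packages()
    # Assumption: the remote inventory was gathered the same way on a host.
    remote: Dict[str, str] = {}
    print(missing_requirements(local, remote))
```

Sending only the difference keeps the sync small when the client and cluster environments mostly agree, which matters more as the number of hosts grows.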