
Add out-of-cluster support for KubernetesJob #2657

@thedavekwon

We currently support KubernetesJob only when the client runs inside the cluster, but it would be very useful to have a client outside the cluster drive the job. This enables workflows such as:

  • A Jupyter notebook that sends messages to meshes on the cluster. Re-running individual cells allows fast iteration while giving access to remote hardware at a larger scale.
  • User scripts running on a local machine outside the cluster, which gives users more flexibility in the code they write.
  • Starting jobs from a separate cluster (for example, from a web server) and sending commands into the target cluster.

There is one major blocker: Monarch clients need to communicate directly with every host in the mesh, using either an IP address + port or a hostname + port. By default, Kubernetes does not expose pods outside the cluster through stable public IP addresses or hostnames, so an external client cannot reach those endpoints. This task is about finding a way to bridge that communication gap.
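At its core, the gap is an address-translation problem: the client learns in-cluster endpoints (pod IP or hostname plus port) that it cannot route to. The sketch below shows only the client-side half of a possible bridge, assuming some mechanism (per-pod Services, port-forwarding, or a proxy; this issue does not pick one) produces a table mapping each in-cluster endpoint to an externally reachable one. `Endpoint`, `translate_endpoints`, and all addresses and ports are hypothetical and are not part of Monarch's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A host address as the mesh sees it: IP address or hostname, plus port."""
    host: str
    port: int


def translate_endpoints(in_cluster, external_map):
    """Map every in-cluster endpoint to an externally reachable endpoint.

    `external_map` is a hypothetical routing table (for example, filled in
    from per-pod Services or port-forwards), keyed by in-cluster endpoint.
    Raises KeyError listing every endpoint the external client cannot reach,
    because the client needs a route to *all* hosts in the mesh.
    """
    missing = [ep for ep in in_cluster if ep not in external_map]
    if missing:
        raise KeyError(f"no external route for: {missing}")
    return [external_map[ep] for ep in in_cluster]


# Illustrative values only: two pods reachable through distinct external ports.
pods = [Endpoint("10.0.1.5", 26600), Endpoint("10.0.1.6", 26600)]
routes = {
    Endpoint("10.0.1.5", 26600): Endpoint("mesh.example.com", 30001),
    Endpoint("10.0.1.6", 26600): Endpoint("mesh.example.com", 30002),
}
external = translate_endpoints(pods, routes)
```

Failing loudly on any unmapped endpoint reflects the requirement above: a client that can reach only some hosts in the mesh cannot drive the job.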

A second subtask is efficiently getting the code from the local client onto the remote cluster. Ideally, the user's locally installed PyPI packages would be synced as well, so users can quickly try things that require new packages. For an initial implementation the existing HostMesh.sync_workspace is sufficient, but it will need to be enhanced to address scale, reliability, and connectivity issues.
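For the package-syncing idea, one possible first step is to compute which locally installed packages are missing or at a different version on the remote side. The sketch below assumes the remote package list is already available as a `{name: version}` dict; how it is fetched and how the resulting specs are installed are left open. `local_packages` and `packages_to_sync` are hypothetical helpers and are not part of `HostMesh.sync_workspace`. Only `importlib.metadata` is a real standard-library API.

```python
import re
from importlib.metadata import distributions


def normalize(name):
    """Normalize a distribution name per PEP 503 (e.g. 'Foo_Bar' -> 'foo-bar')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def local_packages():
    """Return {normalized name: version} for packages installed locally."""
    pkgs = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a Name field
            pkgs[normalize(name)] = dist.version
    return pkgs


def packages_to_sync(local, remote):
    """Return sorted 'name==version' specs that are missing or differ remotely."""
    return sorted(
        f"{name}=={version}"
        for name, version in local.items()
        if remote.get(name) != version
    )
```

Pinning exact versions keeps the remote environment aligned with the local one. Packages installed from local paths or in editable mode cannot be reproduced from a `name==version` spec alone and would need separate handling.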
