Description
Local Execution Mode
The Kubeflow SDK user experience (UX) is critical to the SDK's success. One of the major features that makes the Kubeflow SDK well received by the ML community is that it lets data scientists and ML engineers experiment locally first, before submitting long-running jobs to expensive infrastructure to train their models with Kubeflow Trainer.
What would you like to be added?
Following the proposal in kubeflow/trainer#2231, we would like to support local execution of training jobs. Local execution mode will let data scientists and ML engineers test their training jobs and runtimes locally before using them in a production environment. The initial proposal is to support local execution through container runtimes (e.g., Podman, Docker) and subprocesses.
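To make the subprocess option more concrete, here is a minimal sketch of what running a training script locally in a subprocess could look like. `run_training_locally` and `TRAIN_SCRIPT` are hypothetical names used only for illustration; they are not part of the Kubeflow SDK, and the real API is defined in the proposal document. The sketch assumes the training code is a standalone Python script that takes its arguments from the command line.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def run_training_locally(
    script: str,
    args: Optional[Sequence[str]] = None,
    timeout: Optional[float] = None,
) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a training script in a local Python subprocess.

    This is NOT a Kubeflow SDK API; it only illustrates the subprocess idea.
    """
    # Isolate each run in a throwaway working directory.
    with tempfile.TemporaryDirectory() as workdir:
        script_path = Path(workdir) / "train.py"
        script_path.write_text(script)
        # Reuse the current interpreter so the script sees the same installed packages.
        cmd = [sys.executable, str(script_path), *(args or [])]
        # Capture stdout/stderr so logs can be inspected before submitting to a cluster.
        return subprocess.run(
            cmd,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )


# Toy stand-in for a real training script (assumption: the only CLI arg is the epoch count).
TRAIN_SCRIPT = """
import sys

epochs = int(sys.argv[1])
for epoch in range(epochs):
    loss = 1.0 / (epoch + 1)
    print(f"epoch={epoch} loss={loss:.3f}")
"""

if __name__ == "__main__":
    result = run_training_locally(TRAIN_SCRIPT, ["2"])
    print("exit code:", result.returncode)
    print(result.stdout, end="")
```

Reusing `sys.executable` keeps the local run in the same environment the user is experimenting in. A container-based backend would presumably wrap the same script in a `docker run` or `podman run` invocation instead, trading startup time for closer parity with the production runtime image.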
Why is this needed?
- Providing a great developer experience for data scientists is extremely valuable for growing adoption and serving our end users.
- Test long-running training jobs locally before submitting them to Kubeflow Trainer.
- Test runtimes locally before using them in production.
Proposal Document
The initial proposal can be found here: Local Execution Mode
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.