Description
Contact Details [Optional]
Feature Description
It would be great to get an official version of this interesting AWS Batch step operator into the main zenml library.
Problem or Use Case
I think it's quite common for people to have
- AWS infrastructure and
- heterogeneous compute requirements across their pipeline steps; not everything needs to run on SageMaker.
Running a local (Docker) orchestrator that can push individual components to powerful remote execution engines like AWS Batch sounds very useful to me. It's definitely something I would be interested in (I work in ML as an ops engineer).
I'm happy to contribute, building on the linked reference plugin implementation you provided.
Proposed Solution
A hardened, more configurable version of the linked plugin implementation that
- allows step resource configuration to be mapped canonically (where possible) onto AWS Batch resource specs
- defaults to AWS infra settings that are compatible with the current Terraform setup utils (where possible)
- integrates with every orchestrator that honours the canonical step launcher approach (i.e. not the LocalDockerOrchestrator)
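To make the first bullet concrete, here is a minimal sketch of what such a resource mapping could look like. The `StepResources` dataclass and the `to_batch_resource_requirements` helper are hypothetical names made up for illustration; they are not part of ZenML or the linked plugin. The only part taken from AWS is the output shape: Batch's `resourceRequirements` entries use the types `VCPU`, `MEMORY` (in MiB) and `GPU`, with string values. Rounding vCPUs up to a whole number is an assumption that suits EC2-backed compute environments.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StepResources:
    """Hypothetical stand-in for a step's resource settings."""

    cpu_count: Optional[float] = None
    memory_mib: Optional[int] = None
    gpu_count: Optional[int] = None


def to_batch_resource_requirements(res: StepResources) -> List[Dict[str, str]]:
    """Translate step resources into AWS Batch `resourceRequirements` entries.

    Unset fields are skipped so the job definition's defaults still apply.
    """
    requirements: List[Dict[str, str]] = []
    if res.cpu_count is not None:
        # Assumption: round up to whole vCPUs (EC2-backed compute environments).
        requirements.append({"type": "VCPU", "value": str(math.ceil(res.cpu_count))})
    if res.memory_mib is not None:
        # AWS Batch expects memory in MiB, passed as a string.
        requirements.append({"type": "MEMORY", "value": str(res.memory_mib)})
    if res.gpu_count:
        # A GPU entry is only added for a positive GPU count.
        requirements.append({"type": "GPU", "value": str(res.gpu_count)})
    return requirements
```

The resulting list could then be passed as `containerOverrides={"resourceRequirements": ...}` when submitting the job through boto3's Batch `submit_job` call, which keeps the mapping logic separate from the submission logic.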
Alternatives Considered
The official SageMaker step operator. AWS Batch would be a cheaper (no price markup for SageMaker ml.... instance types) and more flexible way of launching scalable custom compute jobs.
Additional Context
Implementation draft (unofficial AWS Batch step operator plugin)
Priority
Low - Nice to have
Code of Conduct
- I agree to follow this project's Code of Conduct