AWS Batch step operator #3954
Open
SebastianScherer88 wants to merge 39 commits into zenml-io:develop from SebastianScherer88:feature/aws-step-operator
Changes from 32 commits
Commits
ce1de79 github-actions[bot]: Add version 0.84.3 to legacy docs (#3949)
e87d336 SebastianScherer88: started creating required files and mapping out the zenml config -> a…
6b076d1 SebastianScherer88: finished first draft of aws batch step operator
c6f2a87 SebastianScherer88: renaming modules and adding unit tests
01017cc SebastianScherer88: added support for multinode aws batch job type
b22672b SebastianScherer88: added support for multinode aws batch job type
371a4ac SebastianScherer88: adding test dependency back in and fixing typo in sagemaker doc string
a80f266 SebastianScherer88: renaming the aws batch runtime context retrieval utility
e372f85 SebastianScherer88: started creating required files and mapping out the zenml config -> a…
3d8c39b SebastianScherer88: finished first draft of aws batch step operator
0543331 SebastianScherer88: renaming modules and adding unit tests
c9b5829 SebastianScherer88: added support for multinode aws batch job type
c787379 SebastianScherer88: added support for multinode aws batch job type
5fd0761 SebastianScherer88: adding test dependency back in and fixing typo in sagemaker doc string
5466799 SebastianScherer88: renaming the aws batch runtime context retrieval utility
17de12b SebastianScherer88: bounding aws integration dependency boto3 < 2
1fcefac SebastianScherer88: using immutable default dict factory instead of mutable empty dict value
eb6c320 SebastianScherer88: removing commented out default args
d1c002b SebastianScherer88: removing incorrect warning stating that step level resources specific…
98e014e SebastianScherer88: increased timeout error to 1h and added batch client error handling
1be5965 SebastianScherer88: replicated the sagemaker orchestrator aws authentication and session …
dab9340 SebastianScherer88: resolving merge conflicts
070ef62 SebastianScherer88: fixes off the back initial functional testing
a398139 SebastianScherer88: more changes after successfully e2e testing single node (i.e. aws bat…
1a602eb SebastianScherer88: fixed step environment settings bug
69e60d1 SebastianScherer88: fixed the multinode targetnode syntax
0d53bce SebastianScherer88: fixed type hints for instance type
739fdaa SebastianScherer88: stripping out multinode support as its not really needed given batch …
9665107 SebastianScherer88: fixed fargate networking bug. the container spec model didnt have a n…
4e171c1 SebastianScherer88: default backend is fargate bc its faster and easier to set up the infra
02f9281 SebastianScherer88: fixed integration tests
778bdfd SebastianScherer88: Merge branch 'develop' into feature/aws-step-operator
8fc2959 SebastianScherer88: addressed all comments except logging
835929c SebastianScherer88: Merge branch 'feature/aws-step-operator' of https://github.com/Sebast…
6232f4f SebastianScherer88: buffer of 5 chars
705c2a9 SebastianScherer88: added validation of pipeline and step name before assembling full job…
971cd68 SebastianScherer88: implemented name sanitization as suggested instead of raising excepti…
d2ace24 SebastianScherer88: added ec2 and fargate resource validation to schemas, simplified reso…
d3be040 SebastianScherer88: fixed bug in fargate resource memory validation range
src/zenml/integrations/aws/flavors/aws_batch_step_operator_flavor.py
201 changes: 201 additions & 0 deletions
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,201 @@ | ||
# Copyright (c) ZenML GmbH 2022. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at: | ||
# | ||
# https://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
# or implied. See the License for the specific language governing | ||
# permissions and limitations under the License. | ||
"""AWS Batch step operator flavor.""" | ||
|
||
from typing import TYPE_CHECKING, Dict, Optional, Type, Literal | ||
|
||
from pydantic import Field, PositiveInt, field_validator | ||
from zenml.utils.secret_utils import SecretField | ||
from zenml.config.base_settings import BaseSettings | ||
from zenml.integrations.aws import ( | ||
AWS_RESOURCE_TYPE, | ||
AWS_BATCH_STEP_OPERATOR_FLAVOR, | ||
) | ||
from zenml.models import ServiceConnectorRequirements | ||
from zenml.step_operators.base_step_operator import ( | ||
BaseStepOperatorConfig, | ||
BaseStepOperatorFlavor, | ||
) | ||
|
||
if TYPE_CHECKING: | ||
from zenml.integrations.aws.step_operators import AWSBatchStepOperator | ||
|
||
|
||
class AWSBatchStepOperatorSettings(BaseSettings): | ||
"""Settings for the AWS Batch step operator.""" | ||
|
||
environment: Dict[str, str] = Field( | ||
default_factory=dict, | ||
description="Environment variables to pass to the container during " | ||
"execution. Example: {'LOG_LEVEL': 'INFO', 'DEBUG_MODE': 'False'}", | ||
) | ||
job_queue_name: str = Field( | ||
default="", | ||
description="The AWS Batch job queue to submit the step's AWS Batch job " | ||
"to. If not provided, falls back to the default job queue name " | ||
"specified at stack registration time. Must be compatible with " | ||
"`backend`.", | ||
) | ||
backend: Literal['EC2','FARGATE'] = Field( | ||
default="FARGATE", | ||
description="The AWS Batch platform capability used to run the step's " | ||
"AWS Batch job. Must be compatible with `job_queue_name`. " | ||
"Defaults to 'FARGATE'.", | ||
) | ||
assign_public_ip: Literal['ENABLED','DISABLED'] = Field( | ||
default="ENABLED", | ||
description="Sets the network configuration's assignPublicIp field. " | ||
"Only relevant for the FARGATE backend.", | ||
) | ||
timeout_seconds: PositiveInt = Field( | ||
default=3600, | ||
description="The number of seconds before AWS Batch times out the job." | ||
) | ||
|
||
|
||
|
||
class AWSBatchStepOperatorConfig( | ||
BaseStepOperatorConfig, AWSBatchStepOperatorSettings | ||
): | ||
"""Config for the AWS Batch step operator. | ||
Note: We use ECS as the AWS Batch backend (not EKS), so users can avoid the | ||
complexity of setting up an EKS cluster. The compute platform (EC2 or | ||
Fargate) is selected per step via the `backend` setting, which defaults | ||
to Fargate. | ||
""" | ||
|
||
execution_role: str = Field( | ||
description="The IAM role ARN of the ECS execution role." | ||
) | ||
job_role: str = Field( | ||
description="The IAM role ARN of the ECS job role." | ||
) | ||
default_job_queue_name: str = Field( | ||
description="The default AWS Batch job queue to submit AWS Batch jobs to." | ||
) | ||
aws_access_key_id: Optional[str] = SecretField( | ||
default=None, | ||
description="The AWS access key ID to use to authenticate to AWS. " | ||
"If not provided, the value from the default AWS config will be used.", | ||
) | ||
aws_secret_access_key: Optional[str] = SecretField( | ||
default=None, | ||
description="The AWS secret access key to use to authenticate to AWS. " | ||
"If not provided, the value from the default AWS config will be used.", | ||
) | ||
aws_profile: Optional[str] = Field( | ||
None, | ||
description="The AWS profile to use for authentication if not using " | ||
"service connectors or explicit credentials. If not provided, the " | ||
"default profile will be used.", | ||
) | ||
aws_auth_role_arn: Optional[str] = Field( | ||
None, | ||
description="The ARN of an intermediate IAM role to assume when " | ||
"authenticating to AWS.", | ||
) | ||
region: Optional[str] = Field( | ||
None, | ||
description="The AWS region where the AWS Batch job will be run. " | ||
"If not provided, the value from the default AWS config will be used.", | ||
) | ||
|
||
@property | ||
def is_remote(self) -> bool: | ||
"""Checks if this stack component is running remotely. | ||
This designation is used to determine if the stack component can be | ||
used with a local ZenML database or if it requires a remote ZenML | ||
server. | ||
Returns: | ||
True if this config is for a remote component, False otherwise. | ||
""" | ||
return True | ||
|
||
|
||
class AWSBatchStepOperatorFlavor(BaseStepOperatorFlavor): | ||
"""Flavor for the AWS Batch step operator.""" | ||
|
||
@property | ||
def name(self) -> str: | ||
"""Name of the flavor. | ||
Returns: | ||
The name of the flavor. | ||
""" | ||
return AWS_BATCH_STEP_OPERATOR_FLAVOR | ||
|
||
@property | ||
def service_connector_requirements( | ||
self, | ||
) -> Optional[ServiceConnectorRequirements]: | ||
"""Service connector resource requirements for service connectors. | ||
Specifies resource requirements that are used to filter the available | ||
service connector types that are compatible with this flavor. | ||
Returns: | ||
Requirements for compatible service connectors, if a service | ||
connector is required for this flavor. | ||
""" | ||
return ServiceConnectorRequirements(resource_type=AWS_RESOURCE_TYPE) | ||
|
||
@property | ||
def docs_url(self) -> Optional[str]: | ||
"""A url to point at docs explaining this flavor. | ||
Returns: | ||
A flavor docs url. | ||
""" | ||
return self.generate_default_docs_url() | ||
|
||
@property | ||
def sdk_docs_url(self) -> Optional[str]: | ||
"""A url to point at SDK docs explaining this flavor. | ||
Returns: | ||
A flavor SDK docs url. | ||
""" | ||
return self.generate_default_sdk_docs_url() | ||
|
||
@property | ||
def logo_url(self) -> str: | ||
"""A url to represent the flavor in the dashboard. | ||
Returns: | ||
The flavor logo. | ||
""" | ||
return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/step_operator/aws_batch.png" | ||
|
||
|
||
@property | ||
def config_class(self) -> Type[AWSBatchStepOperatorConfig]: | ||
"""Returns the AWSBatchStepOperatorConfig config class. | ||
Returns: | ||
The config class. | ||
""" | ||
return AWSBatchStepOperatorConfig | ||
|
||
@property | ||
def implementation_class(self) -> Type["AWSBatchStepOperator"]: | ||
"""Implementation class. | ||
Returns: | ||
The implementation class. | ||
""" | ||
from zenml.integrations.aws.step_operators import AWSBatchStepOperator | ||
|
||
return AWSBatchStepOperator |
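The settings above only declare fields; how they reach AWS Batch is implemented in the step operator module, which is not part of this file. As a rough illustration only, the sketch below shows one way the fields could map onto the request shapes accepted by the AWS Batch `RegisterJobDefinition` and `SubmitJob` APIs. It uses stdlib dataclasses instead of the pydantic classes from the diff. `BatchSettingsSketch`, `resolve_job_queue`, `build_job_definition_fragment` and `build_submit_job_fragment` are hypothetical names that do not appear in the PR, and the mapping itself is an assumption rather than the PR's actual logic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Literal


@dataclass(frozen=True)
class BatchSettingsSketch:
    """Hypothetical stdlib stand-in for AWSBatchStepOperatorSettings."""

    environment: Dict[str, str] = field(default_factory=dict)
    job_queue_name: str = ""
    backend: Literal["EC2", "FARGATE"] = "FARGATE"
    assign_public_ip: Literal["ENABLED", "DISABLED"] = "ENABLED"
    timeout_seconds: int = 3600


def resolve_job_queue(
    settings: BatchSettingsSketch, default_job_queue_name: str
) -> str:
    """Use the step-level queue if set, else the stack-level default."""
    return settings.job_queue_name or default_job_queue_name


def build_job_definition_fragment(
    settings: BatchSettingsSketch,
) -> Dict[str, Any]:
    """Assumed mapping onto part of a RegisterJobDefinition request."""
    container_properties: Dict[str, Any] = {}
    # assignPublicIp is only meaningful for Fargate jobs, so it is only
    # emitted for that backend.
    if settings.backend == "FARGATE":
        container_properties["networkConfiguration"] = {
            "assignPublicIp": settings.assign_public_ip
        }
    return {
        "type": "container",
        "platformCapabilities": [settings.backend],
        "containerProperties": container_properties,
    }


def build_submit_job_fragment(
    settings: BatchSettingsSketch, default_job_queue_name: str
) -> Dict[str, Any]:
    """Assumed mapping onto part of a SubmitJob request."""
    return {
        "jobQueue": resolve_job_queue(settings, default_job_queue_name),
        "containerOverrides": {
            # Batch expects a list of name/value pairs, not a plain dict.
            "environment": [
                {"name": name, "value": value}
                for name, value in sorted(settings.environment.items())
            ]
        },
        "timeout": {"attemptDurationSeconds": settings.timeout_seconds},
    }


if __name__ == "__main__":
    step_settings = BatchSettingsSketch(environment={"LOG_LEVEL": "INFO"})
    print(build_job_definition_fragment(step_settings))
    print(build_submit_job_fragment(step_settings, "default-queue"))
```

Keeping the builders as pure functions that return plain dicts means the mapping can be unit tested without a boto3 client or AWS credentials.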