AWS Batch step operator #3954

SebastianScherer88 · 2025-09-14T18:33:05Z

Overview

As per the original issue (#3919, Official AWS Batch step operator) in the official ZenML GitHub repository.

Includes the following components for the proposed AWS Batch step operator:

step operator class and config
step operator flavor and settings
basic unit test coverage

The proposed implementation hardcodes ECS as the AWS orchestration backend, as opposed to EKS, because it is a less complex cloud infrastructure stack to set up than EKS.

The AWS Batch step operator supports both the EC2 and FARGATE AWS Batch platform capabilities. Both can be used by the same zenml stack: registering the AWS Batch step operator component merely requires a default job queue (which has to be either FARGATE or EC2), but an arbitrary job queue can be passed to the step as part of its step settings. This means that an AWS stack with multiple job queues (e.g. a FARGATE queue for smaller workloads and an EC2 queue to support GPU hardware) can be used without having to register separate zenml stacks.
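The queue selection described above can be sketched in a few lines. This is only an illustration, not the operator's actual code: the class names `OperatorConfig` and `StepSettings`, the field names `default_job_queue_name` and `job_queue_name`, and the queue names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatorConfig:
    """Stack-level config (hypothetical): one default queue per component."""

    default_job_queue_name: str


@dataclass(frozen=True)
class StepSettings:
    """Step-level settings (hypothetical): optional per-step queue override."""

    job_queue_name: Optional[str] = None


def resolve_job_queue(config: OperatorConfig, settings: StepSettings) -> str:
    """Return the step's queue if one is set, else the stack default."""
    return settings.job_queue_name or config.default_job_queue_name


# One registered component, two differently sized steps.
config = OperatorConfig(default_job_queue_name="fargate-small")
light_step_queue = resolve_job_queue(config, StepSettings())
gpu_step_queue = resolve_job_queue(config, StepSettings(job_queue_name="ec2-gpu"))
```

With this shape, a light step falls back to the default FARGATE queue while a GPU step names an EC2 queue, and both run against the same stack.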

Limitations

Note that this feature in its current implementation does not support AWS Batch's multinode job type, as it was deemed surplus to requirements. AWS Batch offers a host of large, multi-GPU nodes for distributed and accelerated workloads that can be leveraged in the single-container job type supported by this feature.

Tests

To run the new tests, run

pytest tests/integration/integrations/aws/step_operators

Summary by CodeRabbit

New Features
- Introduced AWS Batch step operator with a new flavor in the AWS integration.
- Supports single- and multi-node execution, configurable instance types, environment variables, node count, and timeouts.
- Provides runtime context access to AWS Batch job details.
- Compatible with service connectors for AWS resources.
Tests
- Added integration tests for environment/resource mapping and settings initialization.
Chores
- Updated AWS integration dependencies to include boto3 (>=1.40.30).
- Exposed the new AWS Batch flavor and step operator through the public API.

Co-authored-by: ZenML GmbH <[email protected]> (cherry picked from commit af53077)

CLAassistant · 2025-09-14T18:33:12Z

All committers have signed the CLA.

coderabbitai · 2025-09-14T18:33:13Z

📝 Walkthrough

Walkthrough

Introduces an official AWS Batch step operator, its flavor, config/settings, and entrypoint configuration; updates AWS integration exports and requirements; and adds integration tests validating environment/resource mapping and settings initialization.

Changes

Cohort / File(s)	Summary of Changes
AWS integration exports & deps `src/zenml/integrations/aws/__init__.py`	Adds AWS_BATCH_STEP_OPERATOR_FLAVOR constant; includes boto3>=1.40.30 in REQUIREMENTS; exports AWSBatchStepOperatorFlavor; includes it in AWSIntegration.flavors().
Flavors package `src/zenml/integrations/aws/flavors/__init__.py`, `src/zenml/integrations/aws/flavors/aws_batch_step_operator_flavor.py`	Adds AWSBatchStepOperatorFlavor plus settings/config; re-exports flavor and config; defines service connector requirements, metadata, and lazy implementation import.
Step operators exports `src/zenml/integrations/aws/step_operators/__init__.py`	Re-exports AWSBatchStepOperator and get_context; updates all.
AWS Batch step operator implementation `src/zenml/integrations/aws/step_operators/aws_batch_step_operator.py`, `src/zenml/integrations/aws/step_operators/aws_batch_step_operator_entrypoint_config.py`	Implements AWSBatchStepOperator, runtime context, job definition models, environment/resource mapping, job definition generation, boto3-based launch flow with polling; adds AWSBatchEntrypointConfiguration class.
Tests `tests/integration/integrations/aws/step_operators/__init__.py`, `tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator.py`, `tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator_flavor.py`	Adds license header; tests context derivation, env/resource mapping, and settings initialization.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Dev as User Code
    participant Z as ZenML Runtime
    participant Op as AWSBatchStepOperator
    participant B as boto3 (AWS Batch)
    participant AWS as AWS Batch Service

    Dev->>Z: Run pipeline step
    Z->>Op: launch(info, entrypoint, env)
    Note over Op: Resolve Docker image, entrypoint, settings
    Op->>B: register_job_definition(jobDef)
    B->>AWS: Create/Update JobDefinition
    AWS-->>B: ARN/Rev
    Op->>B: submit_job(name, jobQueue, jobDefinition, overrides)
    B->>AWS: Submit job
    AWS-->>B: jobId
    loop Poll until terminal state
        Op->>B: describe_jobs(jobId)
        B->>AWS: Fetch job status
        AWS-->>B: RUNNING | SUCCEEDED | FAILED
    end
    alt SUCCEEDED
        Op-->>Z: Report success
    else FAILED
        Op-->>Z: Raise failure
    end
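The register, submit, and poll flow in the diagram can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the PR's implementation: `run_batch_job` and its parameters are hypothetical names, the one-hour default timeout is only an example value, and `client` is assumed to behave like `boto3.client("batch")`, whose `register_job_definition`, `submit_job`, and `describe_jobs` calls are the only ones used.

```python
import time
from typing import Any, Dict


def run_batch_job(
    client: Any,
    job_name: str,
    job_queue: str,
    job_definition: Dict[str, Any],
    poll_interval_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Register a job definition, submit a job, and poll until it finishes.

    `client` is assumed to expose the same methods as boto3's Batch client.
    Returns the job id on success; raises on failure or timeout.
    """
    registered = client.register_job_definition(**job_definition)
    submitted = client.submit_job(
        jobName=job_name,
        jobQueue=job_queue,
        jobDefinition=registered["jobDefinitionArn"],
    )
    job_id = submitted["jobId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.describe_jobs(jobs=[job_id])["jobs"][0]
        status = job["status"]
        if status == "SUCCEEDED":
            return job_id
        if status == "FAILED":
            reason = job.get("statusReason", "unknown reason")
            raise RuntimeError(f"AWS Batch job {job_id} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"AWS Batch job {job_id} still {status} at timeout"
            )
        time.sleep(poll_interval_seconds)
```

Passing the client in as a parameter keeps the loop testable with a stub, so no AWS account is needed to exercise the success, failure, and timeout paths.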

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Official AWS Batch step operator #3919 — Implements the requested official AWS Batch step operator, including flavor, config, and execution.

Suggested labels

enhancement

Suggested reviewers

bcdurak
strickvl
avishniakov

Poem

I thump my paws with batchy cheer,
New queues to hop, the clouds draw near.
Env mapped neat, resources tight,
I launch a job and watch the night.
Poll, poll, poll—success in sight!
Carrot logs: SUCCEEDED bright. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description does not follow the repository’s required template, as it lacks the “## Describe changes”, “## Pre-requisites”, and “## Types of changes” sections and their associated content such as the implementation summary, contribution checklist, and change classification. This omission means the description is incomplete relative to the prescribed structure.	Please update the description to include the “## Describe changes” section summarizing what was implemented and why, the “## Pre-requisites” checklist confirming compliance with contribution guidelines and test coverage, and the “## Types of changes” section with the appropriate checkbox selected.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 92.00% which is sufficient. The required threshold is 80.00%.
Title Check	✅ Passed	The title succinctly describes the primary change—the addition of the AWS Batch step operator flavor and implementation. It directly reflects the main feature introduced by the pull request without extraneous details or vague terms. It is concise and clear for teammates reviewing the project history.


htahir1 · 2025-09-16T14:12:19Z

@SebastianScherer88 thanks for this - we glanced at this internally and it looks quite decent as a first draft. youll get reviews in due course... and we'll test it on our AWS too

Did you manage to set up your AWS to test it?

htahir1 · 2025-09-16T14:12:27Z

@coderabbitai review

htahir1 · 2025-09-16T14:12:32Z

@codex review

coderabbitai · 2025-09-16T14:12:34Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

chatgpt-codex-connector · 2025-09-16T14:12:37Z

To use Codex here, create a Codex account and connect to github.

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (10)

src/zenml/integrations/aws/flavors/aws_batch_step_operator_flavor.py (3)

14-14: Fix stale module docstring (SageMaker → AWS Batch).

Apply:

-"""Amazon SageMaker step operator flavor."""
+"""AWS Batch step operator flavor."""

35-37: Fix class docstring (SageMaker → AWS Batch).

-class AWSBatchStepOperatorSettings(BaseSettings):
-    """Settings for the Sagemaker step operator."""
+class AWSBatchStepOperatorSettings(BaseSettings):
+    """Settings for the AWS Batch step operator."""

38-42: Tighten field descriptions and fix grammar.

Current strings have broken spacing and minor grammar issues.

-    instance_type: Union[str,List[str]] = Field(
-        default='optimal',
-        description="The instance type for AWS Batch to use for the step" \
-        " execution. Example: 'm5.xlarge'",
-    )
+    instance_type: Union[str, List[str]] = Field(
+        default="optimal",
+        description=(
+            "Instance type(s) for the job. Example: 'm5.xlarge' or "
+            "['c6i.4xlarge', 'm6i.4xlarge']."
+        ),
+    )
@@
-    node_count: PositiveInt = Field(
-        default=1,
-        description="The number of AWS Batch nodes to run the step on. If > 1," \
-        "an AWS Batch multinode job will be run, with the network connectivity" \
-        "between the nodes provided by AWS Batch. See https://docs.aws.amazon.com/batch/latest/userguide/multi-node-parallel-jobs.html" \
-        "for details."
-    )
+    node_count: PositiveInt = Field(
+        default=1,
+        description=(
+            "Number of nodes. If > 1, run as a multi‑node parallel job with "
+            "AWS‑managed networking. See: "
+            "https://docs.aws.amazon.com/batch/latest/userguide/multi-node-parallel-jobs.html"
+        ),
+    )
@@
-    execution_role: str = Field(
-        "",
-        description="The ECS execution role required to execute the AWS Batch" \
-        " jobs as an ECS tasks."
-    )
+    execution_role: str = Field(
+        "",
+        description="ECS execution role assumed by tasks that run AWS Batch jobs.",
+    )
-    job_role: str = Field(
-        "",
-        description="The ECS job role required by the container runtime inside" \
-        "the ECS task."
-    )
+    job_role: str = Field(
+        "",
+        description="Task role (job role) assumed by the container at runtime.",
+    )

Also applies to: 48-54, 73-82

tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator_flavor.py (1)

3-8: Add assertions; current test only instantiates the settings.

Validate fields to make the test meaningful.

-def test_aws_batch_step_operator_settings():
-    AWSBatchStepOperatorSettings(
-        instance_type="g4dn.xlarge",
-        environment={"key_1":"value_1","key_2":"value_2"},
-        timeout_seconds=60
-    )
+def test_aws_batch_step_operator_settings():
+    """Ensure settings initialize and retain provided values."""
+    s = AWSBatchStepOperatorSettings(
+        instance_type="g4dn.xlarge",
+        environment={"key_1": "value_1", "key_2": "value_2"},
+        timeout_seconds=60,
+    )
+    assert s.instance_type == "g4dn.xlarge"
+    assert s.environment == {"key_1": "value_1", "key_2": "value_2"}
+    assert s.timeout_seconds == 60
+    # verify defaults
+    assert s.node_count == 1

src/zenml/integrations/aws/step_operators/aws_batch_step_operator_entrypoint_config.py (1)

14-14: Clarify module and class docstrings (explicit “AWS Batch”).

-"""Entrypoint configuration for ZenML Batch step operator."""
+"""Entrypoint configuration for the ZenML AWS Batch step operator."""
@@
-class AWSBatchEntrypointConfiguration(StepOperatorEntrypointConfiguration):
-    """Entrypoint configuration for ZenML Batch step operator."""
+class AWSBatchEntrypointConfiguration(StepOperatorEntrypointConfiguration):
+    """Entrypoint configuration for the ZenML AWS Batch step operator."""

Also applies to: 21-22

src/zenml/integrations/aws/step_operators/__init__.py (1)

14-24: Clean up docstring, remove unused noqa, and sort __all__.

Aligns with Ruff hints (RUF100, RUF022) and clarifies package purpose.

-"""Initialization of the Sagemaker Step Operator."""
+"""AWS step operators package (SageMaker and AWS Batch)."""
@@
-from zenml.integrations.aws.step_operators.sagemaker_step_operator import (  # noqa: F401
+from zenml.integrations.aws.step_operators.sagemaker_step_operator import (
     SagemakerStepOperator,
 )
-from zenml.integrations.aws.step_operators.aws_batch_step_operator import (  # noqa: F401
+from zenml.integrations.aws.step_operators.aws_batch_step_operator import (
     AWSBatchStepOperator,
     get_context
 )
-__all__ = ["SagemakerStepOperator","AWSBatchStepOperator","get_context"]
+__all__ = ["AWSBatchStepOperator", "SagemakerStepOperator", "get_context"]

tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator.py (1)

24-27: Pass strings to monkeypatch.setenv; env vars must be strings.

Prevents potential type issues on some platforms/Python versions.

-    monkeypatch.setenv('AWS_BATCH_JOB_MAIN_NODE_INDEX',0)
-    monkeypatch.setenv('AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS','test-address')
-    monkeypatch.setenv('AWS_BATCH_JOB_NODE_INDEX',1)
-    monkeypatch.setenv('AWS_BATCH_JOB_NUM_NODES',2)
+    monkeypatch.setenv("AWS_BATCH_JOB_MAIN_NODE_INDEX", "0")
+    monkeypatch.setenv("AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS", "test-address")
+    monkeypatch.setenv("AWS_BATCH_JOB_NODE_INDEX", "1")
+    monkeypatch.setenv("AWS_BATCH_JOB_NUM_NODES", "2")
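To illustrate why the values should be strings: `os.environ` itself rejects non-string values, and the consumer of these variables has to convert them back to integers. The sketch below assumes a helper named `read_batch_context` and a `BatchNodeContext` dataclass; both are hypothetical and are not the PR's `get_context` implementation. Only the four environment variable names come from the test above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BatchNodeContext:
    """Node details parsed from AWS Batch environment variables."""

    main_node_index: int
    main_node_address: str
    node_index: int
    num_nodes: int


def read_batch_context(env: Mapping[str, str] = os.environ) -> BatchNodeContext:
    """Parse the (string) environment values into typed fields."""
    return BatchNodeContext(
        main_node_index=int(env["AWS_BATCH_JOB_MAIN_NODE_INDEX"]),
        main_node_address=env["AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS"],
        node_index=int(env["AWS_BATCH_JOB_NODE_INDEX"]),
        num_nodes=int(env["AWS_BATCH_JOB_NUM_NODES"]),
    )


# Assigning a non-string directly to os.environ raises TypeError.
rejected_non_string = False
try:
    os.environ["AWS_BATCH_JOB_NUM_NODES"] = 2  # type: ignore[assignment]
except TypeError:
    rejected_non_string = True

# String values, as in the corrected test, parse cleanly.
example_env = {
    "AWS_BATCH_JOB_MAIN_NODE_INDEX": "0",
    "AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS": "test-address",
    "AWS_BATCH_JOB_NODE_INDEX": "1",
    "AWS_BATCH_JOB_NUM_NODES": "2",
}
context = read_batch_context(example_env)
```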

src/zenml/integrations/aws/step_operators/aws_batch_step_operator.py (3)

14-14: Fix the module docstring to reference AWS Batch instead of SageMaker.

The docstring incorrectly references "Sagemaker Step Operator" instead of AWS Batch Step Operator.

Apply this diff to fix the docstring:

-"""Implementation of the Sagemaker Step Operator."""
+"""Implementation of the AWS Batch Step Operator."""

283-283: Fix improper string formatting in log message.

The log message is built with an f-string instead of passing the values to the logger as lazy formatting arguments.

Apply this diff to fix the logging:

-                    logger.info(f"AWS Batch only accepts int type cpu resource requirements. Converted {resource_settings.cpu_count} to {cpu_count_int}")
+                    logger.info("AWS Batch only accepts int type cpu resource requirements. Converted %s to %s", resource_settings.cpu_count, cpu_count_int)

361-361: Simplify node range generation using string formatting.

The targetNodes string generation can be simplified.

Apply this diff to simplify:

-                            targetNodes=','.join([str(node_index) for node_index in range(node_count)]),
+                            targetNodes=f"0:{node_count-1}",
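Returning to the logging suggestion for line 283 above, the difference is that `%s` placeholders leave the message template and its arguments separate on the log record, so formatting only happens if a handler actually emits the message. The sketch below is illustrative only: `to_int_cpu_count` is a hypothetical helper, and rounding up with `math.ceil` is an assumption, since the thread does not say how the operator converts fractional CPU counts.

```python
import logging
import math

logger = logging.getLogger("aws_batch_example")


def to_int_cpu_count(cpu_count: float) -> int:
    """Convert a CPU request to an int, logging when the value changes.

    Rounding up is an assumption made for this sketch.
    """
    cpu_count_int = int(math.ceil(cpu_count))
    if cpu_count_int != cpu_count:
        # Values are passed as arguments; the logger interpolates them lazily.
        logger.info(
            "AWS Batch only accepts int type cpu resource requirements. "
            "Converted %s to %s",
            cpu_count,
            cpu_count_int,
        )
    return cpu_count_int
```

Because the arguments stay on the record, handlers and tests can inspect `record.args` directly rather than parsing the rendered string.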

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89ac517 and 5466799.

📒 Files selected for processing (9)

src/zenml/integrations/aws/__init__.py (3 hunks)
src/zenml/integrations/aws/flavors/__init__.py (2 hunks)
src/zenml/integrations/aws/flavors/aws_batch_step_operator_flavor.py (1 hunks)
src/zenml/integrations/aws/step_operators/__init__.py (1 hunks)
src/zenml/integrations/aws/step_operators/aws_batch_step_operator.py (1 hunks)
src/zenml/integrations/aws/step_operators/aws_batch_step_operator_entrypoint_config.py (1 hunks)
tests/integration/integrations/aws/step_operators/__init__.py (1 hunks)
tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator.py (1 hunks)
tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator_flavor.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

src/zenml/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for conformity with Python best practices.

Files:

src/zenml/integrations/aws/step_operators/aws_batch_step_operator_entrypoint_config.py
src/zenml/integrations/aws/flavors/__init__.py
src/zenml/integrations/aws/step_operators/__init__.py
src/zenml/integrations/aws/__init__.py
src/zenml/integrations/aws/flavors/aws_batch_step_operator_flavor.py
src/zenml/integrations/aws/step_operators/aws_batch_step_operator.py

tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py: "Assess the unit test code employing the PyTest testing framework. Confirm that:

The tests adhere to PyTest's established best practices.

Test descriptions are sufficiently detailed to clarify the purpose of each test."

Files:

tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator_flavor.py
tests/integration/integrations/aws/step_operators/__init__.py
tests/integration/integrations/aws/step_operators/test_aws_batch_step_operator.py

🪛 Ruff (0.12.2)

src/zenml/integrations/aws/step_operators/__init__.py

16-16: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

19-19: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

23-23: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🔇 Additional comments (2)

tests/integration/integrations/aws/step_operators/__init__.py (1)

1-14: LGTM: license header only.

No actionable issues.

src/zenml/integrations/aws/flavors/__init__.py (1)

32-35: LGTM: public re-exports for AWS Batch flavor.

Consistent with integration wiring.

Also applies to: 46-49

SebastianScherer88 · 2025-09-16T22:38:25Z

@SebastianScherer88 thanks for this - we glanced at this internally and it looks quite decent as a first draft. youll get reviews in due course... and we'll test it on our AWS too

Did you manage to set up your AWS to test it?

no, i haven't done any functional testing with an actual aws backend yet. might have some time this coming weekend 🤔

…ch container type) step after hacking + hardcoding some remote rds connection credentials into custom parent image. also needed parent image bc the new flavor module is obvs not installed in official distribution which is in the default docker image used by zenml

SebastianScherer88 · 2025-09-22T00:15:49Z

i managed to set up a test stack in aws and run some e2e tests. the current version works for single node aws batch jobs (i.e. type container) but the multinode approach doesnt work (yet). not sure if i can get that to work here or if that's a more fundamental design issue here - atm i'm generating identical job descriptions for all batch nodes, which probably means that these containers are all trying to identify as the same zenml step with the store 🤔

SebastianScherer88 · 2025-09-23T00:37:07Z

i've decided to drop support for multinode type aws batch jobs as

i dont think i can get it to work and
its not really needed given that container type, ec2 platform capability jobs can choose extremely large, multigpu instances, so distributed training is covered in the reduced feature offering, too

bcdurak · 2025-09-25T15:29:45Z

Hey @SebastianScherer88, thanks a lot for your contribution. I will take over the reviewer role and test this as soon as possible. In the meanwhile, if you change it from a draft PR to "Ready for review", the CI can help us solve the formatting and linting issues as a start.

SebastianScherer88 · 2025-09-25T21:35:37Z

Hey @SebastianScherer88, thanks a lot for your contribution. I will take over the reviewer role and test this as soon as possible. In the meanwhile, if you change it from a draft PR to "Ready for review", the CI can help us solve the formatting and linting issues as a start.

ready for 👀 😄 @bcdurak

schustmi · 2025-10-06T07:17:01Z

src/zenml/integrations/aws/step_operators/aws_batch_step_operator.py

+        return mapped_resource_settings
+
+    @staticmethod
+    def sanitize_name(name: str) -> bool:


While this works, I think we can improve this a little. See for example here

zenml/src/zenml/integrations/kubernetes/orchestrators/kube_utils.py

Lines 144 to 162 in e025293

def sanitize_label(label: str) -> str:
    """Sanitize a label for a Kubernetes resource.

    Args:
        label: The label to sanitize.

    Returns:
        The sanitized label.
    """
    # https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#rfc-1035-label-names
    label = re.sub(r"[^a-z0-9-]", "-", label.lower())
    label = re.sub(r"^[-]+", "", label)
    label = re.sub(r"[-]+", "-", label)
    label = label[:63]
    # Remove trailing dashes after truncation to make sure we end with an
    # alphanumeric character
    label = re.sub(r"[-]+$", "", label)
    return label

for a similar method, which also removes leading/trailing/consecutive -.

I'm thinking something like this

job_name = f"{pipeline_name}-{step_name}"
job_name = sanitize_name(job_name, max_length=115)
job_name += random_str(6)
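A runnable version of that suggestion might look like the sketch below. It is not the code in the PR: `sanitize_name` with a `max_length` parameter, `random_str`, and `build_job_name` are hypothetical helpers, and the character rules assume AWS Batch job names allow letters, digits, hyphens, and underscores, start with an alphanumeric character, and are at most 128 characters long. The dash before the random suffix is an addition for readability.

```python
import random
import re
import string


def sanitize_name(name: str, max_length: int = 128) -> str:
    """Sanitize a name for use as (part of) an AWS Batch job name."""
    # Replace every character outside the assumed allowed set with a dash.
    name = re.sub(r"[^a-zA-Z0-9_-]", "-", name)
    # The first character must be alphanumeric.
    name = re.sub(r"^[^a-zA-Z0-9]+", "", name)
    # Collapse runs of dashes into a single dash.
    name = re.sub(r"-+", "-", name)
    name = name[:max_length]
    # Remove trailing dashes left over after truncation.
    return re.sub(r"-+$", "", name)


def random_str(length: int) -> str:
    """Return a random lowercase alphanumeric string."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choices(alphabet, k=length))


def build_job_name(pipeline_name: str, step_name: str) -> str:
    """Combine pipeline and step name into a unique, valid job name."""
    base = sanitize_name(f"{pipeline_name}-{step_name}", max_length=115)
    suffix = random_str(6)
    # 115 + 1 + 6 = 122 characters at most, within the assumed 128 limit.
    return f"{base}-{suffix}" if base else suffix
```

For example, `sanitize_name("--My Pipeline!!  train/step_1--")` yields `My-Pipeline-train-step_1`, and the random suffix keeps repeated runs of the same step from colliding on the job name.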

Also, docstring still missing for this method which will probably cause an issue in the CI

SebastianScherer88 · 2025-10-06

What we have now works fine imo, so i'll leave any additional formatting to you

schustmi · 2025-10-06

A random string "would also work", and both the random string as well as your solution here won't get my approval as it's not just about "it somehow works".

SebastianScherer88 · 2025-10-06

Agree to disagree. If you wanted a very specific and non negotiable implementation of this, you should have formulated those requirements up front, rather than insisting on arbitrary personal cosmetic preferences after the fact. This is bad review practice imo, since it unnecessarily imposes on the time of unpaid volunteer contributors.

I have already spent a significant deal of my personal free time to provide a non trivial piece of functionality to the zenml framework for free, including provisioning my own cloud infrastructure for e2e testing, and will now leave this PR as is. You are of course free to use the provided work in its current state as you wish - implement the changes you deem a must-have yourself now or later, or reject the work in its current form. 👍

schustmi · 2025-10-06T08:04:11Z

@SebastianScherer88 There are also some formatting issues, running bash scripts/format.sh should solve them I think.

SebastianScherer88 · 2025-10-06T08:39:54Z

@SebastianScherer88 There are also some formatting issues, running bash scripts/format.sh should solve them I think.

Got a link for the failing CI? If formatting is a prerequisite for passing the CI, you guys should look at pre commits

schustmi · 2025-10-06T08:52:02Z

@SebastianScherer88 There are also some formatting issues, running bash scripts/format.sh should solve them I think.

Got a link for the failing CI? If formatting is a prerequisite for passing the CI, you guys should look at pre commits

So are linting, docstrings, unit tests, integration tests and many other things. And we do not run these either as part of a pre-commit hook, as they require extra dependencies and time.

It's all written down here: https://github.com/zenml-io/zenml/blob/main/CONTRIBUTING.md

Failing CI:

github-actions bot and others added 6 commits September 12, 2025 10:09

Add version 0.84.3 to legacy docs (zenml-io#3949)

ce1de79

Co-authored-by: ZenML GmbH <[email protected]> (cherry picked from commit af53077)

started creating required files and mapping out the zenml config -> a…

e87d336

…ws batch job definition conversion

finished first draft of aws batch step operator

6b076d1

renaming modules and adding unit tests

c6f2a87

added support for multinode aws batch job type

01017cc

added support for multinode aws batch job type

b22672b

SebastianScherer88 added 9 commits September 14, 2025 19:35

adding test dependency back in and fixing typo in sagemaker doc string

371a4ac

renaming the aws batch runtime context retrieval utility

a80f266

started creating required files and mapping out the zenml config -> a…

e372f85

…ws batch job definition conversion

finished first draft of aws batch step operator

3d8c39b

renaming modules and adding unit tests

0543331

added support for multinode aws batch job type

c9b5829

added support for multinode aws batch job type

c787379

adding test dependency back in and fixing typo in sagemaker doc string

5fd0761

renaming the aws batch runtime context retrieval utility

5466799

SebastianScherer88 force-pushed the feature/aws-step-operator branch from a80f266 to 5466799 Compare September 14, 2025 20:27

htahir1 linked an issue Sep 15, 2025 that may be closed by this pull request

Official AWS Batch step operator #3919

Open

1 task

coderabbitai bot reviewed Sep 16, 2025

View reviewed changes

SebastianScherer88 added 5 commits September 16, 2025 22:20

bounding aws integration dependency boto3 < 2

17de12b

using immutable default dict factory instead of mutable empty dict value

1fcefac

removing commented out default args

eb6c320

removing incorrect warning stating that step level resources specific…

d1c002b

…ation is not configured

increased timeout error to 1h and added batch client error handling

98e014e

SebastianScherer88 added 5 commits September 21, 2025 01:57

fixes off the back initial functional testing

070ef62

fixed step environment settings bug

1a602eb

fixed the multinode targetnode syntax

69e60d1

fixed type hints for instance type

0d53bce

stripping out multinode support as its not really needed given batch …

739fdaa

…ec2 instances sizes, and instead support both fargate and ec2 (for gpu) backends on ecs

bcdurak self-requested a review September 25, 2025 15:29

SebastianScherer88 added 3 commits September 25, 2025 22:24

fixed fargate networking bug. the container spec model didnt have a n…

9665107

…etworking config attibute and was silently ignoring the kwargs passed in the FARGATE path

default backend is fargate bc its faster and easier to set up the infra

4e171c1

fixed integration tests

02f9281

SebastianScherer88 marked this pull request as ready for review September 25, 2025 21:34

Merge branch 'develop' into feature/aws-step-operator

778bdfd

schustmi requested changes Oct 1, 2025

View reviewed changes

SebastianScherer88 added 3 commits October 1, 2025 22:20

addressed all comments except logging

8fc2959

Merge branch 'feature/aws-step-operator' of https://github.com/Sebast…

835929c

…ianScherer88/zenml into feature/aws-step-operator

buffer of 5 chars

6232f4f

schustmi changed the title ~~Feature: AWS step operator~~ AWS Batch step operator Oct 2, 2025

SebastianScherer88 added 3 commits October 4, 2025 15:59

added validation of pipeline and step name before assembling full job…

705c2a9

… description name@

implemented name sanitization as suggested instead of raising excepti…

971cd68

…ons for invalid characters

added ec2 and fargate resource validation to schemas, simplified reso…

d2ace24

…urce mapping method. updated unit test coverage

schustmi requested changes Oct 6, 2025

View reviewed changes

fixed bug in fargate resource memory validation range

d3be040
