Open
39 commits
45abdd6
v0
strickvl Oct 3, 2025
a2455ec
Refactor Modal step operator GPU configuration
strickvl Oct 3, 2025
87c725e
Small refactoring
strickvl Oct 3, 2025
4481a3f
Remove unnecessary asyncio complexity
strickvl Oct 3, 2025
0450853
Refactor token checks
strickvl Oct 3, 2025
705b910
Make environment parameter optional in launch method
strickvl Oct 3, 2025
d19ffeb
Simplify memory conversion using walrus operator
strickvl Oct 3, 2025
9e91e94
Use consistent truthy checks for None handling
strickvl Oct 3, 2025
fdbdea8
Extract timeout default into named constant
strickvl Oct 3, 2025
8cf809d
Fix GPU settings validation in Modal step operator
strickvl Oct 3, 2025
a839fef
Allow step-level modal_environment overrides
strickvl Oct 3, 2025
4dcf11b
Add unit tests for complex helper functions
strickvl Oct 3, 2025
6ffa074
Update docs page
strickvl Oct 3, 2025
00232ea
better error handling
strickvl Oct 3, 2025
ec878bc
Remove excess comments
strickvl Oct 3, 2025
37fd631
Add type hint for resource_settings parameter
strickvl Oct 3, 2025
72d70e2
Enforce timeout constraints in ModalStepOperatorSettings
strickvl Oct 3, 2025
b04ca61
Improve Modal sandbox command execution safety
strickvl Oct 3, 2025
581eda1
Enhance Modal step operator error handling for sandbox creation
strickvl Oct 3, 2025
ccef58b
Remove excessive comment
strickvl Oct 3, 2025
b2154d6
Refactor the get_gpu_values out to utils
strickvl Oct 5, 2025
e4d4af0
Fix darglint docstring errors in Modal integration
strickvl Oct 6, 2025
e37f6da
Merge branch 'develop' into feature/update-modal-step-operator
strickvl Oct 6, 2025
7bcf68c
Small changes
strickvl Oct 6, 2025
f4c753a
mypy fix
strickvl Oct 6, 2025
ae065d1
Tests go in the right folders
strickvl Oct 6, 2025
684cd2c
Add licenses
strickvl Oct 6, 2025
fb4dcb7
Remove dumb tests
strickvl Oct 6, 2025
cff126d
Merge branch 'develop' into feature/update-modal-step-operator
strickvl Oct 6, 2025
86ec76c
Revert Optional setting for environment
strickvl Oct 7, 2025
c999e47
Fix variable naming and error message in Modal step operator
strickvl Oct 7, 2025
31d1a63
Move memory calculation outside Modal runtime context
strickvl Oct 7, 2025
0599ddf
Remove unneeded guardrail
strickvl Oct 7, 2025
c54b2b4
Update comments
strickvl Oct 7, 2025
24bfd63
Remove extra modal pip install
strickvl Oct 7, 2025
1147e99
Adapt CLI tests for Click 8.2 compatibility
strickvl Oct 7, 2025
ad071ed
Adapt CLI tests for Click 8.2 compatibility
strickvl Oct 7, 2025
65129f7
Merge branch 'feature/update-modal-step-operator' of github.com:zenml…
strickvl Oct 7, 2025
05b0dde
Merge remote-tracking branch 'origin/develop' into feature/update-mod…
strickvl Oct 10, 2025
40 changes: 27 additions & 13 deletions docs/book/component-guide/step-operators/modal.md
@@ -36,6 +36,13 @@ To use the Modal step operator, we need:
cloud artifact store supported by ZenML will work with Modal.
* A cloud container registry as part of your stack. Any cloud container
registry supported by ZenML will work with Modal.
* An Image Builder in your stack. ZenML uses it to build the Docker image that
runs on Modal.

The Modal step operator also respects the following environment variables if set:
* `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`: authentication tokens
* `MODAL_WORKSPACE`: workspace name
* `MODAL_ENVIRONMENT`: Modal environment name (e.g., "main")
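
To check which of these variables are visible to the process that launches the pipeline, a small helper along the following lines may be useful. This is only a sketch: `read_modal_env` and the dictionary keys are hypothetical names, not part of ZenML or Modal, and the helper says nothing about how the step operator itself prioritizes these variables against stack configuration.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names are taken from the list above; the dictionary keys are
# illustrative labels only.
_MODAL_ENV_VARS = {
    "token_id": "MODAL_TOKEN_ID",
    "token_secret": "MODAL_TOKEN_SECRET",
    "workspace": "MODAL_WORKSPACE",
    "environment": "MODAL_ENVIRONMENT",
}


def read_modal_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the Modal-related variables that are set and non-empty."""
    source = os.environ if environ is None else environ
    return {
        key: source[name]
        for key, name in _MODAL_ENV_VARS.items()
        if source.get(name)
    }


if __name__ == "__main__":
    # Print only the names that were found, never the secret values.
    print(sorted(read_modal_env()))
```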

We can then register the step operator:

@@ -66,30 +73,42 @@ You can specify the hardware requirements for each step using the
`ResourceSettings` class as described in our documentation on [resource settings](https://docs.zenml.io/user-guides/tutorial/distributed-training):

```python
from zenml import step
from zenml.config import ResourceSettings
from zenml.integrations.modal.flavors import ModalStepOperatorSettings

modal_settings = ModalStepOperatorSettings(gpu="A100")
modal_settings = ModalStepOperatorSettings(
gpu="A100", # GPU type (e.g., "T4", "A100")
# region="us-east-1", # optional, enterprise/team only
# cloud="aws", # optional, enterprise/team only
# modal_environment="main", # optional
# timeout=86400, # optional, seconds
)

resource_settings = ResourceSettings(
cpu=2,
memory="32GB"
cpu_count=2,
memory="32GB",
# gpu_count=1, # optional; if omitted and a GPU type is set, defaults to 1 GPU
)

@step(
step_operator=True,
step_operator=True, # or the specific name, e.g., step_operator="<NAME>"
settings={
"step_operator": modal_settings,
"resources": resource_settings
}
"resources": resource_settings,
},
)
def my_modal_step():
...
```

Important:
- If you request GPUs with `ResourceSettings.gpu_count > 0`, you must also specify a GPU type via `ModalStepOperatorSettings.gpu`; otherwise the run will fail with a validation error.
- If a GPU type is set but `gpu_count == 0`, ZenML defaults to 1 GPU and logs a warning.
- `cpu_count` must be an integer. `memory` can be a string like "32GB" or an integer amount of bytes.
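
The GPU rules above can be summarized in a few lines of plain Python. This is a sketch of the documented behavior, not the integration's actual code: `resolve_gpu_request` is a hypothetical helper, and treating an omitted `gpu_count` (`None`) the same as `0` is an assumption made for illustration.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger(__name__)


def resolve_gpu_request(
    gpu_type: Optional[str], gpu_count: Optional[int]
) -> Optional[Tuple[str, int]]:
    """Return (gpu_type, gpu_count), or None for CPU-only execution."""
    count = gpu_count or 0  # assumption: None is treated like 0
    if count < 0:
        raise ValueError("gpu_count must not be negative.")
    if not gpu_type:
        if count > 0:
            # GPUs were requested without saying which type to use.
            raise ValueError(
                "gpu_count > 0 requires a GPU type in "
                "ModalStepOperatorSettings.gpu."
            )
        return None
    if count == 0:
        # A GPU type without a count falls back to a single GPU.
        logger.warning("GPU type %r set without a count; using 1 GPU.", gpu_type)
        count = 1
    return gpu_type, count
```

For instance, `resolve_gpu_request("A100", 0)` yields `("A100", 1)` and logs a warning, while `resolve_gpu_request(None, 1)` raises a `ValueError`.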

{% hint style="info" %}
Note that the `cpu_count` parameter in `ResourceSettings` currently only accepts a single integer value. This specifies a soft minimum limit: Modal will guarantee at least this many physical cores, but the actual usage could be higher. The CPU cores/hour will also determine the minimum price paid for the compute resources.

For example, with the configuration above (2 CPUs and 32GB memory), the minimum cost would be approximately $1.04 per hour ((0.135 * 2) + (0.024 * 32) = $1.038).
{% endhint %}
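
The arithmetic in the hint can be reproduced with a short calculation. The per-unit rates below are the figures quoted in the hint and may not match current Modal pricing; `minimum_hourly_cost` is a hypothetical helper written for this illustration.

```python
# Rates as quoted in the hint above (USD); treat them as illustrative.
CPU_RATE_PER_CORE_HOUR = 0.135
MEMORY_RATE_PER_GB_HOUR = 0.024


def minimum_hourly_cost(cpu_count: int, memory_gb: float) -> float:
    """Estimate the minimum hourly cost for a CPU and memory request."""
    cpu_cost = cpu_count * CPU_RATE_PER_CORE_HOUR
    memory_cost = memory_gb * MEMORY_RATE_PER_GB_HOUR
    return cpu_cost + memory_cost


if __name__ == "__main__":
    # 2 CPUs and 32GB of memory, as in the example configuration.
    print(f"${minimum_hourly_cost(2, 32):.3f} per hour")  # prints $1.038 per hour
```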

This will run `my_modal_step` on a Modal instance with 1 A100 GPU, 2 CPUs, and
@@ -108,8 +127,3 @@ pipeline execution failures. In the case of failures, however, Modal provides
detailed error messages that can help identify what is incompatible. See the
[Modal docs on region selection](https://modal.com/docs/guide/region-selection) for more
details.



2 changes: 1 addition & 1 deletion src/zenml/cli/cli.py
@@ -43,7 +43,7 @@ def __init__(
commands: Optional[
Union[Dict[str, click.Command], Sequence[click.Command]]
] = None,
**kwargs: Dict[str, Any],
**kwargs: Any,
) -> None:
"""Initialize the Tag group.

2 changes: 1 addition & 1 deletion src/zenml/integrations/modal/__init__.py
@@ -29,7 +29,7 @@ class ModalIntegration(Integration):
"""Definition of Modal integration for ZenML."""

NAME = MODAL
REQUIREMENTS = ["modal>=0.64.49,<1"]
REQUIREMENTS = ["modal>=1"]

@classmethod
def flavors(cls) -> List[Type[Flavor]]:
77 changes: 72 additions & 5 deletions src/zenml/integrations/modal/flavors/modal_step_operator_flavor.py
@@ -15,13 +15,18 @@

from typing import TYPE_CHECKING, Optional, Type

from pydantic import Field

from zenml.config.base_settings import BaseSettings
from zenml.integrations.modal import MODAL_STEP_OPERATOR_FLAVOR
from zenml.step_operators import BaseStepOperatorConfig, BaseStepOperatorFlavor
from zenml.utils.secret_utils import SecretField

if TYPE_CHECKING:
from zenml.integrations.modal.step_operators import ModalStepOperator

DEFAULT_TIMEOUT_SECONDS = 86400 # 24 hours


class ModalStepOperatorSettings(BaseSettings):
"""Settings for the Modal step operator.
@@ -36,20 +41,82 @@ class ModalStepOperatorSettings(BaseSettings):
incompatible. See more in the Modal docs at https://modal.com/docs/guide/region-selection.

Attributes:
gpu: The type of GPU to use for the step execution.
gpu: The type of GPU to use for the step execution (e.g., "T4", "A100").
Use ResourceSettings.gpu_count to specify the number of GPUs.
region: The region to use for the step execution.
cloud: The cloud provider to use for the step execution.
modal_environment: The Modal environment to use for the step execution.
timeout: Maximum execution time in seconds (default 24h).
"""

gpu: Optional[str] = None
region: Optional[str] = None
cloud: Optional[str] = None
gpu: Optional[str] = Field(
None,
description="GPU type for step execution. Must be a valid Modal GPU type. "
"Examples: 'T4' (cost-effective), 'A100' (high-performance), 'V100' (training workloads). "
"Use ResourceSettings.gpu_count to specify number of GPUs. If not specified, uses CPU-only execution",
)
region: Optional[str] = Field(
None,
description="Cloud region for step execution. Must be a valid region for the selected cloud provider. "
"Examples: 'us-east-1', 'us-west-2', 'eu-west-1'. If not specified, Modal uses default region "
"based on cloud provider and availability",
)
cloud: Optional[str] = Field(
None,
description="Cloud provider for step execution. Must be a valid Modal-supported cloud provider. "
"Examples: 'aws', 'gcp'. If not specified, Modal uses default cloud provider "
"based on workspace configuration",
)
modal_environment: Optional[str] = Field(
None,
description="Modal environment name for step execution. Must be a valid environment "
"configured in your Modal workspace. Examples: 'main', 'staging', 'production'. "
"If not specified, uses the default environment for the workspace",
)
timeout: int = Field(
DEFAULT_TIMEOUT_SECONDS,
ge=1,
le=DEFAULT_TIMEOUT_SECONDS,
description=f"Maximum execution time in seconds for step completion. Must be between 1 and {DEFAULT_TIMEOUT_SECONDS} seconds. "
f"Examples: 3600 (1 hour), 7200 (2 hours), {DEFAULT_TIMEOUT_SECONDS} (24 hours maximum). "
"Step execution will be terminated if it exceeds this timeout",
)


class ModalStepOperatorConfig(
BaseStepOperatorConfig, ModalStepOperatorSettings
):
"""Configuration for the Modal step operator."""
"""Configuration for the Modal step operator.

Attributes:
token_id: Modal API token ID (ak-xxxxx format) for authentication.
token_secret: Modal API token secret (as-xxxxx format) for authentication.
workspace: Modal workspace name (optional).

Note: If token_id and token_secret are not provided, falls back to
Modal's default authentication (~/.modal.toml).
All other configuration options (modal_environment, gpu, region, etc.)
are inherited from ModalStepOperatorSettings.
"""

token_id: Optional[str] = SecretField(
default=None,
description="Modal API token ID for authentication. Must be in format 'ak-xxxxx' as provided by Modal. "
"Example: 'ak-1234567890abcdef'. If not provided, falls back to Modal's default authentication "
"from ~/.modal.toml file. Required for programmatic access to Modal API",
)
token_secret: Optional[str] = SecretField(
default=None,
description="Modal API token secret for authentication. Must be in format 'as-xxxxx' as provided by Modal. "
"Example: 'as-abcdef1234567890'. Used together with token_id for API authentication. "
"If not provided, falls back to Modal's default authentication from ~/.modal.toml file",
)
workspace: Optional[str] = Field(
None,
description="Modal workspace name for step execution. Must be a valid workspace name "
"you have access to. Examples: 'my-company', 'ml-team', 'personal-workspace'. "
"If not specified, uses the default workspace from Modal configuration",
)

@property
def is_remote(self) -> bool: