Update Modal step operator #4038

strickvl · 2025-10-03T16:45:36Z

#3733 is taking a while to get merged, so this takes the Step Operator improvements and builds on them.

Notably, our Modal integration dependency is now set to modal>=1. (Our Modal Deployer will also have this as its base, so I split these changes out into their own PR in the hope that it can be merged sooner.)

Otherwise this PR does some cleanup and adds some unit tests (the bulk of the code additions).

(It also includes some CI fixes that are unrelated to this PR.)

Inline GPU configuration logic directly into the step operator instead of using a utility function. The logic is simple enough (10 lines of string formatting) that a shared utility adds unnecessary indirection.

Changes:
- Inline GPU value construction in modal_step_operator.py
- Remove get_gpu_values utility function from utils.py
- Remove test file that only tested trivial string formatting

This addresses the TODO comment to refactor and remove the get_gpu_values helper function.

Aligns type hint with build_modal_image signature for consistency.

Reduces verbosity by combining assignment and conditional check.

Aligns with existing pattern used throughout Modal integration.

Improves maintainability by defining the value in a single location.

Previously, setting gpu_count > 0 without specifying a GPU type would silently run on CPU with no warning. This adds comprehensive validation that raises clear configuration errors with actionable guidance.

Changes:
- Add _compute_modal_gpu_arg() helper to validate and format GPU settings
- Raise StackComponentInterfaceError when gpu_count > 0 but gpu is None
- Warn and default to 1 GPU when gpu is set but gpu_count is 0
- Validate gpu_count is non-negative integer
- Normalize whitespace in GPU type strings
- Add missing List import to utils.py
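The validation rules listed for this commit can be sketched roughly as follows. This is an illustration only, not the code from the PR: `compute_gpu_arg` is a hypothetical standalone function, `ValueError` stands in for `StackComponentInterfaceError` so the snippet is self-contained, and the `TYPE:COUNT` output format is an assumption about what gets passed to Modal.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def compute_gpu_arg(gpu: Optional[str], gpu_count: Optional[int]) -> Optional[str]:
    """Validate GPU settings and format them as one string (hypothetical helper)."""
    # Normalize whitespace; an empty or blank string counts as "no GPU type".
    gpu_type = (gpu or "").strip() or None

    # gpu_count must be a non-negative integer (bool is rejected explicitly).
    if gpu_count is not None and (
        isinstance(gpu_count, bool)
        or not isinstance(gpu_count, int)
        or gpu_count < 0
    ):
        raise ValueError(f"gpu_count must be a non-negative integer, got {gpu_count!r}")
    count = gpu_count or 0  # assumption: None is treated like 0

    if gpu_type is None:
        if count > 0:
            # Fail loudly instead of silently running on CPU.
            raise ValueError(
                "gpu_count > 0 but no GPU type is set; specify a GPU type "
                "or set gpu_count to 0."
            )
        return None  # plain CPU execution

    if count == 0:
        logger.warning("GPU type %r set with gpu_count=0; defaulting to 1 GPU.", gpu_type)
        count = 1

    # Assumed format: bare type for one GPU, "TYPE:COUNT" for several.
    return gpu_type if count == 1 else f"{gpu_type}:{count}"
```

Raising on the "count without type" case is the design choice the commit message emphasizes: a configuration error at submission time is cheaper than discovering that a step ran on CPU.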

Previously, setup_modal_client always used the component-level modal_environment config, preventing per-step overrides via settings. Now prioritizes settings.modal_environment when provided, falling back to component config for backward compatibility.
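A minimal sketch of that precedence rule, assuming both objects expose an optional `modal_environment` attribute. The two dataclasses and `resolve_modal_environment` are hypothetical stand-ins for illustration, not the classes in the integration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeStepSettings:
    """Hypothetical stand-in for per-step settings."""

    modal_environment: Optional[str] = None


@dataclass
class FakeComponentConfig:
    """Hypothetical stand-in for the component-level config."""

    modal_environment: Optional[str] = None


def resolve_modal_environment(
    settings: FakeStepSettings, config: FakeComponentConfig
) -> Optional[str]:
    # A per-step override wins; otherwise fall back to the component config.
    return settings.modal_environment or config.modal_environment
```

With this ordering, existing stacks that only set the component-level value keep behaving as before, while a step can opt into a different environment.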

Improves type safety by explicitly typing the resource_settings parameter as ResourceSettings in _compute_modal_gpu_arg method.

Adds Pydantic field constraints to enforce the timeout range (1-86400 seconds) that was previously only documented. Uses ge=1 and le=DEFAULT_TIMEOUT_SECONDS to validate input at instantiation time.
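For readers unfamiliar with Pydantic field constraints, the idea looks roughly like this. `TimeoutSettings` is a hypothetical, cut-down model (the real settings class has more fields), and `is_valid_timeout` exists only for the demonstration.

```python
from pydantic import BaseModel, Field, ValidationError

DEFAULT_TIMEOUT_SECONDS = 86400  # 24 hours, the documented upper bound


class TimeoutSettings(BaseModel):
    """Hypothetical model showing only the timeout constraint."""

    # ge/le reject out-of-range values when the model is instantiated.
    timeout: int = Field(
        default=DEFAULT_TIMEOUT_SECONDS,
        ge=1,
        le=DEFAULT_TIMEOUT_SECONDS,
    )


def is_valid_timeout(value: int) -> bool:
    """Return True if the value passes the field constraints."""
    try:
        TimeoutSettings(timeout=value)
    except ValidationError:
        return False
    return True
```

Values of 0 or 86401 are rejected at construction time, so a bad timeout surfaces when the settings are built rather than when the step is launched.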

Changes command execution from shell-based to direct varargs invocation, eliminating shell metacharacter issues and removing the implicit bash dependency. Adds proper argument quoting via shlex.join() as a fallback for backward compatibility with older Modal client versions that don't support varargs command passing.

Improves error handling during the creation of a Modal sandbox by adding a fallback mechanism for command execution. If the direct varargs invocation fails due to a TypeError, the code now falls back to a shell-quoted invocation using shlex.join() to ensure compatibility with older Modal client versions. This change enhances robustness and maintains backward compatibility while preserving argument safety.
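The varargs-first, shell-quoted-fallback pattern described in the two commit messages above can be sketched in isolation like this. `launch` is a hypothetical callable standing in for whatever creates the sandbox; no actual Modal API is shown or implied.

```python
import shlex
from typing import Callable, Sequence


def run_with_fallback(launch: Callable[..., str], command: Sequence[str]) -> str:
    """Try direct varargs invocation, then fall back to one quoted string."""
    try:
        # Preferred path: each argument is passed separately, so no shell
        # parses the command and metacharacters stay literal.
        return launch(*command)
    except TypeError:
        # Fallback for callables that accept only a single command string:
        # shlex.join() quotes each argument so it survives shell parsing.
        return launch(shlex.join(command))


def varargs_launcher(*args: str) -> str:
    """Stand-in for a client that accepts varargs."""
    return "argv:" + "|".join(args)


def legacy_launcher(command: str) -> str:
    """Stand-in for an older client that takes one shell string."""
    return "shell:" + command
```

One caveat with this approach: catching `TypeError` around the whole call also catches a `TypeError` raised inside `launch` for unrelated reasons, so the fallback may occasionally mask a genuine bug.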

src/zenml/integrations/modal/utils.py

github-actions · 2025-10-03T16:48:12Z

Documentation Link Check Results

❌ Absolute links check failed
There are broken absolute links in the documentation. See workflow logs for details
✅ Relative links check passed
Last checked: 2025-10-10 07:40:29 UTC

Copilot

Pull Request Overview

This PR updates the Modal step operator integration to use Modal>=1 and introduces significant improvements including proper utility functions, enhanced GPU argument handling, and comprehensive unit tests.

Key changes:

- Updates Modal dependency from modal>=0.64.49,<1 to modal>=1
- Refactors shared functionality into a new utils.py module with authentication setup, image building, and stack validation
- Replaces the simplistic get_gpu_values function with robust _compute_modal_gpu_arg method that validates GPU configurations and prevents silent CPU fallbacks

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/zenml/integrations/modal/__init__.py` | Updates Modal dependency requirement to version >=1 |
| `src/zenml/integrations/modal/utils.py` | New utility module with authentication setup, image building, and stack validation functions |
| `src/zenml/integrations/modal/step_operators/modal_step_operator.py` | Major refactor using new utilities, improved GPU handling, and better error reporting |
| `src/zenml/integrations/modal/flavors/modal_step_operator_flavor.py` | Enhanced configuration with detailed field descriptions, validation, and new authentication fields |
| `docs/book/component-guide/step-operators/modal.md` | Updated documentation with clearer examples and GPU configuration guidance |
| `tests/integration/integrations/modal/step_operators/` | New comprehensive test suite covering utilities, GPU configurations, and flavor validation |

Comments suppressed due to low confidence (2)

docs/book/component-guide/step-operators/modal.md:1

The scarf tracking image has been removed from the documentation. Verify this removal is intentional and aligns with documentation standards.


src/zenml/integrations/modal/step_operators/modal_step_operator.py

Add missing parameter documentation to helper functions and exception documentation to the launch method to satisfy darglint requirements.

src/zenml/integrations/modal/step_operators/modal_step_operator.py

src/zenml/integrations/modal/utils.py

Rename zenml_image to modal_image to accurately reflect that the function returns a Modal-specific image object, not a ZenML construct. Update error message to describe actual failure scenarios: Modal's inability to access the container registry or extend the Docker image, rather than suggesting the image might not exist (which is incorrect since ZenML just built and pushed it successfully).

Move the memory_int calculation from inside the async with app.run() context to outside, grouping it with other resource preparations like gpu_values. This is a pure data transformation that doesn't require Modal's runtime context, improving code organization and reducing unnecessary nesting.
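As a rough illustration of that reordering, the sketch below does the memory conversion up front, alongside other resource preparation and before any runtime context would be entered. `FakeResourceSettings` and its `get_memory()` method are hypothetical stand-ins rather than ZenML's actual classes, and the MiB unit is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeResourceSettings:
    """Hypothetical stand-in for the step's resource settings."""

    memory_mib: Optional[float] = None

    def get_memory(self) -> Optional[float]:
        return self.memory_mib


def prepare_resources(settings: FakeResourceSettings) -> Dict[str, Any]:
    """Compute resource values before any sandbox or app context is opened."""
    # Pure data transformation: the walrus operator binds the value and
    # checks it in one expression; a missing (or zero) value maps to None.
    memory_int = int(mem) if (mem := settings.get_memory()) else None
    return {"memory": memory_int}
```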

tests/integration/functional/cli/utils.py


strickvl added 20 commits October 3, 2025 14:30

v0

45abdd6

Small refactoring

87c725e

Remove unnecessary asyncio complexity

4481a3f

Refactor token checks

0450853

Make environment parameter optional in launch method

705b910

Aligns type hint with build_modal_image signature for consistency.

Simplify memory conversion using walrus operator

d19ffeb

Reduces verbosity by combining assignment and conditional check.

Use consistent truthy checks for None handling

9e91e94

Aligns with existing pattern used throughout Modal integration.

Extract timeout default into named constant

fdbdea8

Improves maintainability by defining the value in a single location.

Add unit tests for complex helper functions

4dcf11b

Update docs page

6ffa074

better error handling

00232ea

Remove excess comments

ec878bc

Add type hint for resource_settings parameter

37fd631

Improves type safety by explicitly typing the resource_settings parameter as ResourceSettings in _compute_modal_gpu_arg method.

Enforce timeout constraints in ModalStepOperatorSettings

72d70e2

Adds Pydantic field constraints to enforce the timeout range (1-86400 seconds) that was previously only documented. Uses ge=1 and le=DEFAULT_TIMEOUT_SECONDS to validate input at instantiation time.

Remove excessive comment

ccef58b

strickvl added the enhancement, internal, and dependencies labels Oct 3, 2025

github-advanced-security bot found potential problems Oct 3, 2025

src/zenml/integrations/modal/utils.py (dismissed)

strickvl requested a review from Copilot October 3, 2025 16:48

Copilot AI reviewed Oct 3, 2025

src/zenml/integrations/modal/step_operators/modal_step_operator.py (resolved)

src/zenml/integrations/modal/step_operators/modal_step_operator.py (outdated, resolved)

strickvl requested a review from schustmi October 3, 2025 16:50

Refactor the get_gpu_values out to utils

b2154d6

strickvl added the run-slow-ci label Oct 5, 2025

strickvl removed the request for review from schustmi October 6, 2025 07:55

strickvl and others added 4 commits October 6, 2025 10:07

Fix darglint docstring errors in Modal integration

e4d4af0

Add missing parameter documentation to helper functions and exception documentation to the launch method to satisfy darglint requirements.

Merge branch 'develop' into feature/update-modal-step-operator

e37f6da

Small changes

7bcf68c

mypy fix

f4c753a

strickvl requested a review from schustmi October 6, 2025 11:41

strickvl and others added 4 commits October 6, 2025 13:46

Tests go in the right folders

ae065d1

Add licenses

684cd2c

Remove dumb tests

fb4dcb7

Merge branch 'develop' into feature/update-modal-step-operator

cff126d

schustmi requested changes Oct 7, 2025

strickvl added 7 commits October 7, 2025 10:09

Revert Optional setting for environment

86ec76c

Remove unneeded guardrail

0599ddf

Update comments

c54b2b4

Remove extra modal pip install

24bfd63

Adapt CLI tests for Click 8.2 compatibility

1147e99

schustmi reviewed Oct 7, 2025

tests/integration/functional/cli/utils.py (outdated, resolved)

strickvl added 3 commits October 7, 2025 11:48

Adapt CLI tests for Click 8.2 compatibility

ad071ed

Merge branch 'feature/update-modal-step-operator' of github.com:zenml…

65129f7

…-io/zenml into feature/update-modal-step-operator

Merge remote-tracking branch 'origin/develop' into feature/update-mod…

05b0dde

…al-step-operator
