-
Notifications
You must be signed in to change notification settings - Fork 548
Update Modal step operator #4038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Inline GPU configuration logic directly into the step operator instead of using a utility function. The logic is simple enough (10 lines of string formatting) that a shared utility adds unnecessary indirection. Changes: - Inline GPU value construction in modal_step_operator.py - Remove get_gpu_values utility function from utils.py - Remove test file that only tested trivial string formatting This addresses the TODO comment to refactor and remove the get_gpu_values helper function.
Aligns type hint with build_modal_image signature for consistency.
Reduces verbosity by combining assignment and conditional check.
Aligns with existing pattern used throughout Modal integration.
Improves maintainability by defining the value in a single location.
Previously, setting gpu_count > 0 without specifying a GPU type would silently run on CPU with no warning. This adds comprehensive validation that raises clear configuration errors with actionable guidance. Changes: - Add _compute_modal_gpu_arg() helper to validate and format GPU settings - Raise StackComponentInterfaceError when gpu_count > 0 but gpu is None - Warn and default to 1 GPU when gpu is set but gpu_count is 0 - Validate gpu_count is non-negative integer - Normalize whitespace in GPU type strings - Add missing List import to utils.py
Previously, setup_modal_client always used the component-level modal_environment config, preventing per-step overrides via settings. Now prioritizes settings.modal_environment when provided, falling back to component config for backward compatibility.
Improves type safety by explicitly typing the resource_settings parameter as ResourceSettings in _compute_modal_gpu_arg method.
Adds Pydantic field constraints to enforce the timeout range (1-86400 seconds) that was previously only documented. Uses ge=1 and le=DEFAULT_TIMEOUT_SECONDS to validate input at instantiation time.
Changes command execution from shell-based to direct varargs invocation, eliminating shell metacharacter issues and removing the implicit bash dependency. Adds proper argument quoting via shlex.join() as a fallback for backward compatibility with older Modal client versions that don't support varargs command passing.
Improves error handling during the creation of a Modal sandbox by adding a fallback mechanism for command execution. If the direct varargs invocation fails due to a TypeError, the code now falls back to a shell-quoted invocation using shlex.join() to ensure compatibility with older Modal client versions. This change enhances robustness and maintains backward compatibility while preserving argument safety.
Documentation Link Check Results❌ Absolute links check failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the Modal step operator integration to use Modal>=1 and introduces significant improvements including proper utility functions, enhanced GPU argument handling, and comprehensive unit tests.
Key changes:
- Updates Modal dependency from
modal>=0.64.49,<1
tomodal>=1
- Refactors shared functionality into a new
utils.py
module with authentication setup, image building, and stack validation - Replaces the simplistic
get_gpu_values
function with robust_compute_modal_gpu_arg
method that validates GPU configurations and prevents silent CPU fallbacks
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/zenml/integrations/modal/__init__.py |
Updates Modal dependency requirement to version >=1 |
src/zenml/integrations/modal/utils.py |
New utility module with authentication setup, image building, and stack validation functions |
src/zenml/integrations/modal/step_operators/modal_step_operator.py |
Major refactor using new utilities, improved GPU handling, and better error reporting |
src/zenml/integrations/modal/flavors/modal_step_operator_flavor.py |
Enhanced configuration with detailed field descriptions, validation, and new authentication fields |
docs/book/component-guide/step-operators/modal.md |
Updated documentation with clearer examples and GPU configuration guidance |
tests/integration/integrations/modal/step_operators/ |
New comprehensive test suite covering utilities, GPU configurations, and flavor validation |
Comments suppressed due to low confidence (2)
docs/book/component-guide/step-operators/modal.md:1
- The scarf tracking image has been removed from the documentation. Verify this removal is intentional and aligns with documentation standards.
---
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
Add missing parameter documentation to helper functions and exception documentation to the launch method to satisfy darglint requirements.
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
src/zenml/integrations/modal/step_operators/modal_step_operator.py
Outdated
Show resolved
Hide resolved
Rename zenml_image to modal_image to accurately reflect that the function returns a Modal-specific image object, not a ZenML construct. Update error message to describe actual failure scenarios: Modal's inability to access the container registry or extend the Docker image, rather than suggesting the image might not exist (which is incorrect since ZenML just built and pushed it successfully).
Move the memory_int calculation from inside the async with app.run() context to outside, grouping it with other resource preparations like gpu_values. This is a pure data transformation that doesn't require Modal's runtime context, improving code organization and reducing unnecessary nesting.
…-io/zenml into feature/update-modal-step-operator
#3733 is taking a while to get merged, so this takes the Step Operator improvements and builds on them.
Notably, our Modal integration depedency is now set to
modal>=1
. (Our Modal Deployer will also have this as its base, so I split these changes out into its own PR in the hope it can be merged sooner).Otherwise this PR does some cleanup and adds some unit tests (the bulk of the code additions).
(It also addresses some fixes to the CI that are unrelated to this PR).