-
Notifications
You must be signed in to change notification settings - Fork 323
Flytekit: Rename map_task to map, replace min_successes and min_success_ratio with tolerance, rename max_parallelism to concurrency #3107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Code Review Agent Run #d47fe6Actionable Suggestions - 13
Additional Suggestions - 10
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
@@ -1,7 +1,7 @@ | |||
import tempfile | |||
from pathlib import Path | |||
|
|||
from flytekit import FlyteDirectory, FlyteFile, map_task, task, workflow | |||
from flytekit import FlyteDirectory, FlyteFile, map, task, workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if replacing map_task
with map
is intentional as they might have different functionality in the Flyte framework. map_task
is typically used for task parallelization while map
might have different semantics.
Code suggestion
Check the AI-generated fix before applying
from flytekit import FlyteDirectory, FlyteFile, map, task, workflow | |
from flytekit import FlyteDirectory, FlyteFile, map_task, task, workflow |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -26,6 +26,6 @@ def list_dir(dir: FlyteDirectory) -> list[FlyteFile]: | |||
def wf() -> list[str]: | |||
tmpdir = setup() | |||
files = list_dir(dir=tmpdir) | |||
return map_task(read_file)(file=files) | |||
return map(read_file)(file=files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task
instead of map
for task mapping operations in Flytekit workflows. The map
function may not provide the same task-level parallelization and execution guarantees as map_task
.
Code suggestion
Check the AI-generated fix before applying
return map(read_file)(file=files) | |
return map_task(read_file)(file=files) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -414,7 +414,7 @@ def create_sd() -> StructuredDataset: | |||
def test_map_over_notebook_task(): | |||
@workflow | |||
def wf(a: float) -> typing.List[float]: | |||
return map_task(nb_sub_task)(a=[a, a]) | |||
return map(nb_sub_task)(a=[a, a]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task
instead of map
for mapping over notebook tasks. The map
function may not handle notebook task specific requirements correctly.
Code suggestion
Check the AI-generated fix before applying
return map(nb_sub_task)(a=[a, a]) | |
return map_task(nb_sub_task)(a=[a, a]) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
from flytekit._version import __version__ | ||
from flytekit.configuration import Config | ||
from flytekit.core.array_node_map_task import map_task | ||
from flytekit.core.array_node_map_task import map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider keeping both map_task
and map
imports to maintain backward compatibility. The alias is defined later but importing directly as map
may break existing code that uses map_task
.
Code suggestion
Check the AI-generated fix before applying
from flytekit.core.array_node_map_task import map | |
from flytekit.core.array_node_map_task import map_task |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -369,11 +370,12 @@ def _raw_execute(self, **kwargs) -> Any: | |||
return outputs | |||
|
|||
|
|||
def map_task( | |||
def map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider keeping the original function name map_task
instead of renaming to map
as it could conflict with Python's built-in map
function and cause confusion. The original name was more descriptive of the function's purpose.
Code suggestion
Check the AI-generated fix before applying
def map( | |
def map_task( |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -63,7 +63,7 @@ def say_hello(name: str) -> str: | |||
|
|||
@workflow | |||
def wf() -> List[str]: | |||
return map_task(say_hello)(name=["abc", "def"]) | |||
return map(say_hello)(name=["abc", "def"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if using map()
instead of map_task()
is intentional as it changes the behavior from using Flyte's map task functionality to Python's built-in map()
.
Code suggestion
Check the AI-generated fix before applying
return map(say_hello)(name=["abc", "def"]) | |
return map_task(say_hello)(name=["abc", "def"]) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -575,7 +575,7 @@ def say_hello(name: str) -> str: | |||
for index, map_input_str in enumerate(list_strs): | |||
monkeypatch.setenv("BATCH_JOB_ARRAY_INDEX_VAR_NAME", "name") | |||
monkeypatch.setenv("name", str(index)) | |||
t = map_task(say_hello) | |||
t = map(say_hello) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if using map()
instead of map_task()
is intentional as this could change the behavior of task mapping functionality.
Code suggestion
Check the AI-generated fix before applying
t = map(say_hello) | |
t = map_task(say_hello) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -410,7 +410,7 @@ def test_serialization_metadata(serialization_settings): | |||
def t1(a: int) -> int: | |||
return a + 1 | |||
|
|||
arraynode_maptask = map_task(t1, metadata=TaskMetadata(retries=2)) | |||
arraynode_maptask = map(t1, metadata=TaskMetadata(retries=2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if changing from map_task
to map
could impact backward compatibility. The function name change from map_task
to map
may affect existing code that imports and uses the original function name.
Code suggestion
Check the AI-generated fix before applying
arraynode_maptask = map(t1, metadata=TaskMetadata(retries=2)) | |
# Maintain both for backward compatibility | |
arraynode_maptask = map_task(t1, metadata=TaskMetadata(retries=2)) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
t1 = map(say_hello, **kwargs1) | ||
t2 = map(say_hello, **kwargs2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if replacing map_task
with map
is intentional as this changes the function being called which could affect functionality. The map_task
decorator appears to be imported but not used after this change.
Code suggestion
Check the AI-generated fix before applying
t1 = map(say_hello, **kwargs1) | |
t2 = map(say_hello, **kwargs2) | |
t1 = map_task(say_hello, **kwargs1) | |
t2 = map_task(say_hello, **kwargs2) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -316,7 +316,7 @@ def test_bounded_inputs_vars_order(serialization_settings): | |||
def task1(a: int, b: float, c: str) -> str: | |||
return f"{a} - {b} - {c}" | |||
|
|||
mt = map_task(functools.partial(task1, c=1.0, b="hello", a=1)) | |||
mt = map(functools.partial(task1, c=1.0, b="hello", a=1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task()
instead of map()
as it appears to be the intended function based on the test context and imports. Using map()
could lead to unexpected behavior since it's a built-in Python function.
Code suggestion
Check the AI-generated fix before applying
mt = map(functools.partial(task1, c=1.0, b="hello", a=1)) | |
mt = map_task(functools.partial(task1, c=1.0, b="hello", a=1)) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -492,7 +492,7 @@ def test_supported_node_type(): | |||
def test_task(): | |||
... | |||
|
|||
map_task(test_task) | |||
map(test_task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function call has been changed from map_task(test_task)
to map(test_task)
. This could potentially cause confusion with Python's built-in map()
function. Consider using the imported map_task
decorator/function to maintain clarity and avoid potential naming conflicts.
Code suggestion
Check the AI-generated fix before applying
map(test_task) | |
map_task(test_task) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -533,7 +533,7 @@ def consume_directories(dirs: List[FlyteDirectory]): | |||
for path_info, other_info in d.crawl(): | |||
print(path_info) | |||
|
|||
mt = map_task(generate_directory, min_success_ratio=0.1) | |||
mt = map(generate_directory, min_success_ratio=0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider if using map()
instead of map_task()
is intentional as it may change the expected behavior. The map_task()
function is typically used for array node map tasks in Flytekit.
Code suggestion
Check the AI-generated fix before applying
mt = map(generate_directory, min_success_ratio=0.1) | |
mt = map_task(generate_directory, min_success_ratio=0.1) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -575,7 +575,7 @@ def say_hello(name: str) -> str: | |||
for index, map_input_str in enumerate(list_strs): | |||
monkeypatch.setenv("BATCH_JOB_ARRAY_INDEX_VAR_NAME", "name") | |||
monkeypatch.setenv("name", str(index)) | |||
t = map_task(say_hello) | |||
t = map(say_hello) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task
instead of map
as it appears to be the intended decorator based on the imports and test context. The map
function could be confused with Python's built-in map
function.
Code suggestion
Check the AI-generated fix before applying
t = map(say_hello) | |
t = map_task(say_hello) |
Code Review Run #d47fe6
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Code Review Agent Run #99b31dActionable Suggestions - 8
Additional Suggestions - 10
Review Details
|
@@ -315,7 +316,7 @@ def test_bounded_inputs_vars_order(serialization_settings): | |||
def task1(a: int, b: float, c: str) -> str: | |||
return f"{a} - {b} - {c}" | |||
|
|||
mt = map_task(functools.partial(task1, c=1.0, b="hello", a=1)) | |||
mt = map(functools.partial(task1, c=1.0, b="hello", a=1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function call parameters c=1.0, b="hello", a=1
appear to have mismatched types with the task definition. The task expects a: int, b: float, c: str
but receives c
as float, b
as string, and a
as int. Consider adjusting the parameter types to match the task signature.
Code suggestion
Check the AI-generated fix before applying
mt = map(functools.partial(task1, c=1.0, b="hello", a=1)) | |
mt = map(functools.partial(task1, c="1.0", b=1.0, a=1)) |
Code Review Run #99b31d
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -1551,7 +1551,7 @@ def _execute( | |||
annotations=options.annotations, | |||
raw_output_data_config=options.raw_output_data_config, | |||
auth_role=None, | |||
max_parallelism=options.max_parallelism, | |||
concurrency=options.concurrency, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider verifying if renaming max_parallelism
to concurrency
maintains backward compatibility. This change could potentially break existing code that relies on the max_parallelism
parameter.
Code suggestion
Check the AI-generated fix before applying
concurrency=options.concurrency, | |
concurrency=options.max_parallelism if hasattr(options, 'max_parallelism') | |
else options.concurrency, | |
# TODO: Remove max_parallelism support in next major version | |
# Deprecated in favor of concurrency parameter |
Code Review Run #99b31d
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -273,7 +273,7 @@ def t1(a: str) -> str: | |||
|
|||
@workflow | |||
def my_wf(a: typing.List[str]) -> typing.List[str]: | |||
mappy = map_task(t1) | |||
mappy = map(t1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task
instead of map
as it appears to be the intended function based on the test context. The map
function may not provide the same task mapping functionality needed for workflow testing.
Code suggestion
Check the AI-generated fix before applying
mappy = map(t1) | |
mappy = map_task(t1) |
Code Review Run #99b31d
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
@@ -726,7 +726,7 @@ def t1(x: int, y: int) -> int: | |||
|
|||
@workflow | |||
def w() -> int: | |||
return map_task(partial(t1, y=2))(x=[1, 2, 3]) | |||
return map(partial(t1, y=2))(x=[1, 2, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task
instead of map
as it appears to be testing map task functionality based on the test name and context.
Code suggestion
Check the AI-generated fix before applying
return map(partial(t1, y=2))(x=[1, 2, 3]) | |
return map_task(partial(t1, y=2))(x=[1, 2, 3]) |
Code Review Run #99b31d
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
m1 = map(functools.partial(task1, c=param_c))(a=param_a, b=param_b) | ||
m2 = map(functools.partial(task2, c=param_c))(a=param_a, b=param_b) | ||
m3 = map(functools.partial(task3, c=param_c))(a=param_a, b=param_b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using map_task
instead of map
for consistency with the test name and module being tested (test_array_node_map_task.py
). The test appears to be validating array node map task functionality.
Code suggestion
Check the AI-generated fix before applying
m1 = map(functools.partial(task1, c=param_c))(a=param_a, b=param_b) | |
m2 = map(functools.partial(task2, c=param_c))(a=param_a, b=param_b) | |
m3 = map(functools.partial(task3, c=param_c))(a=param_a, b=param_b) | |
m1 = map_task(functools.partial(task1, c=param_c))(a=param_a, b=param_b) | |
m2 = map_task(functools.partial(task2, c=param_c))(a=param_a, b=param_b) | |
m3 = map_task(functools.partial(task3, c=param_c))(a=param_a, b=param_b) |
Code Review Run #99b31d
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
<<<<<<< Updated upstream | ||
======= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a merge conflict marker in the code. The lines <<<<<<< Updated upstream
and =======
are Git merge conflict markers that should be resolved before committing.
Code suggestion
Check the AI-generated fix before applying
<<<<<<< Updated upstream | |
======= |
Code Review Run #9482f9
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
def test_map_task_override(serialization_settings): | ||
@task | ||
def my_mappable_task(a: int) -> typing.Optional[str]: | ||
return str(a) | ||
|
||
>>>>>>> Stashed changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a Git merge conflict marker (>>>>>>> Stashed changes
) in the code. This needs to be resolved before the code can be properly merged.
Code suggestion
Check the AI-generated fix before applying
-<<<<<<< Updated upstream
-=======
-def test_map_task_override(serialization_settings):
- @task
- def my_mappable_task(a: int) -> typing.Optional[str]:
- return str(a)
-
->>>>>>> Stashed changes
Code Review Run #9482f9
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
- Rename map_task to map for simpler API - Replace min_successes/min_success_ratio with tolerance parameter - Rename max_parallelism to concurrency for consistency Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: lu00122 <[email protected]> Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Code Review Agent Run #58bfccActionable Suggestions - 2
Review Details
|
…yte_idl Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]> (cherry picked from commit a62560a9235f2591a5530cd1834b8e3f5fa9c492)
Signed-off-by: Chih Tsung Lu <[email protected]>
if "concurrency" in kwargs: | ||
kwargs["max_parallelism"] = kwargs.pop("concurrency") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code sets self._max_parallelism
directly from kwargs["max_parallelism"]
but doesn't handle the case where concurrency
is used. This creates inconsistent behavior between the two parameter options.
Code suggestion
Check the AI-generated fix before applying
if "concurrency" in kwargs: | |
kwargs["max_parallelism"] = kwargs.pop("concurrency") | |
if "concurrency" in kwargs: | |
kwargs["max_parallelism"] = kwargs.pop("concurrency") | |
self._max_parallelism = kwargs["max_parallelism"] |
Code Review Run #58bfcc
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
max_parallelism: int = make_click_option_field( | ||
click.Option( | ||
param_decls=["--max-parallelism"], | ||
required=False, | ||
type=int, | ||
show_default=True, | ||
help="[Deprecated] Use --concurrency instead", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter name has been changed from --max-parallelism
to --concurrency
, but there's still a max_parallelism
parameter defined at line 269 that uses the old flag name. This creates a situation where both flags are available but they likely control the same functionality. Consider updating the code to handle both parameters appropriately, perhaps by having the deprecated max_parallelism
parameter set the concurrency
value when provided.
Code suggestion
Check the AI-generated fix before applying
@@ -268,12 +268,19 @@
)
max_parallelism: int = make_click_option_field(
click.Option(
param_decls=["--max-parallelism"],
required=False,
type=int,
show_default=True,
help="[Deprecated] Use --concurrency instead",
+ callback=lambda ctx, param, value: _handle_max_parallelism(ctx, value),
)
)
+
+def _handle_max_parallelism(ctx, value):
+ """Handle the deprecated max_parallelism parameter by setting concurrency if provided."""
+ if value is not None and 'concurrency' not in ctx.params:
+ ctx.params['concurrency'] = value
+ return value
+
Code Review Run #58bfcc
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Code Review Agent Run #184856Actionable Suggestions - 2
Review Details
|
else: | ||
raise TypeError("tolerance must be float or int") | ||
|
||
final_min_ratio = computed_min_ratio if min_success_ratio is None else min_success_ratio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable final_min_ratio
is assigned min_success_ratio
even when it's None
. This could lead to unexpected behavior since the default value for min_success_ratio
has been changed from 1.0
to None
. Consider ensuring a non-None value is always assigned.
Code suggestion
Check the AI-generated fix before applying
final_min_ratio = computed_min_ratio if min_success_ratio is None else min_success_ratio | |
final_min_ratio = min_success_ratio if min_success_ratio is not None else computed_min_ratio |
Code Review Run #184856
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
@@ -367,7 +379,14 @@ def __init__( | |||
self._labels = labels | |||
self._annotations = annotations | |||
self._raw_output_data_config = raw_output_data_config | |||
self._max_parallelism = max_parallelism | |||
self._concurrency = concurrency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The max_parallelism
property is now returning self._concurrency
instead of the original self._max_parallelism
. This suggests that the attribute name has changed, but there might be a missing initialization or migration of the old attribute value. Consider ensuring that existing _max_parallelism
values are properly migrated to _concurrency
during initialization.
Code suggestion
Check the AI-generated fix before applying
self._concurrency = concurrency | |
# Ensure backward compatibility by using max_parallelism if provided | |
if hasattr(self, '_max_parallelism') and self._max_parallelism is not None: | |
self._concurrency = self._max_parallelism | |
else: | |
self._concurrency = concurrency |
Code Review Run #184856
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Code Review Agent Run #032d70Actionable Suggestions - 0Review Details
|
Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Signed-off-by: Chih Tsung Lu <[email protected]>
Code Review Agent Run #1a2fb7Actionable Suggestions - 0Review Details
|
Code Review Agent Run #31323fActionable Suggestions - 0Review Details
|
Tracking issue
Related to flyteorg/flyte#6139
Why are the changes needed?
The current Flytekit has several areas that could be improved for a better developer experience:
map_task
name is unnecessarily verbose when imported via the recommendedimport flytekit as fl
min_successes
andmin_success_ratio
) are powerful but overly verbosemap_task
'sconcurrency
parameterWhat changes were proposed in this pull request?
Rename
map_task
tomap
map
, it's acceptable since we recommend usingimport flytekit as fl
Simplify failure tolerance parameters
min_successes
andmin_success_ratio
tolerance
parameter that accepts bothfloat
andint
typesStandardize parallelism parameter
max_parallelism
argument in workflow and LaunchPlanconcurrency
parameter to matchmap_task
's parameterHow was this patch tested?
Ran tests with the command: make test
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This PR standardizes API naming by changing 'map_task' to 'map' and 'max_parallelism' to 'concurrency', while consolidating failure tolerance parameters. It transitions from legacy 'agent' nomenclature to a 'connector' paradigm while maintaining backward compatibility with deprecation warnings. The changes enhance execution metric retrieval, asynchronous task handling, resource validation, and file serialization for a more intuitive developer experience. It also improves exception handling and introduces geospatial support.Unit tests added: False
Estimated effort to review (1-5, lower is better): 5