Description
Currently, a user can attempt to run a specific task up to a maximum of 6 times. It would be beneficial to make this value configurable.
In our use case, we are integrating Argo retries with Metaflow’s retried Argo workflows. A configurable limit, for example an environment variable, would let us cap how many times a user can retry an Argo workflow.
That said, beyond our specific use case, adding this configuration flexibility would be generally useful.
Current Behaviour
from metaflow import (
    FlowSpec,
    Parameter,
    card,
    project,
    step,
    retry,
)


@project(name="dummy_project")
class HelloWorld(FlowSpec):
    force_error = Parameter("force-error", type=bool, default=False)

    @card
    @step
    def start(self):
        print("something")
        # Define the artifact that the end step prints.
        self.my_var = "something"
        self.next(self.end)

    @card
    # times=10 is above the current limit, so the flow is rejected at startup.
    @retry(times=10)
    @step
    def end(self):
        if self.force_error:
            raise Exception("Testing errors in metaflow")
        print(f"the data artifact is: {self.my_var}")


if __name__ == "__main__":
    HelloWorld()
Running the above flow locally via
python hello_world.py run
throws the following exception:
Metaflow 2.14.0 executing HelloWorld for user:j.kollipara
Project: dummy_project, Branch: user.j.kollipara
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
Flow failed:
The maximum number of retries is @retry(times=4).
error: Recipe `_poetry-run` failed with exit code 1
Source code of the above error:
metaflow/metaflow/plugins/retry_decorator.py
Lines 30 to 37 in 5c960ea
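For readers without the linked source at hand, here is a minimal standalone sketch of the kind of check that would produce the message above. It is not copied from retry_decorator.py. The two reserved attempts are an assumption inferred from the numbers in this issue (6 attempts in total, `@retry(times=4)` at most), and `RetryLimitError` and `validate_retry_times` are hypothetical names used only for illustration.

```python
MAX_ATTEMPTS = 6  # total attempts per task, as quoted in this issue

# Assumption: two attempts are not available for retries. This offset is
# inferred from "6 attempts" vs. "@retry(times=4)", not read from the source.
RESERVED_ATTEMPTS = 2


class RetryLimitError(Exception):
    """Hypothetical stand-in for the exception raised during flow validation."""


def validate_retry_times(times: int, max_attempts: int = MAX_ATTEMPTS) -> None:
    """Reject a retry count that would push total attempts past the limit."""
    if times + RESERVED_ATTEMPTS > max_attempts:
        raise RetryLimitError(
            "The maximum number of retries is @retry(times=%d)."
            % (max_attempts - RESERVED_ATTEMPTS)
        )


validate_retry_times(4)  # within the limit, no exception

try:
    validate_retry_times(10)  # the value used in the flow above
except RetryLimitError as err:
    print(err)  # The maximum number of retries is @retry(times=4).
```

Because the limit is a constant in this sketch, the only way to accept `times=10` is to change the code, which is the limitation this issue is about.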
Proposed Behaviour
Setting METAFLOW_MAX_ATTEMPTS=12
would allow users to run the above flow.
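As a rough sketch of the proposal, the limit could be read from the environment with the current value as the default. This is only an illustration and not the code from the PR: `read_max_attempts` and `max_retry_times` are hypothetical helpers, and the offset of two reserved attempts is an assumption inferred from the current numbers (6 attempts, `times=4`).

```python
import os

DEFAULT_MAX_ATTEMPTS = 6  # current limit quoted in this issue
RESERVED_ATTEMPTS = 2     # assumption: inferred from 6 attempts vs. times=4


def read_max_attempts(environ=None) -> int:
    """Return the attempt limit, preferring METAFLOW_MAX_ATTEMPTS if set."""
    environ = os.environ if environ is None else environ
    raw = environ.get("METAFLOW_MAX_ATTEMPTS")
    if raw is None:
        return DEFAULT_MAX_ATTEMPTS
    value = int(raw)  # raises ValueError for non-numeric input
    if value < RESERVED_ATTEMPTS:
        raise ValueError(
            "METAFLOW_MAX_ATTEMPTS must be at least %d" % RESERVED_ATTEMPTS
        )
    return value


def max_retry_times(max_attempts: int) -> int:
    """Largest value of @retry(times=...) allowed under the given limit."""
    return max_attempts - RESERVED_ATTEMPTS


# Pass a plain dict instead of os.environ to keep the example side-effect free.
limit = read_max_attempts({"METAFLOW_MAX_ATTEMPTS": "12"})
print(limit, max_retry_times(limit))  # 12 10
```

Under these assumptions, a limit of 12 leaves room for exactly the `@retry(times=10)` used in the flow above.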
Activity
MAX_ATTEMPTS configurable #2275
dhpikolo commented on Feb 18, 2025
I have already put up a PR with the proposed change. Let me know what you think of it.
MAX_ATTEMPTS configurable #2279
dhpikolo commented on Feb 19, 2025
I created a new PR, since the old one was based on the development branch.