feat: add --enable-schedule/--no-enable-schedule flag to disable schedules during deployment#3102
feat: add --enable-schedule/--no-enable-schedule flag to disable schedules during deployment#3102
Conversation
…dules during deployment When deploying dev/test branches of production flows that use @schedule, the schedule gets deployed too, potentially interfering with production. This adds a --no-enable-schedule flag to both `argo-workflows create` and `step-functions create` that deploys the flow with the schedule suspended (Argo) or skipped (SFN). Also supports METAFLOW_SCHEDULE_DISABLED=1 env var for CI/CD use. CLI flag takes precedence over the env var. Closes #1169
Greptile SummaryThis PR adds
Confidence Score: 2/5Not safe to merge — the SFN path crashes at runtime for the primary use-case (a scheduled flow deployed with --no-enable-schedule). A P0 AttributeError crash exists on the SFN code path for the exact scenario this feature targets. Two additional P1 issues (existing EventBridge rules not deactivated on re-deployment, contradictory CLI output) further reduce confidence. step_functions.py and step_functions_cli.py require fixes before merging; argo_workflows_cli.py needs the contradictory output resolved. Important Files Changed
Reviews (1): Last reviewed commit: "feat: add --enable-schedule/--no-enable-..." | Re-trigger Greptile |
| # Resolve schedule_disabled: CLI flag takes precedence, then env var | ||
| schedule_disabled = _is_schedule_disabled(enable_schedule) | ||
|
|
||
| flow.schedule(schedule_disabled=schedule_disabled) | ||
| obj.echo("What will trigger execution of the workflow:", bold=True) |
There was a problem hiding this comment.
AttributeError crash when --no-enable-schedule is used with a scheduled flow
When schedule_disabled=True and the flow has a @schedule decorator, flow.schedule() returns early (line 147 of step_functions.py) without assigning self.event_bridge_rule. The very next call to flow.trigger_explanation() at line 269 checks if self._cron: (which is True) and then unconditionally accesses self.event_bridge_rule — raising AttributeError: 'StepFunctions' object has no attribute 'event_bridge_rule'. The primary use-case of this feature crashes at every invocation for SFN flows with a schedule.
The minimal fix is to initialise self.event_bridge_rule = None in StepFunctions.__init__ and guard trigger_explanation() against None, or suppress the schedule-based explanation when schedule_disabled is True.
| # Resolve schedule_disabled: CLI flag takes precedence, then env var | |
| schedule_disabled = _is_schedule_disabled(enable_schedule) | |
| flow.schedule(schedule_disabled=schedule_disabled) | |
| obj.echo("What will trigger execution of the workflow:", bold=True) | |
| flow.schedule(schedule_disabled=schedule_disabled) | |
| obj.echo("What will trigger execution of the workflow:", bold=True) | |
| if schedule_disabled: | |
| obj.echo( | |
| "The schedule for this workflow has been *disabled* " | |
| "(EventBridge rule will not be created).", | |
| indent=True, | |
| ) | |
| else: | |
| obj.echo(flow.trigger_explanation(), indent=True) |
| def schedule(self, schedule_disabled=False): | ||
| # Scheduling is currently enabled via AWS Event Bridge. | ||
| # If no cron schedule is defined, or schedule is explicitly disabled, | ||
| # nothing to do. | ||
| if not self._cron or schedule_disabled: | ||
| return |
There was a problem hiding this comment.
Existing EventBridge rule is not disabled on re-deployment
When schedule_disabled=True, the method returns immediately without touching EventBridge. If the flow was previously deployed with an active EventBridge rule, that rule is left intact and the cron continues to fire. Argo's schedule() sidesteps this by always calling schedule_workflow_template (passing None for schedule/timezone), which updates the CronWorkflow to suspend: true. SFN has no equivalent cleanup, so re-deploying with --no-enable-schedule gives users a false sense of security — the schedule is not actually deactivated.
Consider calling EventBridgeClient(self.name).delete() (or an equivalent disable operation) when schedule_disabled=True and an existing rule is present, to achieve parity with Argo's behaviour.
| obj.echo("What will trigger execution of the workflow:", bold=True) | ||
| if schedule_disabled: | ||
| obj.echo( | ||
| "The schedule for this workflow has been *disabled* " | ||
| "(deployed in suspended state).", | ||
| indent=True, | ||
| ) | ||
| obj.echo(flow.trigger_explanation(), indent=True) | ||
|
|
There was a problem hiding this comment.
Contradictory output when schedule is disabled
When schedule_disabled=True and the flow has a @schedule decorator, both messages are printed:
"The schedule for this workflow has been *disabled* (deployed in suspended state)."trigger_explanation()→"This workflow triggers automatically via the CronWorkflow *<name>*."
These directly contradict each other. The same issue exists in the SFN CLI (step_functions_cli.py lines 265-269). The call to trigger_explanation() should be guarded so it is not printed when the schedule is disabled.
| obj.echo("What will trigger execution of the workflow:", bold=True) | |
| if schedule_disabled: | |
| obj.echo( | |
| "The schedule for this workflow has been *disabled* " | |
| "(deployed in suspended state).", | |
| indent=True, | |
| ) | |
| obj.echo(flow.trigger_explanation(), indent=True) | |
| flow.schedule(schedule_disabled=schedule_disabled) | |
| obj.echo("What will trigger execution of the workflow:", bold=True) | |
| if schedule_disabled: | |
| obj.echo( | |
| "The schedule for this workflow has been *disabled* " | |
| "(deployed in suspended state).", | |
| indent=True, | |
| ) | |
| else: | |
| obj.echo(flow.trigger_explanation(), indent=True) |
| headline = "Unsupported version of Python" | ||
|
|
||
|
|
||
| def _is_schedule_disabled(enable_schedule): | ||
| """Resolve whether the schedule should be disabled. | ||
|
|
||
| CLI flag (--enable-schedule / --no-enable-schedule) takes precedence. | ||
| If not specified, falls back to the METAFLOW_SCHEDULE_DISABLED env var / config. | ||
| """ | ||
| if enable_schedule is not None: | ||
| return not enable_schedule |
There was a problem hiding this comment.
Duplicated
_is_schedule_disabled helper
_is_schedule_disabled is defined identically in argo_workflows_cli.py and step_functions_cli.py. Consider extracting it to a shared location (e.g. a small utility in metaflow/metaflow_config.py or a common CLI helpers module) and importing it from both CLI files to keep the logic in one place.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
|
||
| ### | ||
| # Schedule configuration | ||
| ### | ||
| # When set to True, deploys the workflow with the schedule disabled (suspended). | ||
| # Useful for deploying dev/test branches without activating production schedules. | ||
| SCHEDULE_DISABLED = from_conf("SCHEDULE_DISABLED", False) | ||
| ### |
There was a problem hiding this comment.
Missing blank line and misplaced section
The new SCHEDULE_DISABLED config block is placed immediately after the SFN-specific settings with no blank line before the ### Kubernetes configuration ### section. Since this setting applies to both SFN and Argo, it would read more clearly if it were placed in a more neutral location and had the standard blank separator line.
| ### | |
| # Schedule configuration | |
| ### | |
| # When set to True, deploys the workflow with the schedule disabled (suspended). | |
| # Useful for deploying dev/test branches without activating production schedules. | |
| SCHEDULE_DISABLED = from_conf("SCHEDULE_DISABLED", False) | |
| ### | |
| SCHEDULE_DISABLED = from_conf("SCHEDULE_DISABLED", False) | |
| ### | |
| # Kubernetes configuration | |
| ### |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
#3122) ## PR Type - [x] Bug fix ## Summary When `@schedule` is present but all scheduling flags are falsy (e.g. `@schedule(daily=False)`), deploying to Argo Workflows crashes with `AttributeError: 'NoneType' object has no attribute 'split'` and no WorkflowTemplate is created. This fix makes the deploy succeed and skip CronWorkflow creation, which is the expected behaviour when no schedule is configured. ## Issue Fixes #3121 ## Reproduction **Runtime:** argo **Commands to run:** ```bash python myflow.py argo-workflows create ``` Where `myflow.py` contains: ```python from metaflow import FlowSpec, step, schedule @schedule(daily=False) class MyFlow(FlowSpec): @step def start(self): self.next(self.end) @step def end(self): pass if __name__ == "__main__": MyFlow() ``` **Where evidence shows up:** parent console / `argo-workflows create` output <details> <summary>Before (error / log snippet)</summary> ``` AttributeError: 'NoneType' object has no attribute 'split' File ".../metaflow/plugins/argo/argo_workflows.py", line 463, in _get_schedule return " ".join(schedule.schedule.split()[:5]), schedule.timezone ``` </details> <details> <summary>After (evidence that fix works)</summary> ``` Workflow template 'MyFlow' created successfully. No CronWorkflow created (no schedule configured). ``` </details> ## Root Cause `ScheduleDecorator.flow_init` sets `self.schedule = None` when all scheduling flags are falsy (`schedule_decorator.py:49`). `_get_schedule()` in `argo_workflows.py` guarded against the decorator being absent (`if schedule:`), but not against `schedule.schedule` being `None`. Calling `.split()` on `None` raises `AttributeError`. Separately, `trigger_explanation()` checked decorator presence rather than `self._schedule`, so it would incorrectly report CronWorkflow-based triggering even when no schedule was resolved. ## Why This Fix Is Correct `_get_schedule()` now returns `(None, None)` when `schedule.schedule is None`, treating it identically to the decorator being absent. The rest of the deploy path already handles `self._schedule is None` correctly — no CronWorkflow is created. `trigger_explanation()` now checks `self._schedule is not None`, which is the actual resolved value, rather than decorator presence. ## Failure Modes Considered 1. **Flows with a real schedule are unaffected** — the `schedule.schedule is None` guard is only entered when all flags are falsy; any valid cron string, `daily=True`, `hourly=True`, or `weekly=True` sets a non-None value and takes the existing return path. 2. **Backward compatibility** — the previous behaviour for the `@schedule(daily=False)` case was a crash, so changing it to a clean no-op is strictly an improvement with no risk of regression. ## Tests - [x] Unit tests added/updated - [x] Reproduction script provided (required for Core Runtime) - [ ] CI passes - [ ] If tests are impractical: explain why below and provide manual evidence above The bug is straightforward to reproduce with the script above. Unit test coverage for `_get_schedule()` with a `None` schedule value would be a good addition. ## Non-Goals This fix does not address the Step Functions path (which returns `schedule.schedule` directly and handles `None` safely already), does not implement an `--no-enable-schedule` flag (see PR #3102), and does not change any scheduling behaviour for flows where a schedule is actually configured. ## AI Tool Usage - [x] AI tools were used (describe below) OpenCode (Claude Sonnet) was used to identify the bug, draft the issue, and generate the fix. All generated code was reviewed and understood before committing. --------- Co-authored-by: Shashank Srikanth <ssrikanth@netflix.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
--enable-schedule/--no-enable-scheduleflag tostep-functions createandargo-workflows create--no-enable-scheduleis passed, the flow is deployed but its cron schedule is not activatedsuspend: trueon the CronWorkflowTest plan
test/unit/test_schedule_toggle.py— unit tests for schedule enable/disable behavior🤖 Generated with Claude Code