Skip to content

feat: add --enable-schedule/--no-enable-schedule flag to disable schedules during deployment#3102

Open
npow wants to merge 1 commit intomasterfrom
npow/schedule-toggle-clean
Open

feat: add --enable-schedule/--no-enable-schedule flag to disable schedules during deployment#3102
npow wants to merge 1 commit intomasterfrom
npow/schedule-toggle-clean

Conversation

@npow
Copy link
Copy Markdown
Collaborator

@npow npow commented Apr 13, 2026

Summary

  • Add --enable-schedule/--no-enable-schedule flag to step-functions create and argo-workflows create
  • When --no-enable-schedule is passed, the flow is deployed but its cron schedule is not activated
  • Useful for deploying flows in a disabled state for testing or staged rollouts
  • For SFN: skips EventBridge rule creation when disabled
  • For Argo: sets suspend: true on the CronWorkflow

Test plan

  • test/unit/test_schedule_toggle.py — unit tests for schedule enable/disable behavior

🤖 Generated with Claude Code

…dules during deployment

When deploying dev/test branches of production flows that use @schedule,
the schedule gets deployed too, potentially interfering with production.

This adds a --no-enable-schedule flag to both `argo-workflows create` and
`step-functions create` that deploys the flow with the schedule suspended
(Argo) or skipped (SFN). Also supports METAFLOW_SCHEDULE_DISABLED=1 env
var for CI/CD use. CLI flag takes precedence over the env var.

Closes #1169
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR adds --enable-schedule/--no-enable-schedule flags to both argo-workflows create and step-functions create, backed by a METAFLOW_SCHEDULE_DISABLED config fallback. The Argo side works correctly — disabling passes None to schedule_workflow_template, which sets suspend: true on the CronWorkflow. The SFN side has a critical crash and two meaningful gaps:

  • P0 — AttributeError crash (SFN): StepFunctions.schedule() returns early when disabled without ever assigning self.event_bridge_rule. trigger_explanation() is then called unconditionally and tries to format self.event_bridge_rule (which doesn't exist), crashing for any flow with a @schedule decorator. The unit tests isolate schedule() and trigger_explanation() separately and therefore miss this failure path.
  • P1 — Existing EventBridge rule is not deactivated (SFN): Re-deploying with --no-enable-schedule skips rule creation but leaves any previously-created EventBridge rule active, so the cron continues to fire. Argo handles this correctly by always calling schedule_workflow_template.
  • P1 — Contradictory output (both CLIs): When schedule_disabled=True, trigger_explanation() is still printed unconditionally and says "triggers automatically", directly contradicting the "disabled" notice printed just above it.

Confidence Score: 2/5

Not safe to merge — the SFN path crashes at runtime for the primary use-case (a scheduled flow deployed with --no-enable-schedule).

A P0 AttributeError crash exists on the SFN code path for the exact scenario this feature targets. Two additional P1 issues (existing EventBridge rules not deactivated on re-deployment, contradictory CLI output) further reduce confidence.

step_functions.py and step_functions_cli.py require fixes before merging; argo_workflows_cli.py needs the contradictory output resolved.

Important Files Changed

Filename Overview
metaflow/plugins/aws/step_functions/step_functions.py Adds schedule_disabled guard to schedule(), but event_bridge_rule is never initialised in __init__, causing AttributeError in trigger_explanation() for any scheduled flow deployed with --no-enable-schedule. Also leaves existing EventBridge rules active on re-deployment.
metaflow/plugins/aws/step_functions/step_functions_cli.py Adds --enable-schedule/--no-enable-schedule flag and wires up _is_schedule_disabled; trigger_explanation() is unconditionally called after schedule() returns early, crashing on self.event_bridge_rule (AttributeError) for flows with a @schedule decorator.
metaflow/plugins/argo/argo_workflows_cli.py Adds --enable-schedule/--no-enable-schedule flag for Argo; schedule disabling works correctly (CronWorkflow is suspended), but the disabled notification and trigger_explanation() are both printed unconditionally, producing contradictory output.
metaflow/plugins/argo/argo_workflows.py Passes None schedule/timezone to schedule_workflow_template when disabled, correctly setting suspend: true on the CronWorkflow — functionally sound.
metaflow/metaflow_config.py Adds SCHEDULE_DISABLED config entry; placed at end of SFN block without a trailing blank line before the Kubernetes section.
metaflow/plugins/argo/argo_workflows_deployer.py Adds enable_schedule parameter to docstring only; no functional changes.
metaflow/plugins/aws/step_functions/step_functions_deployer.py Adds enable_schedule parameter to docstring only; no functional changes.
test/unit/test_schedule_toggle.py Unit tests cover the _is_schedule_disabled helper and isolated schedule() calls, but miss the crash path: no test exercises schedule(schedule_disabled=True) followed by trigger_explanation() on an SFN flow with a cron schedule.

Reviews (1): Last reviewed commit: "feat: add --enable-schedule/--no-enable-..." | Re-trigger Greptile

Comment on lines +265 to 269
# Resolve schedule_disabled: CLI flag takes precedence, then env var
schedule_disabled = _is_schedule_disabled(enable_schedule)

flow.schedule(schedule_disabled=schedule_disabled)
obj.echo("What will trigger execution of the workflow:", bold=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 AttributeError crash when --no-enable-schedule is used with a scheduled flow

When schedule_disabled=True and the flow has a @schedule decorator, flow.schedule() returns early (line 147 of step_functions.py) without assigning self.event_bridge_rule. The very next call to flow.trigger_explanation() at line 269 checks if self._cron: (which is True) and then unconditionally accesses self.event_bridge_rule — raising AttributeError: 'StepFunctions' object has no attribute 'event_bridge_rule'. The primary use-case of this feature crashes at every invocation for SFN flows with a schedule.

The minimal fix is to initialise self.event_bridge_rule = None in StepFunctions.__init__ and guard trigger_explanation() against None, or suppress the schedule-based explanation when schedule_disabled is True.

Suggested change
# Resolve schedule_disabled: CLI flag takes precedence, then env var
schedule_disabled = _is_schedule_disabled(enable_schedule)
flow.schedule(schedule_disabled=schedule_disabled)
obj.echo("What will trigger execution of the workflow:", bold=True)
flow.schedule(schedule_disabled=schedule_disabled)
obj.echo("What will trigger execution of the workflow:", bold=True)
if schedule_disabled:
obj.echo(
"The schedule for this workflow has been *disabled* "
"(EventBridge rule will not be created).",
indent=True,
)
else:
obj.echo(flow.trigger_explanation(), indent=True)

Comment on lines +143 to +148
def schedule(self, schedule_disabled=False):
# Scheduling is currently enabled via AWS Event Bridge.
# If no cron schedule is defined, or schedule is explicitly disabled,
# nothing to do.
if not self._cron or schedule_disabled:
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Existing EventBridge rule is not disabled on re-deployment

When schedule_disabled=True, the method returns immediately without touching EventBridge. If the flow was previously deployed with an active EventBridge rule, that rule is left intact and the cron continues to fire. Argo's schedule() sidesteps this by always calling schedule_workflow_template (passing None for schedule/timezone), which updates the CronWorkflow to suspend: true. SFN has no equivalent cleanup, so re-deploying with --no-enable-schedule gives users a false sense of security — the schedule is not actually deactivated.

Consider calling EventBridgeClient(self.name).delete() (or an equivalent disable operation) when schedule_disabled=True and an existing rule is present, to achieve parity with Argo's behaviour.

Comment on lines 453 to 461
obj.echo("What will trigger execution of the workflow:", bold=True)
if schedule_disabled:
obj.echo(
"The schedule for this workflow has been *disabled* "
"(deployed in suspended state).",
indent=True,
)
obj.echo(flow.trigger_explanation(), indent=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Contradictory output when schedule is disabled

When schedule_disabled=True and the flow has a @schedule decorator, both messages are printed:

  1. "The schedule for this workflow has been *disabled* (deployed in suspended state)."
  2. trigger_explanation()"This workflow triggers automatically via the CronWorkflow *<name>*."

These directly contradict each other. The same issue exists in the SFN CLI (step_functions_cli.py lines 265-269). The call to trigger_explanation() should be guarded so it is not printed when the schedule is disabled.

Suggested change
obj.echo("What will trigger execution of the workflow:", bold=True)
if schedule_disabled:
obj.echo(
"The schedule for this workflow has been *disabled* "
"(deployed in suspended state).",
indent=True,
)
obj.echo(flow.trigger_explanation(), indent=True)
flow.schedule(schedule_disabled=schedule_disabled)
obj.echo("What will trigger execution of the workflow:", bold=True)
if schedule_disabled:
obj.echo(
"The schedule for this workflow has been *disabled* "
"(deployed in suspended state).",
indent=True,
)
else:
obj.echo(flow.trigger_explanation(), indent=True)

Comment on lines 69 to +79
headline = "Unsupported version of Python"


def _is_schedule_disabled(enable_schedule):
"""Resolve whether the schedule should be disabled.

CLI flag (--enable-schedule / --no-enable-schedule) takes precedence.
If not specified, falls back to the METAFLOW_SCHEDULE_DISABLED env var / config.
"""
if enable_schedule is not None:
return not enable_schedule
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicated _is_schedule_disabled helper

_is_schedule_disabled is defined identically in argo_workflows_cli.py and step_functions_cli.py. Consider extracting it to a shared location (e.g. a small utility in metaflow/metaflow_config.py or a common CLI helpers module) and importing it from both CLI files to keep the logic in one place.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +396 to 403

###
# Schedule configuration
###
# When set to True, deploys the workflow with the schedule disabled (suspended).
# Useful for deploying dev/test branches without activating production schedules.
SCHEDULE_DISABLED = from_conf("SCHEDULE_DISABLED", False)
###
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing blank line and misplaced section

The new SCHEDULE_DISABLED config block is placed immediately after the SFN-specific settings with no blank line before the ### Kubernetes configuration ### section. Since this setting applies to both SFN and Argo, it would read more clearly if it were placed in a more neutral location and had the standard blank separator line.

Suggested change
###
# Schedule configuration
###
# When set to True, deploys the workflow with the schedule disabled (suspended).
# Useful for deploying dev/test branches without activating production schedules.
SCHEDULE_DISABLED = from_conf("SCHEDULE_DISABLED", False)
###
SCHEDULE_DISABLED = from_conf("SCHEDULE_DISABLED", False)
###
# Kubernetes configuration
###

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

talsperre added a commit that referenced this pull request Apr 26, 2026
#3122)

## PR Type

- [x] Bug fix

## Summary

When `@schedule` is present but all scheduling flags are falsy (e.g.
`@schedule(daily=False)`), deploying to Argo Workflows crashes with
`AttributeError: 'NoneType' object has no attribute 'split'` and no
WorkflowTemplate is created. This fix makes the deploy succeed and skip
CronWorkflow creation, which is the expected behaviour when no schedule
is configured.

## Issue

Fixes #3121

## Reproduction

**Runtime:** argo

**Commands to run:**
```bash
python myflow.py argo-workflows create
```

Where `myflow.py` contains:

```python
from metaflow import FlowSpec, step, schedule

@schedule(daily=False)
class MyFlow(FlowSpec):
    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MyFlow()
```

**Where evidence shows up:** parent console / `argo-workflows create`
output

<details>
<summary>Before (error / log snippet)</summary>

```
AttributeError: 'NoneType' object has no attribute 'split'
  File ".../metaflow/plugins/argo/argo_workflows.py", line 463, in _get_schedule
    return " ".join(schedule.schedule.split()[:5]), schedule.timezone
```

</details>

<details>
<summary>After (evidence that fix works)</summary>

```
Workflow template 'MyFlow' created successfully.
No CronWorkflow created (no schedule configured).
```

</details>

## Root Cause

`ScheduleDecorator.flow_init` sets `self.schedule = None` when all
scheduling flags are falsy (`schedule_decorator.py:49`).
`_get_schedule()` in `argo_workflows.py` guarded against the decorator
being absent (`if schedule:`), but not against `schedule.schedule` being
`None`. Calling `.split()` on `None` raises `AttributeError`.
Separately, `trigger_explanation()` checked decorator presence rather
than `self._schedule`, so it would incorrectly report CronWorkflow-based
triggering even when no schedule was resolved.

## Why This Fix Is Correct

`_get_schedule()` now returns `(None, None)` when `schedule.schedule is
None`, treating it identically to the decorator being absent. The rest
of the deploy path already handles `self._schedule is None` correctly —
no CronWorkflow is created. `trigger_explanation()` now checks
`self._schedule is not None`, which is the actual resolved value, rather
than decorator presence.

## Failure Modes Considered

1. **Flows with a real schedule are unaffected** — the
`schedule.schedule is None` guard is only entered when all flags are
falsy; any valid cron string, `daily=True`, `hourly=True`, or
`weekly=True` sets a non-None value and takes the existing return path.
2. **Backward compatibility** — the previous behaviour for the
`@schedule(daily=False)` case was a crash, so changing it to a clean
no-op is strictly an improvement with no risk of regression.

## Tests

- [x] Unit tests added/updated
- [x] Reproduction script provided (required for Core Runtime)
- [ ] CI passes
- [ ] If tests are impractical: explain why below and provide manual
evidence above

The bug is straightforward to reproduce with the script above. Unit test
coverage for `_get_schedule()` with a `None` schedule value would be a
good addition.

## Non-Goals

This fix does not address the Step Functions path (which returns
`schedule.schedule` directly and handles `None` safely already), does
not implement an `--no-enable-schedule` flag (see PR #3102), and does
not change any scheduling behaviour for flows where a schedule is
actually configured.

## AI Tool Usage

- [x] AI tools were used (describe below)

OpenCode (Claude Sonnet) was used to identify the bug, draft the issue,
and generate the fix. All generated code was reviewed and understood
before committing.

---------

Co-authored-by: Shashank Srikanth <ssrikanth@netflix.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant