Description
The @timeout decorator uses signal.SIGALRM and signal.alarm() to enforce step
time limits. Both of these are Unix-only and do not exist on Windows. Any flow
that uses @timeout will raise an AttributeError on Windows at task execution time.
The same Unix-only pattern also exists in the card CLI:
metaflow/plugins/timeout_decorator.py lines 73–74:
signal.signal(signal.SIGALRM, self._sigalrm_handler)
signal.alarm(self.secs)
metaflow/plugins/cards/card_cli.py lines 164–166:
signal.signal(signal.SIGALRM, raise_timeout)
signal.alarm(time)
There is no platform guard anywhere in these files. The AttributeError is raised at
task execution time (inside task_pre_step), not at import or decoration time,
so Windows users receive no warning until the flow is actually running.
Steps to Reproduce
On a Windows machine:
# my_flow.py
from metaflow import FlowSpec, step, timeout

class MyFlow(FlowSpec):
    @timeout(seconds=30)
    @step
    def start(self):
        import time
        time.sleep(5)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MyFlow()
Runtime: local, Windows
Where evidence shows up: task logs / parent console
Before (error / log snippet)
AttributeError: module 'signal' has no attribute 'SIGALRM'
File "metaflow/plugins/timeout_decorator.py", line 73, in task_pre_step
signal.signal(signal.SIGALRM, self._sigalrm_handler)
The error appears mid-run, when the step's task starts executing, not at import time.
By then the flow has already initialized and started running, which makes the failure misleading for users.
After (expected behavior)
# Option A: timeout works via threading.Timer on Windows
Metaflow [1234/start/1 (pid 5678)] Task is starting.
Metaflow [1234/start/1 (pid 5678)] Step timed out after 30 seconds.
# Option B: clear error at step_init time, not mid-run
MetaflowException: @timeout is not supported on Windows.
Please use a cloud compute backend (Kubernetes, Batch) where tasks run on Linux.
Current Behavior
task_pre_step in TimeoutDecorator unconditionally references signal.SIGALRM and
calls signal.alarm():
# timeout_decorator.py:70–74
if ubf_context != UBF_CONTROL and retry_count <= max_user_code_retries:
    self.step_name = step_name
    signal.signal(signal.SIGALRM, self._sigalrm_handler)  # AttributeError on Windows
    signal.alarm(self.secs)  # AttributeError on Windows
There is no sys.platform check, no hasattr(signal, 'SIGALRM') guard, and no
documentation noting that @timeout is Linux/macOS-only.
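For illustration, a capability check of the kind described as missing could look roughly like the sketch below. It is not Metaflow code: the helper names sigalrm_available and timeout_backend are hypothetical, and the sketch only reports which mechanism is usable rather than arming a timer.

```python
import signal
import sys


def sigalrm_available():
    """Return True when the POSIX alarm API exists in this interpreter."""
    # Checking for the attributes avoids hard-coding a list of platforms.
    return hasattr(signal, "SIGALRM") and hasattr(signal, "alarm")


def timeout_backend():
    """Describe which timeout mechanism could be used on this platform.

    Hypothetical helper: returns "signal" where SIGALRM exists and
    "unsupported on <platform>" elsewhere (for example on win32).
    """
    if sigalrm_available():
        return "signal"
    return "unsupported on %s" % sys.platform


if __name__ == "__main__":
    print(timeout_backend())
```

A feature check with hasattr is usually preferable to comparing sys.platform strings, because it tests the capability the code actually depends on.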
Expected Behavior
One of two paths (maintainer's choice):
Option A — Cross-platform implementation using threading.Timer
import signal
import sys
import threading

if sys.platform == "win32":
    # SIGALRM is not available on Windows, so use a daemon thread timer instead.
    def _start_timeout(self):
        self._timer = threading.Timer(self.secs, self._timeout_handler)
        self._timer.daemon = True
        self._timer.start()

    def _cancel_timeout(self):
        if hasattr(self, "_timer"):
            self._timer.cancel()
else:
    def _start_timeout(self):
        signal.signal(signal.SIGALRM, self._sigalrm_handler)
        signal.alarm(self.secs)

    def _cancel_timeout(self):
        signal.alarm(0)
threading.Timer fires _timeout_handler on a background thread, which raises
TimeoutException in the main thread using ctypes.pythonapi.PyThreadState_SetAsyncExc.
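The mechanism described above can be sketched as follows. This is a minimal, standalone illustration under stated assumptions, not the proposed patch: run_with_timeout and _async_raise are hypothetical names, and TimeoutException here is a local stand-in for Metaflow's own exception. One known limitation: the exception is only delivered when the target thread next executes Python bytecode, so a long blocking C call is not interrupted until it returns.

```python
import ctypes
import threading
import time


class TimeoutException(Exception):
    """Local stand-in for the exception that @timeout raises."""


def _async_raise(thread_id, exc_type):
    """Ask the interpreter to raise exc_type in the thread with thread_id."""
    # PyThreadState_SetAsyncExc expects an exception class, not an instance,
    # and returns the number of thread states it modified.
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread_id), ctypes.py_object(exc_type)
    )
    if affected > 1:
        # More than one thread state was touched: clear the pending request.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(thread_id), None)
    return affected == 1


def run_with_timeout(fn, secs):
    """Run fn() in the calling thread, raising TimeoutException after secs."""
    target = threading.get_ident()  # thread that will receive the exception
    timer = threading.Timer(secs, _async_raise, args=(target, TimeoutException))
    timer.daemon = True  # never keep the process alive just for the timer
    timer.start()
    try:
        return fn()
    finally:
        timer.cancel()  # no-op if the timer has already fired


if __name__ == "__main__":
    def slow_step():
        # Short sleeps let the pending exception be delivered between calls.
        deadline = time.monotonic() + 5
        while time.monotonic() < deadline:
            time.sleep(0.01)

    try:
        run_with_timeout(slow_step, 0.2)
    except TimeoutException:
        print("step timed out")
```

There is a small race if the timer fires just as fn returns, in which case the exception may surface after the finally block; a real implementation would need to account for that.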
Option B — Fail fast with a clear error at step_init time
def step_init(self, flow, graph, step, decos, environment, flow_datastore, logger):
    import sys
    if sys.platform == "win32":
        raise MetaflowException(
            "@timeout is not supported on Windows because it relies on "
            "POSIX signals (SIGALRM). Use a Linux-based compute backend "
            "(e.g. @kubernetes or @batch) to use @timeout."
        )
    ...
Option B is lower risk and gives a clear, early error message instead of a cryptic
AttributeError mid-run.
Root Cause
signal.SIGALRM is defined in POSIX (Unix/Linux/macOS) but is absent from
the Windows signal module. Python's standard library documents this clearly:
signal.SIGALRM — Not available on Windows.
signal.alarm(time) — Not available on Windows.
The decorator was written assuming a Unix task execution environment, which is
reasonable for cloud compute (Kubernetes pods, Batch containers all run Linux).
However, local execution runs directly on the user's OS, and Metaflow supports
local runs on Windows. The @timeout decorator is particularly likely to be used
in local development/testing, exactly where Windows users will hit this.
Impact
| Scenario | Effect |
| --- | --- |
| Windows user adds @timeout to any step | AttributeError mid-run, flow fails |
| Windows user tries local development with @timeout | Broken: cannot test locally |
| No warning at decoration or import time | Error appears late, confusing to diagnose |
| card_cli.py also uses SIGALRM | Card rendering also broken on Windows |
Affected Files
| File | Lines | Issue |
| --- | --- | --- |
| metaflow/plugins/timeout_decorator.py | L73–74, L79 | SIGALRM + alarm() with no platform guard |
| metaflow/plugins/cards/card_cli.py | L164–166, L173 | Same pattern in card timeout context manager |
Out of Scope
- Fixing @timeout behavior on cloud backends (Kubernetes, Batch): those always run Linux containers, so SIGALRM works correctly there.
- Changing the timeout behavior for macOS/Linux: no change needed.
Environment
- OS: Windows 10 / Windows 11
- Metaflow version: current master
- Python: 3.8+