Commit 9529858

ravwojdyla-agent, ravwojdyla, claude, and github-actions[bot] authored
[fray] Remove 24h default timeout on job wait (#4380)
## Summary
- Replace the hard-coded 24h (`86400.0`) default timeout in `IrisJobHandle.wait()` with a value of roughly five years (`86400.0 * 365 * 5`), so callers that pass `timeout=None` effectively wait indefinitely instead of silently timing out after a day.

## Test plan
- [x] Verified the change applies cleanly
- [ ] Run fray jobs that exceed 24h and confirm they no longer time out unexpectedly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Rafal Wojdyla <ravwojdyla@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Rafal Wojdyla <ravwojdyla@users.noreply.github.com>
1 parent 002f61b commit 9529858

2 files changed

Lines changed: 4 additions & 2 deletions

lib/fray/src/fray/v2/iris_backend.py

Lines changed: 3 additions & 1 deletion
@@ -190,7 +190,9 @@ def status(self) -> JobStatus:
     def wait(
         self, timeout: float | None = None, *, raise_on_failure: bool = True, stream_logs: bool = False
     ) -> JobStatus:
-        effective_timeout = timeout if timeout is not None else 86400.0
+        # Iris client requires a numeric timeout. When None (wait indefinitely),
+        # use ~5 years so the caller is never surprised by a silent timeout.
+        effective_timeout = timeout if timeout is not None else 86400.0 * 365 * 5
         try:
             self._job.wait(timeout=effective_timeout, raise_on_failure=raise_on_failure, stream_logs=stream_logs)
         except Exception:
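
The fallback in this hunk can be illustrated in isolation. The following is a minimal sketch, not code from fray: `resolve_timeout` and `EFFECTIVE_INFINITE_TIMEOUT` are hypothetical names introduced here, and only the `86400.0 * 365 * 5` expression and the `is not None` check come from the diff.

```python
from typing import Optional

# Hypothetical constant for illustration; the diff inlines this expression.
# 86400 seconds per day * 365 days * 5 years = 157,680,000 seconds.
EFFECTIVE_INFINITE_TIMEOUT = 86400.0 * 365 * 5


def resolve_timeout(timeout: Optional[float]) -> float:
    """Map None ("wait indefinitely") to a large finite number of seconds.

    Hypothetical helper mirroring the expression in the diff. The check is
    `is not None` rather than truthiness, so an explicit 0.0 is preserved.
    """
    return timeout if timeout is not None else EFFECTIVE_INFINITE_TIMEOUT
```

Under these assumptions, a caller that passes an explicit value gets it back unchanged, while `None` resolves to a finite number that a client requiring a numeric timeout can accept.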

lib/zephyr/src/zephyr/execution.py

Lines changed: 1 addition & 1 deletion
@@ -1564,7 +1564,7 @@ def execute(
         backoff.reset()
         logger.info("Coordinator job submitted: %s (job_id=%s)", job_name, self._coordinator_job.job_id)
 
-        self._coordinator_job.wait(raise_on_failure=True)
+        self._coordinator_job.wait(timeout=None, raise_on_failure=True)
 
         # Read results written by the coordinator job.
         # This must succeed — the job completed successfully.
