
[iris] Increase GCP smoke test shell poll timeout #4091

Merged

rjpower merged 1 commit into main from fix/smoke-gcp-shell-timeout on Mar 24, 2026.


@rjpower (Collaborator) commented Mar 24, 2026

The shell loop waiting for the controller URL file had a 720s budget
(seq 1 360 * 2s) shared between image build/push and worker startup.
When Docker push was slow in CI (~10min), workers had <90s to register
before the shell killed the process. Observed in run 23498489452, where
the tunnel came up at 15:56:54 but the shell timed out at 15:58:22, only 88s later.
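The budget numbers above can be checked with plain shell arithmetic. This is a small worked sketch that uses only the figures quoted in this description (360 iterations of 2s, and the two timestamps from run 23498489452); the `to_seconds` helper is a hypothetical illustration and assumes both timestamps fall on the same day.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the old shell poll budget (figures from the PR description).
set -euo pipefail

poll_iters=360        # old loop: seq 1 360
poll_interval=2       # seconds slept per iteration
budget=$(( poll_iters * poll_interval ))
echo "old budget: ${budget}s"   # 720s

# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# 10# forces base 10 so fields like "08" are not parsed as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

tunnel_up=$(to_seconds 15:56:54)
shell_timeout=$(to_seconds 15:58:22)
echo "window between tunnel up and shell timeout: $(( shell_timeout - tunnel_up ))s"  # 88s
```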

Increase to seq 1 900 (1800s) so the Python-side --worker-timeout 600
becomes the effective deadline for worker readiness.
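A minimal sketch of the kind of poll loop being tuned, assuming it checks every 2s for a non-empty controller URL file. The function name `wait_for_url`, its arguments, the example path, and the sequential "push first, then worker registration" timing model are hypothetical illustrations, not the actual smoke-test script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a controller-URL poll loop; not the actual smoke-test script.
set -euo pipefail

# Poll until url_file exists and is non-empty, then print its contents.
# Gives up after roughly iters * interval seconds (time spent in checks is ignored).
wait_for_url() {
  local url_file=$1 iters=$2 interval=$3
  for _ in $(seq 1 "$iters"); do
    if [ -s "$url_file" ]; then
      cat "$url_file"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out after $(( iters * interval ))s waiting for ${url_file}" >&2
  return 1
}

# Budget check under an assumed sequential model: image push, then worker registration.
poll_interval=2
new_budget=$(( 900 * poll_interval ))   # seq 1 900 -> 1800s
push_estimate=600                       # ~10min Docker push seen in CI (assumption)
worker_timeout=600                      # Python-side --worker-timeout 600
headroom=$(( new_budget - push_estimate - worker_timeout ))
echo "new budget: ${new_budget}s, headroom after push + worker timeout: ${headroom}s"  # 1800s, 600s

# Example usage (hypothetical path):
# controller_url=$(wait_for_url /tmp/controller_url 900 "$poll_interval")
```

Under this model, a ~600s push plus the full 600s worker timeout still fits inside the 1800s shell budget, so the Python-side timeout would fire before the shell loop gives up.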

Reproduced locally: cluster came up in 376s with the fix.

The shell loop that waits for the controller URL file shared its 720s
budget with image build/push. When Docker push was slow (~10min in CI),
workers had <90s to register before the shell timeout killed the process.

Increase from seq 1 360 (720s) to seq 1 900 (1800s) so the Python-side
worker-timeout (600s) is the effective deadline.
@rjpower added the agent-generated label (Created by automation/agent) on Mar 24, 2026
@claude (Bot, Contributor) commented Mar 24, 2026

Claude finished @rjpower's task in 1m 1s.


  • Read the changed file and surrounding context
  • Verify the change matches the PR description
  • Post review

Review

Increases the shell poll loop from seq 1 360 (720s) to seq 1 900 (1800s), so the shell timeout no longer races against --worker-timeout 600. The change is correct and matches the PR description.

No issues found. The Python-side --worker-timeout 600 (line 130) remains the effective deadline for worker readiness, and the shell loop now provides sufficient headroom around it.

@rjpower merged commit 53a07ad into main on Mar 24, 2026 (40 checks passed).
@rjpower deleted the fix/smoke-gcp-shell-timeout branch on March 24, 2026, 18:01.
Helw150 pushed a commit that referenced this pull request Apr 8, 2026