fix(ci): increase apt retry timeout to prevent kill EPERM crash #29715
Merged
CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.
andrepimenta
approved these changes
May 5, 2026
Restore timeout_minutes from 3 to 5 for the apt-get retry step, and add retry_on: error to only retry on command failure, not timeout.

The 3-minute timeout was too tight: DPkg::Lock::Timeout=120 means apt-get can legitimately take up to 120s waiting for the dpkg lock on each of the two sudo calls (update + install). With slow mirrors on top, total time can approach or exceed 180s, triggering the nick-fields/retry timeout.

When the timeout fires, the action tries to process.kill() the sudo child process, but since sudo runs as root and the runner runs as admin, Node.js gets EPERM, a known upstream bug (nick-fields/retry#124). The action crashes instead of retrying.

With timeout_minutes: 5 (300s), even worst-case lock wait (240s) + slow install (30s) = 270s fits with 30s headroom. apt-get resolves on its own before the timeout fires, so the kill path is never hit.

Adding retry_on: error ensures retries only happen on actual apt failures (mirror desync, lock timeout), not on the retry-action's own timeout, which would crash with EPERM anyway.

Co-authored-by: Cursor <cursoragent@cursor.com>
7274995 to
8bcc995
Qbandev
approved these changes
May 7, 2026



Description
Problem
Android E2E runs are crashing with Error: kill EPERM in the "Set up E2E environment" step (example). This was introduced by #29236, which wrapped sudo apt-get inside nick-fields/retry with timeout_minutes: 3.

Root cause: When the 3-minute timeout fires, nick-fields/retry calls process.kill() on the child process. But sudo apt-get runs as root while the Cirrus runner process runs as admin, so Node.js gets EPERM (permission denied) on the kill syscall. This is a known upstream bug (open since Oct 2023, 11 upvotes, unpatched).

Why the timeout fires: DPkg::Lock::Timeout=120 means apt-get can legitimately wait up to 120s for the dpkg lock on each of the two sudo calls (update + install). With slow Ubuntu mirrors on top, total time can approach or exceed 180s (3 min), triggering the timeout. The 3-minute value was tightened from the original 5-minute design in a follow-up commit on #29236, which didn't account for the double lock-wait scenario.

Fix
Restore timeout_minutes from 3 to 5. This gives 300s per attempt. Even the worst case (120s lock on update + 120s lock on install + 30s actual install = 270s) fits with 30s headroom. apt-get resolves on its own (success or dpkg lock timeout error) before the retry timeout fires, so the process.kill() path, and with it the EPERM bug, is never hit.

Add retry_on: error. This retries only when apt-get exits with a non-zero code (mirror desync, lock timeout), not when nick-fields/retry's own timeout fires. A timeout-triggered retry would crash with EPERM anyway, so this avoids a wasted attempt.

Timing analysis
Changelog
CHANGELOG entry: null
Related issues
Refs: INFRA-3580
Fixes regression from #29236
Manual testing steps
N/A — CI infrastructure fix. Validated by any Android E2E workflow run. The timeout increase is transparent in the happy path (apt-get takes 5-15s).
Screenshots/Recordings
Before
N/A
After
N/A
Pre-merge author checklist
Pre-merge reviewer checklist
Made with Cursor
Note
Low Risk
Low risk CI-only change that adjusts retry behavior for Linux apt-get during Android E2E setup; primary impact is longer waits before failing and fewer timeout-triggered retries.
Overview
Reduces flaky Android E2E setup failures by updating the setup-e2e-env composite action to increase the nick-fields/retry apt-get wrapper timeout from 3 to 5 minutes.
The retry wrapper is also configured with retry_on: error so retries only happen on non-zero exits, avoiding retries triggered by the action's own timeout.
Reviewed by Cursor Bugbot for commit 8bcc995.
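
For readers without the diff at hand, the change summarized above might look roughly like the step below. This is a minimal sketch, not the actual diff: the step name, the action version tag (@v3), max_attempts, and the package name are all assumptions made for illustration. Only timeout_minutes: 5, retry_on: error, and DPkg::Lock::Timeout=120 come from the PR text.

```yaml
# Hypothetical excerpt from the setup-e2e-env composite action.
# Step name, version tag, max_attempts, and package name are placeholders.
- name: Install Linux dependencies (with retry)
  uses: nick-fields/retry@v3
  with:
    # 5 minutes = 300s per attempt. Worst case from the PR description:
    # 120s lock wait (update) + 120s lock wait (install) + 30s install = 270s,
    # which leaves 30s of headroom before the action's timeout fires.
    timeout_minutes: 5
    max_attempts: 3
    # Retry only when the command exits non-zero (mirror desync, lock
    # timeout), not when the action's own timeout fires, since that path
    # ends in the kill EPERM crash described in the PR.
    retry_on: error
    command: |
      sudo apt-get -o DPkg::Lock::Timeout=120 update
      sudo apt-get -o DPkg::Lock::Timeout=120 install -y example-package
```

The point of the design is that apt-get's own lock timeout (120s per call) should end each attempt before the wrapper's 300s limit is reached, so the wrapper never needs to kill the root-owned sudo child.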