Commit eb4334b
CI: Retry build upon failure
In Jan-Feb 2026, NuttX CI hit a [record high usage of GitHub Runners](#17914), exceeding the limit enforced by the ASF Infrastructure Team. We analysed the PRs and discovered that most GitHub Runners were wasted on __(1) Failure to Download the Build Dependencies__ (DTC Device Tree, OpenAMP Messaging, MicroADB Debugger, MCUBoot Bootloader, NimBLE Bluetooth, etc.) and __(2) Resubmitting PR Commits__:
- [Video: Analysing the Most Expensive PR](https://youtu.be/swFaxaTCEQg)
- [Video: Second Most Expensive PR](https://youtu.be/uSpQkzBogEw)
- [Video: Third Most Expensive PR](https://youtu.be/J7w1gyjwZ1w)
- [Video: Most Expensive Apps PR](https://youtu.be/182h8cRpfvI)
- [Spreadsheet: Most Expensive PRs](https://docs.google.com/spreadsheets/d/1HY7fIZzd_fs3QPyA0TX7vsYOjL86m1fNOf1Wls93luI/edit?gid=70515654#gid=70515654)
Why would __Download Failures__ waste GitHub Runners? A Download Failure terminates the Entire CI Build (across All CI Jobs), and the CI Build must then be restarted. The CI Build isn't terminated immediately upon failure: NuttX CI waits for the CI Job (e.g. arm-01) to complete before terminating the CI Build. So a CI Build can get terminated 2.5 hours in, wasting 2.5 elapsed hours x [7.4 parallel processes](https://lupyuen.org/articles/ci3#live-metric-for-full-time-runners) = 18.5 runner-hours of GitHub Runners.
This PR proposes to __Retry the Build for Each CI Target__. NuttX CI shall rebuild each CI Target (e.g. `sim:nsh`) upon failure, up to 3 times (4 builds in total). Each rebuild is attempted after a Randomised Delay with Exponential Backoff: initially 60 seconds, then 120 seconds, then 240 seconds. The rebuilds will mitigate the effects of Intermittent Download Failures that occur in GitHub Actions (and eliminate developer frustration).
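
The retry schedule above can be sketched roughly as follows. This is an illustration, not the code from this commit: only the names `retrytest` and `dotest` come from the commit message, while `MAX_RETRIES`, `BASE_DELAY`, the placeholder `dotest` body and the jitter formula (base delay plus up to 50% extra) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: variable names and the jitter formula are assumptions,
# not taken from tools/testbuild.sh.

MAX_RETRIES=${MAX_RETRIES:-3}   # 3 retries, so at most 4 builds per target
BASE_DELAY=${BASE_DELAY:-60}    # first delay in seconds, doubled on each retry

# Placeholder for the existing build function: it just runs its arguments.
dotest() {
  "$@"
}

# Build one CI Target, retrying on failure with randomised exponential backoff.
retrytest() {
  local attempt=0
  local delay=$BASE_DELAY
  local wait_secs
  while true; do
    if dotest "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$MAX_RETRIES" ]; then
      echo "Build failed after $MAX_RETRIES retries: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Randomised delay: base delay plus up to 50% jitter (assumed formula).
    wait_secs=$((delay + ${RANDOM:-0} % (delay / 2 + 1)))
    echo "Retry $attempt of $MAX_RETRIES for $* in $wait_secs seconds" >&2
    sleep "$wait_secs"
    delay=$((delay * 2))        # 60, then 120, then 240 seconds
  done
}
```

With the placeholder `dotest`, a call such as `retrytest make -C some/dir` would run the command, then wait and rerun it up to three more times if it keeps failing. Adding jitter to each delay is one common way to stop parallel CI Jobs from retrying a failed download at the same moment.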
If the build fails after 3 retries: Subsequent CI Targets will __not be allowed to rebuild__ upon failure. This is to prevent cascading build failures from overloading GitHub Actions, and consuming too many GitHub Runners.
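
One way to picture this rule is a single flag that is cleared the first time a CI Target uses up its retries. The sketch below is an assumption-laden illustration rather than the committed code: `RETRY_ALLOWED` and `MAX_RETRIES` are hypothetical names, `dotest` is a placeholder, and the backoff delay is left out to keep the example short.

```shell
#!/usr/bin/env bash
# Sketch only: RETRY_ALLOWED is a hypothetical flag, not from tools/testbuild.sh.

MAX_RETRIES=${MAX_RETRIES:-3}   # retries allowed per target while the flag is set
RETRY_ALLOWED=1                 # cleared once any target fails all of its retries

# Placeholder for the existing build function: it just runs its arguments.
dotest() {
  "$@"
}

# Build one CI Target; retry only while retries are still allowed.
retrytest() {
  local attempt=0
  until dotest "$@"; do
    if [ "$RETRY_ALLOWED" -ne 1 ] || [ "$attempt" -ge "$MAX_RETRIES" ]; then
      # Retries exhausted or already disabled: later targets get one build only.
      RETRY_ALLOWED=0
      return 1
    fi
    attempt=$((attempt + 1))
    # The randomised backoff delay would go here; omitted in this sketch.
  done
  return 0
}
```

In this sketch a persistently failing first target is built four times, and every failing target after it is built once, which caps the extra GitHub Runner time that a systemic failure can consume.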
Note that NuttX CI shall retry the build for __Any Kind of Build Failure__, including Download Failures, Compile Errors and Config Errors. We designed it simplistically due to our current constraints: (1) Lack of CI Expertise (2) NuttX CI is Mission Critical (3) Legacy CI Scripts are Highly Complex. To prevent Compile Errors and Config Errors: We expect NuttX Devs to [Build and Test PRs in Our Own Repos](#18568), before submitting to NuttX.
What about __Resubmitting PR Commits__ and its wastage of GitHub Runners? We also require NuttX Devs to [Build and Test PRs in Our Own Repos](#18568), before resubmitting to NuttX. GitHub Runners will then be charged to the developer's quota, without affecting the GitHub Runners quota for Apache NuttX Project. We plan to [Kill All CI Jobs](https://youtu.be/182h8cRpfvI?si=MmAuwLISZPPMoqDq&t=1479) for PRs that have been switched to Draft Mode. We'll monitor this through the [NuttX Build Monitor](#18659).
Modified Files:
`tools/testbuild.sh`: We introduce a New Wrapper Function `retrytest` that will call the Existing Function `dotest`, to build the CI Target and retry on error.
`Documentation/components/tools/testbuild.rst`: Updated the `testbuild.sh` doc with the Retry Logic.
Signed-off-by: Lup Yuen Lee <luppy@appkaki.com>

1 parent 3f16c4a commit eb4334b
2 files changed
Lines changed: 56 additions & 3 deletions