
Improve GitHub Actions CI runtime#3039

Closed
talsperre wants to merge 3 commits into master from ssrikanth/use-larger-runners

Conversation


talsperre (Collaborator) commented Mar 20, 2026

Summary

Speed up CI by using larger runners and optimizing test parallelism.

Changes

  • 64-core runners: Switch all CI jobs from ubuntu-22.04 to ubuntu-latest-64-cores (64 vCPU, 256 GB RAM)
  • Test parallelism: Bump --num-parallel from 8 to 64 in test_runner to match available cores
  • R install consolidation: Merge two install.packages calls into one with Ncpus=64
  • Runner condition fix: Update the R system deps if: condition to use startsWith(matrix.os, 'ubuntu') so it matches the new runner name

Test plan

  • Verify all Python test matrices pass on 64-core runners
  • Verify R tests pass with consolidated installs
  • Compare CI run times vs previous runs

🤖 Generated with Claude Code

Speed up CI by using larger GitHub-hosted runners (64 vCPU, 256 GB RAM)
for Python and R test jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps bot (Contributor) commented Mar 20, 2026

Greptile Summary

This PR aims to speed up CI by switching to 64-core runners, increasing test parallelism, and adding caching — but two of the four advertised optimizations (pip caching and R library caching) are absent from the actual diff, leaving meaningful speedup on the table.

  • Switches all three jobs (pre-commit, Python, R) to ubuntu-latest-64-cores runners and correctly updates the ubuntu system-deps if condition to startsWith(matrix.os, 'ubuntu').
  • Bumps --num-parallel from 8 to 64 in both run_tests and run_runtime_card_tests; the latter carries an inline warning that resource contention causes flaky timeout failures, making 64 parallel processes a risk for that specific suite.
  • Merges the two separate Rscript install.packages calls into one and raises Ncpus from 8 to 64, which is a genuine improvement.
  • Missing: cache: pip is not set on either setup-python step, so pip dependencies are re-downloaded on every run despite the PR description claiming this was added.
  • Missing: No actions/cache step for ~/R/library exists, so R packages are recompiled from source on every run despite the PR description claiming this was added.

Confidence Score: 3/5

  • Safe to merge for correctness, but two advertised optimizations are unimplemented and the card-test parallelism remains a flakiness risk.
  • The CI config changes are mechanically correct (runner names, condition fix, package consolidation), but the PR description advertises pip caching and R library caching that simply aren't present in the diff. This means reviewers/contributors may expect speedups that won't materialise. Additionally, bumping run_runtime_card_tests to --num-parallel 64 conflicts with the code's own warning about timeout-sensitive tests being sensitive to resource contention.
  • .github/workflows/test.yml — missing cache: pip and R library cache steps; test_runner's run_runtime_card_tests parallelism conflicts with its own timeout warning.

Important Files Changed

Filename — Overview

.github/workflows/test.yml — Switches all CI jobs to 64-core runners, fixes the ubuntu startsWith condition for system deps, and merges the R package installs with Ncpus=64, but the pip caching and R library caching advertised in the PR description are missing from the actual diff.

test_runner — Bumps --num-parallel from 8 to 64 for both run_tests and run_runtime_card_tests; the latter carries an inline comment explicitly warning that resource contention causes flaky timeouts, making 64 parallel processes a risk for that specific suite.

Comments Outside Diff (1)

  1. .github/workflows/test.yml, line 44-49 (link)

    P2 Advertised pip caching not implemented

    The PR description states "Pip caching: Enable built-in pip cache in setup-python action", but the setup-python step does not have the cache: pip input set — neither here for the Python job nor at line 18 for the pre-commit job. As a result, pip dependencies are re-downloaded on every run, and the claimed speedup from caching is not realized.
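For reference, enabling the advertised cache is a one-line addition to the setup-python step. A minimal sketch, assuming the workflow uses actions/setup-python — the step name, @v5 tag, and matrix variable here are illustrative, not taken from the PR's diff:

```yaml
# Illustrative step; the actual step name/version in test.yml may differ.
- name: Set up Python ${{ matrix.ver }}
  uses: actions/setup-python@v5
  with:
    python-version: ${{ matrix.ver }}
    cache: pip  # enables the built-in pip cache, keyed on requirements files by default
```

The same cache: pip input would need to be added to the pre-commit job's setup-python step as well for that job to benefit.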

Last reviewed commit: "Remove pip and R cac..."

jobs:
  pre-commit:
-   runs-on: ubuntu-22.04
+   runs-on: ubuntu-latest-64-cores

P2 Over-provisioned runner for pre-commit

The pre-commit job runs linters and formatters (tools like black, flake8, isort, etc.), which are inherently single-threaded and typically complete in under a minute. Allocating a 64-core runner here is unlikely to provide any speedup and significantly increases per-minute cost (64-core GitHub-hosted runners cost ~32× more than the standard 2-core runner).

Consider keeping pre-commit on a smaller runner (e.g. ubuntu-latest or ubuntu-22.04) while reserving the larger runner for the heavier Python and R test jobs:

Suggested change
- runs-on: ubuntu-latest-64-cores
+ runs-on: ubuntu-latest

- Bump --num-parallel from 8 to 64 in test_runner to match core count
- Add pip cache via setup-python built-in cache option
- Add R library cache to avoid recompiling packages across runs
- Consolidate R install.packages calls into one with Ncpus=64

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
talsperre changed the title from "Use 64-core runners for CI tests" to "Improve GitHub Actions CI runtime" on Mar 20, 2026
Comment thread: test_runner
Comment on lines 26 to +27
  CARD_GRAPHS="small-foreach,small-parallel,nested-branches,single-linear-step,simple-foreach"
- cd test/core && PYTHONPATH=`pwd`/../../ python3 run_tests.py --num-parallel 8 --contexts python3-all-local-cards-realtime --graphs $CARD_GRAPHS && cd ../../
+ cd test/core && PYTHONPATH=`pwd`/../../ python3 run_tests.py --num-parallel 64 --contexts python3-all-local-cards-realtime --graphs $CARD_GRAPHS && cd ../../

P2 Timing-sensitive card tests at max parallelism

The comment directly above this function warns that realtime card tests "will surely fail" if other processes are starving resources, since the tests rely on timeouts for asynchronous datastore validation. While CPU starvation is much less likely with 64 cores, running 64 parallel test processes can still introduce memory pressure, I/O contention, and contention on the local datastore (which handles the async card updates being checked). This may cause flaky failures — which is exactly what the comment warns about.

Consider using a more conservative value for this specific function (e.g. --num-parallel 16 or --num-parallel 32) to leave headroom for I/O and datastore operations:

Suggested change
  CARD_GRAPHS="small-foreach,small-parallel,nested-branches,single-linear-step,simple-foreach"
- cd test/core && PYTHONPATH=`pwd`/../../ python3 run_tests.py --num-parallel 64 --contexts python3-all-local-cards-realtime --graphs $CARD_GRAPHS && cd ../../
+ cd test/core && PYTHONPATH=`pwd`/../../ python3 run_tests.py --num-parallel 16 --contexts python3-all-local-cards-realtime --graphs $CARD_GRAPHS && cd ../../

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines 68 to 86
@@ -73,18 +73,17 @@ jobs:
          r-version: ${{ matrix.ver }}

      - name: Install R ${{ matrix.ver }} system dependencies
-       if: matrix.os == 'ubuntu-22.04'
+       if: startsWith(matrix.os, 'ubuntu')
        run: sudo apt-get update; sudo apt-get install -y libcurl4-openssl-dev qpdf libgit2-dev libharfbuzz-dev libfribidi-dev libwebp-dev

      - name: Install R ${{ matrix.ver }} Rlang dependencies
        run: |
          python3 -m venv path/to/venv
          source path/to/venv/bin/activate
          python3 -m pip install .
-         Rscript -e 'install.packages(c("devtools", "remotes"), repos="https://cloud.r-project.org", Ncpus=8)'
+         Rscript -e 'install.packages(c("devtools", "remotes", "data.table", "caret", "glmnet", "Matrix", "rjson"), repos="https://cloud.r-project.org", Ncpus=64)'
          Rscript -e 'devtools::install_deps("R", dependencies=TRUE, repos="https://cloud.r-project.org", upgrade="default")'
          R CMD INSTALL R

P2 Advertised R library caching not implemented

The PR description states "R library caching: Cache ~/R/library across runs to avoid recompiling packages", but no actions/cache step (or equivalent) for ~/R/library appears anywhere in the R job. Recompiling all R packages from source on every run is one of the primary bottlenecks in R CI, so the missing cache means the stated optimization was not applied.

Adding a step like the following before the dependency install would realize the advertised benefit:

- name: Cache R packages
  uses: actions/cache@v4
  with:
    path: ~/R/library
    key: ${{ runner.os }}-r-${{ matrix.ver }}-${{ hashFiles('R/DESCRIPTION') }}
    restore-keys: |
      ${{ runner.os }}-r-${{ matrix.ver }}-

@talsperre talsperre closed this Apr 27, 2026