ci: speed up test workflow with parallel jobs, sccache, and mold#831
ci: speed up test workflow with parallel jobs, sccache, and mold#831Evrard-Nil wants to merge 5 commits into
Conversation
Split the single 'test' job into five parallel jobs (lint, unit-test, integration-test, e2e-test, build-release) so they run concurrently on the self-hosted runner instead of sequentially. Unit and integration tests do not need PostgreSQL and no longer pull in the service container. Tooling improvements (shared via a new composite action): - sccache (RUSTC_WRAPPER=sccache): caches compiled objects across runs, complementing swatinem/rust-cache. Persistent on the self-hosted runner. - mold linker: 2-5x faster linking of the large test binaries. Installed via release binary (not apt) to avoid dpkg lock contention between parallel jobs. Configured via env var so the Docker reproducible build (which copies .cargo/config.toml) is unaffected and keeps using ld. Cleanup: - Remove the 'ALTER SYSTEM SET max_connections=150 + docker restart' step: .config/nextest.toml already caps the e2e-db group at 16 threads (16 * 4 pool conns = 64 < PG default 100), so the override is dead code that only added ~15-30s and a failure point. - Move 'cargo build --release' (main-push only) into its own job with no PostgreSQL service or test env vars (there is no build.rs). - Switch vLLM integration tests from 'cargo test' to 'cargo nextest' for consistent tooling. MockProvider is used by default (USE_REAL_VLLM is not set), so these run without external dependencies.
There was a problem hiding this comment.
Code Review
This pull request introduces a new composite GitHub Action, setup-rust-ci, to centralize Rust CI setup steps including the Rust toolchain, sccache, mold linker, cargo-nextest, and caching. Feedback on the PR identifies a critical issue in the mold installation step where using --strip-components=2 with -C /usr/local extracts the binary to /usr/local/mold instead of /usr/local/bin/mold, which will cause subsequent commands to fail as /usr/local is typically not in the system PATH. The reviewer provided actionable suggestions to fix the extraction path or use a local directory with $GITHUB_PATH.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| TARBALL="mold-${MOLD_VERSION}-${TARBALL_ARCH}-linux.tar.gz" | ||
| URL="https://github.com/rui314/mold/releases/download/v${MOLD_VERSION}/${TARBALL}" | ||
| curl -fsSL "$URL" -o "/tmp/${TARBALL}" | ||
| sudo tar -xzf "/tmp/${TARBALL}" -C /usr/local --strip-components=2 "mold-${MOLD_VERSION}-${TARBALL_ARCH}-linux/bin/mold" |
There was a problem hiding this comment.
Using --strip-components=2 with -C /usr/local will strip both mold-${MOLD_VERSION}-${TARBALL_ARCH}-linux and bin, extracting the mold binary directly into /usr/local/mold. Since /usr/local is typically not in the system PATH (unlike /usr/local/bin), the subsequent mold --version check and any cargo builds using mold as a linker will fail with a command/linker not found error.
To fix this, you can change --strip-components to 1 so that bin/mold is extracted into /usr/local, resulting in /usr/local/bin/mold.
Alternatively, to avoid requiring sudo and to prevent race conditions when multiple parallel jobs run on the same self-hosted runner concurrently, you could extract it to a local directory and append it to $GITHUB_PATH:
BIN_DIR="$HOME/.local/bin"
mkdir -p "$BIN_DIR"
tar -xzf "/tmp/${TARBALL}" -C "$BIN_DIR" --strip-components=2 "mold-${MOLD_VERSION}-${TARBALL_ARCH}-linux/bin/mold"
echo "$BIN_DIR" >> $GITHUB_PATH sudo tar -xzf "/tmp/${TARBALL}" -C /usr/local --strip-components=1 "mold-${MOLD_VERSION}-${TARBALL_ARCH}-linux/bin/mold"
Review: CI parallelization + sccache + moldNicely scoped and well-documented PR. I verified the central claim behind removing the A few issues worth addressing before merge:
|
The previous --strip-components=2 with -C /usr/local extracted mold to /usr/local/mold instead of /usr/local/bin/mold, so 'command -v mold' failed and cargo could not find the linker.
Setting CARGO_TARGET_*_LINKER=mold made rustc invoke mold directly as a
linker driver, but mold is a linker, not a driver, and rejects gcc driver
flags like -m64 ('unknown -m argument: 64').
Switch to RUSTFLAGS='-C debuginfo=0 -C link-arg=-fuse-ld=mold' so the
default cc driver invokes mold via -fuse-ld. This replaces the
target.cfg.rustflags from .cargo/config.toml (cargo treats RUSTFLAGS and
target.cfg.rustflags as mutually exclusive), which is fine for CI: the
dropped flags are reproducibility-only. debuginfo=0 is kept for faster
test compiles. The Docker reproducible build doesn't set RUSTFLAGS, so it
keeps the full config.toml rustflags with ld.
gcc's -fuse-ld=mold looks for 'ld.mold' on PATH, not 'mold'. Without it,
collect2 falls back to searching for 'ld' and fails ('cannot find ld')
because rustc's -B path only contains an lld symlink named 'ld'. Installing
both mold and ld.mold from the release tarball resolves this.
|
/ocr |
|
✅ OpenCodeReview: No comments generated. Looks good to me. |
PierreLeGuen
left a comment
There was a problem hiding this comment.
CI-only change that splits the test workflow into parallel jobs and adds sccache + mold via a setup-rust-ci composite action. All five jobs (lint, unit, integration, e2e, build-release) pass on the head; prior review feedback (mold extraction path) was addressed in c9f94d7. Looks good to merge.
Optional, non-blocking follow-ups:
.github/actions/setup-rust-ci/action.yml: on a cold runner, all parallel jobs hit thecommand -v moldguard simultaneously and download/extract to the same/tmppath and/usr/local/bin, which could intermittently corrupt the tarball or race the extraction. No production impact, but may cause spurious CI failures on first run after a runner reimage..github/workflows/test.yml(integration-test job):VLLM_BASE_URL/VLLM_API_KEYare passed withoutenvironment: Cloud API test env, so if those secrets are environment-scoped they resolve to empty. Harmless today sinceintegration_tests.rs:21defaults toMockProviderunlessUSE_REAL_VLLMis set — only matters if you later run real-vLLM tests here.
Checks: YAML parse of workflow + action files; gh pr checks 831 (all green); verified test targets e2e_all (crates/api/Cargo.toml:82) and integration_tests exist; confirmed -C debuginfo=0 already in .cargo/config.toml; confirmed nextest.toml e2e-db cap of 16 threads and that its filter excludes integration tests.
Summary
Splits the single sequential
testjob into five parallel jobs and adds sccache + the mold linker to cut CI wall-clock time. Targets the cloud-api test workflow, which already runs on our self-hosted runner (gpu11) but was slow because everything ran sequentially in one job.Does it already run on our runners?
Yes — both the old
lintandtestjobs already usedruns-on: [self-hosted, infra]. This PR keeps that. The slowness comes from (a) sequential execution, (b) no object-level compile cache, (c) slow GNUldlinking of the large test binaries, and (d) a deadALTER SYSTEM + docker restartstep.Results
Baseline (old workflow on main, sequential): ~10-11m. This is ~40% faster on a cold cache; sccache will improve further on warm runs.
Changes
Job split (parallel execution)
Old:
lint‖test(wheretestran unit → e2e → vLLM → release sequentially)New:
lint‖unit-test‖integration-test‖e2e-test‖build-release(all parallel)unit-test(cargo nextest run --lib --bins): no PostgreSQL needed — in-crate#[test]s use mocks.integration-test(cargo nextest run --test integration_tests): no PostgreSQL needed —MockProvideris used by default (USE_REAL_VLLMis not set).e2e-test(cargo nextest run --test e2e_all): the only job with the PostgreSQL service container.build-release(cargo build --release): main-push only, split out so it doesn't block tests and doesn't pull in PostgreSQL (there is nobuild.rs).sccache (
RUSTC_WRAPPER=sccache)Caches compiled objects at the object-file granularity, complementing
swatinem/rust-cache(which cachestarget/). SurvivesCargo.lockbumps and feature-flag changes better. On a self-hosted runner the default cache dir (~/.cache/sccache) persists on disk across runs, so noactions/cachestep is needed.mold linker (
RUSTFLAGS="-C link-arg=-fuse-ld=mold")2-5× faster linking of the ~60MB test binary. Installed via GitHub release binary (both
moldandld.mold) — not apt — to avoid dpkg lock contention between parallel jobs. Configured viaRUSTFLAGSenv var rather than in.cargo/config.tomlso:.cargo/config.tomland does NOT setRUSTFLAGS) is unaffected and keeps usingld.RUSTFLAGSenv replacestarget.cfg.rustflagsfrom config (cargo treats them as mutually exclusive), which is fine for CI — the dropped flags (--remap-path-prefix,--build-id=none,--hash-style=gnu,--no-undefined) are reproducibility-only and irrelevant for tests.-C debuginfo=0is kept for faster test compiles.Mold integration gotcha:
gcc -fuse-ld=moldlooks forld.moldon PATH (notmold). Rustc also passes its own-fuse-ld=lld+ a-Bpath to its bundled lld; the last-fuse-ldflag wins, so-fuse-ld=moldtakes effect. Installingld.moldfrom the mold tarball resolves thecollect2: cannot find 'ld'error.Cleanup
ALTER SYSTEM SET max_connections=150 + docker restartstep (was test.yml:74-87)..config/nextest.tomlalready caps thee2e-dbtest group atmax-threads = 16(16 × 4 pool conns = 64 < PG defaultmax_connections = 100), so the override was dead code adding ~15-30s and a failure point.cargo test --test integration_tests -- --nocapturetocargo nextest run --test integration_testsfor consistent tooling.build.rs→cargo build --releaseneeds none ofDATABASE_*/MODEL_DISCOVERY_*/AUTH_*).Composite action (
.github/actions/setup-rust-ci)New composite action centralizes: Rust toolchain, cargo-nextest, sccache, mold, and
swatinem/rust-cache(with per-jobcache-keyfor isolation). Avoids step duplication across 5 jobs.Verification
runs-on, onlye2e-testhaspostgresservice, onlybuild-releaseis main-gated)build.rsanywhere in the workspace (cargo build --releaseneeds no env vars)integration_teststarget lives incrates/inference_providers/tests/and usesMockProviderby default.config/nextest.tomlalready caps e2e concurrency at 16 threads (justifies removing the PG restart step)bin/mold+bin/ld.mold)Follow-up issues (not in this PR)
nextest --partition hash i/N(~3× e2e speedup) — needs a separate runner label or matrix to be effective.gpu11prod-host contention (the root cause of CI-duration variability).cargo test --no-run), refreshed onCargo.lockchanges.Follow-up issues