Build test_suites concurrently in local_build.sh with a capped pool #510
Merged
copybara-service[bot] merged 1 commit into GoogleCloudPlatform:main on Jun 4, 2026
Default cap is (nproc-1)/3 (floor 1), matching the ~3 hot worker processes each `go test -c` spawns (compile, vet, link). Pass -j N to override; -j 0 restores the auto formula. A FIFO semaphore throttles to $jobs concurrent suites; each suite still builds its 4 GOOS/GOARCH variants serially inside the slot. On a 32-core host this cuts a cold all-suites build from ~5m29s to ~2m30s (-54%) and self-throttles cleanly on smaller hardware (4 cores → K=1, effectively serial).
Hi @jonathanspw. Thanks for your PR. I'm waiting for a GoogleCloudPlatform member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
drewhli approved these changes on Jun 4, 2026
copybara-service[bot] pushed a commit that referenced this pull request on Jun 4, 2026
-- 4eb3f49 by Jonathan Wright <jonathan@almalinux.org>:

Build test_suites concurrently in local_build.sh with a capped pool

Default cap is (nproc-1)/3 (floor 1), matching the ~3 hot worker processes each `go test -c` spawns (compile, vet, link). Pass -j N to override; -j 0 restores the auto formula. A FIFO semaphore throttles to $jobs concurrent suites; each suite still builds its 4 GOOS/GOARCH variants serially inside the slot. On a 32-core host this cuts a cold all-suites build from ~5m29s to ~2m30s (-54%) and self-throttles cleanly on smaller hardware (4 cores → K=1, effectively serial).

FUTURE_COPYBARA_INTEGRATE_REVIEW=#510 from jonathanspw:local_build_parallel 4eb3f49
PiperOrigin-RevId: 926948257
Commit d123233 merged into GoogleCloudPlatform:main
10 checks passed
Supersedes #337
Summary
Parallelizes the test-suite builds in `local_build.sh` with a self-tuning
concurrency cap. Cold full-suite builds drop from ~5m29s to ~2m30s on a
32-core host, a 54% wall-clock reduction, with safe self-throttling on
smaller hardware.
Problem
`local_build.sh` builds 41 test suites × 4 GOOS/GOARCH variants each in a
single serial loop. On a 32-core/124-GiB host the average CPU usage during
a build is ~4 cores out of 32, so most of the machine sits idle. Locally
this is just slow; in CI it directly blocks every presubmit job that
rebuilds the test binaries.
Naively parallelizing (launching all 41 suites concurrently) is slower than
it should be: 41 simultaneous `go test -c` invocations race to compile the
same shared dependencies (the imagetest framework, `compute-daisy`, the
GCE compute APIs, stdlib). Go's build cache is filesystem-locked so the
work is safe, but only one winner gets cached per package; the other
~40 simultaneous compiles of the same dependency are pure waste. Measured:
user CPU climbs to ~6000s with only a modest wall-clock improvement (4m36s),
and on memory-constrained hosts the process tree can OOM.
Changes
- `-j N` flag on `local_build.sh`. `-j 0` (default) means auto:
  `max(1, (nproc-1)/3)`. The `(nproc-1)/3` formula matches the
  observation that each `go test -c` spawns ~3 hot worker processes
  (compile, vet, link), so a cap of `K*3 ≤ cores-1` keeps the worker
  count close to the core count without leaving the system starved.
- A FIFO semaphore throttles to `$jobs` concurrent suites. Each suite still
  builds its 4 arch variants serially inside its slot, so a hung suite
  can't starve more than one slot.
- The per-suite build logic moves into a `build_suite()` function: same
  commands, same outputs, just callable.
- If any suite fails, the script reports which suite(s) failed and exits
  non-zero.
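The throttling and failure reporting described in the list above could look roughly like the sketch below. It is not the code from `local_build.sh`: `run_pool`, the temporary directory layout, and the `.failed` marker files are hypothetical, and `build_suite` is replaced by a stand-in that sleeps instead of running `go test -c`. It relies on opening a named pipe read/write, which works on Linux and macOS.

```shell
#!/usr/bin/env bash
# Sketch of a FIFO-token semaphore. Everything except the names build_suite
# and jobs is hypothetical, and this build_suite is only a stand-in.

build_suite() {
  local suite=$1
  sleep 0.1                 # placeholder for the four serial GOOS/GOARCH builds
  [[ $suite != broken ]]    # simulate a failing suite named "broken"
}

run_pool() {
  local jobs=$1; shift
  local workdir suite i
  local failed=()
  workdir=$(mktemp -d)
  mkfifo "$workdir/sem"
  exec 9<>"$workdir/sem"    # read/write open, so the open itself never blocks

  # Preload one token (an empty line) per slot.
  for (( i = 0; i < jobs; i++ )); do printf '\n' >&9; done

  for suite in "$@"; do
    read -r -u 9            # take a token; blocks while all slots are busy
    (
      build_suite "$suite" || touch "$workdir/$suite.failed"
      printf '\n' >&9       # hand the token back
    ) &
  done
  wait                      # let every in-flight suite finish
  exec 9>&-

  # Collect failures in the original suite order.
  for suite in "$@"; do
    if [[ -e "$workdir/$suite.failed" ]]; then failed+=("$suite"); fi
  done
  rm -rf "$workdir"

  if (( ${#failed[@]} > 0 )); then
    echo "FAILED: ${failed[*]}" >&2
    return 1
  fi
}

run_pool 2 alpha beta gamma && echo "all suites built"
```

Because each background subshell holds exactly one token for the whole suite, a suite that hangs ties up one slot and no more.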
Benchmark
Cold cache, 41 suites × 4 archs = 164 build outputs. Single trial each on
a 32-core / 124 GiB host. All runs produce the same 210 artifacts.
Per-suite wall times in the K=10 run are 8–52 seconds: the first wave of
suites starts cold and pays for the shared compile, and the rest finish in
seconds against a warm cache. User CPU drops 2.3× compared to the
unlimited variant, because most of the previously wasted duplicate work is
gone.
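As a quick check of the headline numbers, the wall-clock figures quoted in this description can be converted to seconds and compared with integer arithmetic. The helper names `to_seconds` and `reduction_pct` are made up for this sketch; the timings are the ones reported above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the quoted wall-clock reductions.

# Convert a duration such as "5m29s" to whole seconds.
to_seconds() {
  local m=${1%%m*}
  local s=${1#*m}
  s=${s%s}
  echo $(( 10#$m * 60 + 10#$s ))
}

# Percentage reduction from base to new, rounded down.
reduction_pct() {
  local base new
  base=$(to_seconds "$1")
  new=$(to_seconds "$2")
  echo $(( (base - new) * 100 / base ))
}

echo "serial -> K=10:      $(reduction_pct 5m29s 2m30s)%"   # 54%
echo "serial -> unlimited: $(reduction_pct 5m29s 4m36s)%"   # 16%
```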
Behavior on smaller hardware
The formula degrades gracefully: a 4-core host gets K=1 (effectively
serial), while a 32-core host gets K=10.
`-j N` is honored verbatim for environments that want explicit control
(CI runners with a known core/memory budget).
`-j 0` resets to auto.
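To see how the cap scales, the auto formula `max(1, (nproc-1)/3)` can be evaluated for a few core counts. This is a minimal sketch: `auto_jobs` and `resolve_jobs` are illustrative names rather than functions from `local_build.sh`, and the core count is passed in as an argument instead of being read from `nproc` so the values are reproducible.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the real script may compute the cap differently.

# max(1, (cores-1)/3) using integer (floor) division.
auto_jobs() {
  local cores=$1
  local k=$(( (cores - 1) / 3 ))
  if (( k < 1 )); then k=1; fi
  echo "$k"
}

# -j 0 means auto; any other value is used verbatim.
resolve_jobs() {
  local requested=$1 cores=$2
  if (( requested == 0 )); then
    auto_jobs "$cores"
  else
    echo "$requested"
  fi
}

for cores in 1 2 4 8 16 32; do
  printf '%2d cores -> K=%d\n' "$cores" "$(auto_jobs "$cores")"
done
# Prints K=1, 1, 1, 2, 5, 10 for the six core counts above.
```

On a real host the second argument would be `$(nproc)`, for example `resolve_jobs 0 "$(nproc)"`.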