
Build test_suites concurrently in local_build.sh with a capped pool #510

Merged
copybara-service[bot] merged 1 commit into GoogleCloudPlatform:main from jonathanspw:local_build_parallel
Jun 4, 2026

Conversation

@jonathanspw

Contributor

Supersedes #337

Summary

Parallelizes the test-suite builds in local_build.sh with a self-tuning
concurrency cap. Cold full-suite builds drop from ~5m29s to ~2m30s on a
32-core host — a 54% wall-clock reduction — with safe self-throttling on
smaller hardware.

Problem

local_build.sh builds 41 test suites × 4 GOOS/GOARCH variants each in a
single serial loop. On a 32-core/124-GiB host the average CPU usage during
a build is ~4 cores out of 32 — most of the machine sits idle. Locally
this is just slow; in CI it directly blocks every presubmit job that
rebuilds the test binaries.

Naively parallelizing (launch all 41 suites concurrently) is slower than
it should be: 41 simultaneous go test -c invocations race to compile the
same shared dependencies (the imagetest framework, compute-daisy, the
GCE compute APIs, stdlib). Go's build cache is filesystem-locked so the
work is safe, but only one winner gets cached per package; the other
~40 simultaneous compiles of the same dep are pure waste. Measured: user
CPU climbs to ~6000s with only a modest wall-clock improvement (4m36s),
and on memory-constrained hosts the process tree can OOM.

Changes

  • New -j N flag on local_build.sh. -j 0 (default) means auto:
    max(1, (nproc-1)/3). The (nproc-1)/3 formula matches the
    observation that each go test -c spawns ~3 hot worker processes
    (compile, vet, link), so a cap of K*3 ≤ cores-1 keeps the worker
    count close to the core count without leaving the system starved.
  • A FIFO-based semaphore throttles the per-suite background subshells to
    $jobs concurrent suites. Each suite still builds its 4 arch variants
    serially inside its slot, so a hung suite can't starve more than one
    slot.
  • The per-suite build commands are hoisted into a build_suite()
    function — same commands, same outputs, just callable.
  • Failure tracking: per-suite exit codes are collected; on any failure
    the script reports which suite(s) failed and exits non-zero.
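
The throttling and failure tracking described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from local_build.sh: `run_capped`, the `failed` array, the token file layout, and the stub `build_suite` (which only simulates work and fails for a made-up suite name) are all assumptions.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: function and variable names are assumptions,
# not copied from local_build.sh.

# Hypothetical stand-in for the real build_suite(), which would run
# `go test -c` once per GOOS/GOARCH variant. Here it just simulates work
# and fails for one made-up suite name.
build_suite() {
  local suite=$1
  sleep 0.1
  [ "$suite" != "broken_suite" ]
}

# run_capped JOBS SUITE...
# Builds every suite in the background with at most JOBS running at once.
# Leaves the names of failed suites in the global array `failed`.
run_capped() {
  local jobs=$1; shift
  local tmp i suite
  tmp=$(mktemp -d)
  mkfifo "$tmp/sem"
  exec 9<>"$tmp/sem"                 # read+write open, so the open does not block

  # Preload one token per slot.
  for ((i = 0; i < jobs; i++)); do printf '.' >&9; done

  for suite in "$@"; do
    read -r -n 1 -u 9                # take a token; blocks while all slots are busy
    (
      rc=0
      build_suite "$suite" || rc=$?
      echo "$rc" > "$tmp/$suite.rc"  # record the per-suite exit code
      printf '.' >&9                 # hand the token back, even on failure
    ) &
  done
  wait                               # let the last wave finish

  failed=()
  for suite in "$@"; do
    [ "$(cat "$tmp/$suite.rc")" = "0" ] || failed+=("$suite")
  done

  exec 9>&-
  rm -rf "$tmp"

  if ((${#failed[@]} > 0)); then
    echo "FAILED suites: ${failed[*]}" >&2
    return 1
  fi
}
```

Calling `run_capped 2 suite_a broken_suite suite_c` would run at most two of the stub builds at a time, report `broken_suite` on stderr, and return non-zero. Returning the token after a failed build matters: otherwise each failure would permanently shrink the pool.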

Benchmark

Cold cache, 41 suites × 4 archs = 164 build outputs. Single trial each on
a 32-core / 124 GiB host. All runs produce identical 210 artifacts.

| Variant | Wall clock | User CPU | Effective cores | Notes |
|---|---|---|---|---|
| Serial (today) | 5m29s | 1180s | ~4.3 | baseline |
| Parallel unlimited (41 jobs) | 4m36s | 5947s | ~25 | heavy dup compile, OOM risk |
| Parallel K=10 (default on 32 cores) | 2m30s | 2601s | ~20 | shipped default |
| Parallel K=16 (cores/2) | 2m54s | 3258s | ~22 | enough oversubscription to lose |

Per-suite wall-clock times in the K=10 run range from 8 to 52 seconds:
the first wave of suites starts cold and pays for the shared compile,
while the rest finish in seconds against a warm cache. User CPU drops
2.3× compared to the unlimited variant, so most of the previously wasted
duplicate work is gone.
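
The headline figures can be re-derived from the benchmark numbers with a few lines of shell. The helper names below (`to_seconds`, `reduction_pct`, `cpu_ratio`) are made up for this check and are not part of local_build.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for re-deriving the reported percentages.

# Convert a duration like "5m29s" to whole seconds.
to_seconds() {
  local t=$1
  local m=${t%%m*}          # text before the first "m"
  local s=${t#*m}           # text after the first "m"
  s=${s%s}                  # drop the trailing "s"
  echo $(( 10#$m * 60 + 10#$s ))   # 10# avoids octal parsing of "08"
}

# Wall-clock reduction in percent, truncated to an integer.
reduction_pct() {
  local before after
  before=$(to_seconds "$1")
  after=$(to_seconds "$2")
  echo $(( (before - after) * 100 / before ))
}

# Ratio of two CPU-second totals, rounded to one decimal place.
cpu_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

reduction_pct 5m29s 2m30s   # serial vs. K=10: prints 54
cpu_ratio 5947 2601         # unlimited vs. K=10 user CPU: prints 2.3
```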

Behavior on smaller hardware

The formula degrades gracefully:

| Cores | Default K |
|---|---|
| 4 | 1 (serial) |
| 8 | 2 |
| 16 | 5 |
| 32 | 10 |
| 64 | 21 |

-j N is honored verbatim for environments that want explicit control
(CI runners with a known core/memory budget). -j 0 resets to auto.
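
As a rough sketch of how the cap could be resolved, the function below applies the max(1, (nproc-1)/3) rule when the requested value is 0 and passes any positive value through unchanged. The name `resolve_jobs` and its optional second argument (a core count, used here so the formula can be checked without the matching hardware) are assumptions for illustration; the actual script may structure this differently. `nproc` comes from GNU coreutils and may be missing on non-Linux hosts.

```bash
#!/usr/bin/env bash
# Illustrative only: resolve_jobs and its arguments are hypothetical names.

# resolve_jobs REQUESTED [CORES]
#   REQUESTED  value passed to -j (0 means auto)
#   CORES      core count; defaults to the output of nproc
resolve_jobs() {
  local requested=${1:-0}
  local cores=${2:-$(nproc)}

  # An explicit positive -j value is honored verbatim.
  if (( requested > 0 )); then
    echo "$requested"
    return
  fi

  # Auto: integer division of (cores - 1) by 3, with a floor of 1.
  local auto=$(( (cores - 1) / 3 ))
  if (( auto < 1 )); then
    auto=1
  fi
  echo "$auto"
}
```

For example, `resolve_jobs 0 32` prints 10 and `resolve_jobs 0 4` prints 1, matching the table above, while `resolve_jobs 6 32` prints 6.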

Default cap is (nproc-1)/3 (floor 1), matching the ~3 hot worker
processes each `go test -c` spawns (compile, vet, link). Pass -j N to
override; -j 0 restores the auto formula.

A FIFO semaphore throttles to $jobs concurrent suites; each suite still
builds its 4 GOOS/GOARCH variants serially inside the slot. On a
32-core host this cuts a cold all-suites build from ~5m29s to ~2m30s
(-54%) and self-throttles cleanly on smaller hardware (4 cores → K=1,
effectively serial).
@google-oss-prow


Hi @jonathanspw. Thanks for your PR.

I'm waiting for a GoogleCloudPlatform member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@shenpai35

Contributor

/ok-to-test

copybara-service Bot pushed a commit that referenced this pull request Jun 4, 2026
--
4eb3f49 by Jonathan Wright <jonathan@almalinux.org>:

Build test_suites concurrently in local_build.sh with a capped pool

FUTURE_COPYBARA_INTEGRATE_REVIEW=#510 from jonathanspw:local_build_parallel 4eb3f49
PiperOrigin-RevId: 926948257
@copybara-service copybara-service Bot merged commit d123233 into GoogleCloudPlatform:main Jun 4, 2026
10 checks passed