CI: pool the per-config runner matrices into parallel make-check jobs #10667

Open
julek-wolfssl wants to merge 6 commits into wolfSSL:master from julek-wolfssl:ci-parallel-make-check

julek-wolfssl (Member) commented Jun 11, 2026

Replace the one-runner-per-configuration matrices across the
make-check workflow family with a generic pooled runner,
.github/scripts/parallel-make-check.py. Each workflow keeps its
configuration list as JSON next to the invocation; one runner (or a
small fixed set of shards, balanced by measured per-config minutes)
builds every config in its own out-of-tree (VPATH) build directory off
a single checkout/autogen, on a pool of one-per-CPU worker threads,
longest first. Concurrent checks are isolated with bubblewrap network
namespaces, compilations are cached with ccache, the first failure
aborts the rest (fail-fast, with --no-fail-fast to run everything),
and per-config timings plus pool efficiency land in the step summary.
Failure logs upload as artifacts. smoke-test.yml is likewise reworked
into a single pooled job that runs its nine configs on one runner.
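The scheduling described above (a few shards balanced by measured minutes, then a one-per-CPU pool that starts the longest configs first and stops handing out work after the first failure) can be sketched roughly as follows. This is a minimal illustration, not the script's code. The PR does not say how shards are balanced, so greedy longest-processing-time assignment is an assumed stand-in, and `shard_configs`, `run_pool` and `build_one` are hypothetical names. Unlike the real runner, this sketch only skips configs that have not started yet; it does not kill builds already in flight.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor


def shard_configs(names, minutes, n_shards):
    """Greedy longest-processing-time split (assumed strategy): each config,
    longest first, goes to the shard whose measured total is smallest."""
    heap = [(0.0, i, []) for i in range(n_shards)]  # (total, shard index, members)
    heapq.heapify(heap)
    for name in sorted(names, key=lambda n: minutes.get(n, 0.0), reverse=True):
        total, idx, members = heapq.heappop(heap)
        members.append(name)
        heapq.heappush(heap, (total + minutes.get(name, 0.0), idx, members))
    return [members for _, _, members in sorted(heap, key=lambda t: t[1])]


def run_pool(names, minutes, build_one, workers, fail_fast=True):
    """Run build_one(name) -> bool on a fixed pool, longest config first.
    With fail_fast, configs not yet started after the first failure are
    reported as aborted instead of being built."""
    stop_event = threading.Event()
    results = {}
    lock = threading.Lock()

    def task(name):
        if stop_event.is_set():
            status = "aborted"
        else:
            status = "PASS" if build_one(name) else "FAIL"
            if status == "FAIL" and fail_fast:
                stop_event.set()
        with lock:
            results[name] = status

    ordered = sorted(names, key=lambda n: minutes.get(n, 0.0), reverse=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, ordered))  # list() re-raises any worker exception
    return results
```

For example, `shard_configs(["a", "b", "c", "d"], {"a": 10, "b": 7, "c": 5, "d": 4}, 2)` returns `[["a", "d"], ["b", "c"]]`, i.e. shards of 14 and 12 minutes. Starting long configs first keeps a late-starting long config from becoming the tail that sets the job's wall time.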

Converted workflows (runner jobs per full pass):
  os-check.yml             101 -> 8  (92 Ubuntu configs -> 4 shards;
                           the macOS matrix, the user-settings jobs and
                           the standalone
                           macos-apple-native-cert-validation.yml fold
                           into one macOS runner; Windows unchanged)
  pq-all.yml                21 -> 2 shards
  disable-pk-algs.yml       15 -> 1
  wolfCrypt-Wconversion.yml 11 -> 1
  trackmemory.yml            7 -> 1
  cryptocb-only.yml          8 -> 1  (incl. the two new SHA512 entries)
  multi-compiler.yml         6 -> 1
  smallStackSize.yml         6 -> 1
  multi-arch.yml             6 -> 1
  async.yml                  5 -> 1
  psk.yml                    5 -> 1
  no-malloc.yml              3 -> 1
  wolfsm.yml                 3 -> 1
  opensslcoexist.yml         2 -> 1

Measured against current upstream passing runs (job execution time,
queue excluded): ~200 runner jobs / ~374 runner-minutes per full pass
become 23 jobs / ~168 runner-minutes, with more coverage than before.
multi-arch's old matrix combined an "include" list of four
architectures with an "opts" axis; GitHub's include-merge rules made
each arch entry overwrite the previous one, so only the armel
combinations actually ran. The pooled list restores the intended
aarch64/armhf/riscv64 coverage (23 combinations; riscv64 x sp-math is
omitted as invalid - configure rejects sp-math without SP, and
--enable-riscv-asm, unlike --enable-sp-asm, does not bring SP in).
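The multi-arch collapse can be reproduced with a small model of GitHub's documented include rule: an include entry is merged into every original combination whose original values it would not overwrite, values added by an earlier include entry may be overwritten, and an entry that fits nowhere becomes a new combination. The model is simplified: each include entry is assumed to carry only an `arch` key, `armel` is assumed to be listed last, and the opts axis is assumed to have six values (consistent with the old 6-job count and with 23 = 4 × 6 − 1). Apart from sp-math, the opts names are placeholders.

```python
import itertools


def expand_matrix(axes, include):
    """Simplified model of GitHub Actions' matrix `include` merge rule."""
    keys = list(axes)
    base = [dict(zip(keys, vals)) for vals in itertools.product(*(axes[k] for k in keys))]
    combos = [dict(c) for c in base]
    extra = []
    for entry in include:
        merged = False
        for orig, combo in zip(base, combos):
            # Merge unless the entry would change an ORIGINAL axis value;
            # values added by earlier include entries may be overwritten.
            if all(orig.get(k, v) == v for k, v in entry.items()):
                combo.update(entry)
                merged = True
        if not merged:
            extra.append(dict(entry))  # fits no combination: becomes a new job
    return combos + extra


OPTS = ["opt1", "opt2", "opt3", "opt4", "opt5", "sp-math"]  # placeholders except sp-math
ARCHES = ["aarch64", "armhf", "riscv64", "armel"]            # armel assumed last

old_jobs = expand_matrix({"opts": OPTS}, [{"arch": a} for a in ARCHES])
print(len(old_jobs), {job["arch"] for job in old_jobs})  # 6 {'armel'}

# Pooled list: the full arch x opts product minus the invalid pairing.
pooled = [(a, o) for a, o in itertools.product(ARCHES, OPTS)
          if not (a == "riscv64" and o == "sp-math")]
print(len(pooled))  # 23
```

Because every include entry sets the same added key, each one overwrites the previous value in all six combinations, so only the last architecture survives.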

Out-of-tree build fixes this depends on:

  • Makefile.am: symlink the read-only test data (certs/, tests/ config
    files, sniffer captures and helpers, examples/crypto_policies,
    input, quit) into the build tree via a BUILT_SOURCES stamp, removed
    again in distclean-local. ChangeToWolfRoot() and the script tests
    resolve everything relative to the working directory, so out-of-tree
    make check and make distcheck now pass.
  • scripts/multi-msg-record.py: locate the client binary from the build
    tree working directory rather than the script's source directory.
  • configure.ac + wolfssl/include.am: run
    support/gen-debug-trace-error-codes.sh from $srcdir; it reads the
    error-code headers from the source tree and generates into the build
    tree.
  • tests/swdev: a WOLFBUILD variable points the sub-make at the build
    tree for the configure-generated headers (wolfssl/options.h,
    wolfssl/version.h); the in-tree-only guards are dropped.
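A rough Python equivalent of the BUILT_SOURCES stamp idea: link each read-only item from the source tree into the build tree, and create the stamp only if every step succeeded (the property the later `set -e` fix restores). The real rule is shell inside Makefile.am; the stamp name, the item list and both function names here are hypothetical. The in-tree guard is an assumption of the sketch: without it, removing a "target" that is the source itself would delete the test data.

```python
import shutil
from pathlib import Path

# Hypothetical subset of the linked items; the PR's list also covers
# tests/ config files, sniffer captures and helpers, and
# examples/crypto_policies.
TEST_DATA = ("certs", "input", "quit")
STAMP = ".test-data-stamp"  # hypothetical stamp file name


def _remove(path):
    """rm -rf equivalent: symlinks and files are unlinked, real
    directories (e.g. a privatized certs/ copy) are removed recursively."""
    if path.is_symlink() or path.is_file():
        path.unlink()
    elif path.is_dir():
        shutil.rmtree(path)


def link_test_data(srcdir, builddir, items=TEST_DATA, stamp=STAMP):
    src, build = Path(srcdir).resolve(), Path(builddir).resolve()
    if src == build:
        return  # in-tree build: data already present; never rm the sources
    for rel in items:
        source, target = src / rel, build / rel
        if not source.exists():
            # Like a failing command under `set -e`: stop before the stamp.
            raise FileNotFoundError(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        _remove(target)
        target.symlink_to(source, target_is_directory=source.is_dir())
    (build / stamp).touch()  # reached only when every link succeeded


def unlink_test_data(builddir, items=TEST_DATA, stamp=STAMP):
    """distclean-local counterpart: drop the links and the stamp."""
    build = Path(builddir).resolve()
    for rel in (*items, stamp):
        _remove(build / rel)
```

Because the tests resolve paths relative to the working directory, running them from the build directory then finds `certs/`, `input` and `quit` exactly where an in-tree run would.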

Portions of PR #10649 are incorporated: the cross-platform
ccache-setup composite action, repository_owner gates on check-headers
and check-source-text, the docs-only paths-ignore on os-check, and the
libspdm timeout bumps.


Copilot AI (Contributor) left a comment

Pull request overview

This PR restructures the GitHub Actions “make check” workflow family to run many per-configuration builds on a smaller pooled set of runners, driven by a new generic runner script. To enable this, it also fixes multiple out-of-tree (VPATH) build/test path assumptions across the autotools build, scripts, and swdev test integration, and expands ccache usage to reduce rebuild time.

Changes:

  • Add .github/scripts/parallel-make-check.py to build/test many configs concurrently (or sharded) from JSON config lists, with fail-fast, per-config timing, summaries, and log artifacts.
  • Update autotools + ancillary scripts to support out-of-tree make check (symlink test data into build tree; ensure generated headers and binaries are found correctly).
  • Convert numerous CI workflows (including smoke-test and os-check) from large matrices to pooled runs (sharded for os-check and pq-all), plus ccache-setup improvements and timeout increases.

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 7 comments.

Summary per file:
  • wolfssl/include.am: Run debug-trace error-code header generator from $(top_srcdir) for VPATH builds.
  • wolfcrypt/test/include.am: Allow swdev to build out-of-tree by passing WOLFBUILD instead of failing VPATH builds.
  • tests/swdev/README.md: Update swdev docs to reflect VPATH support via WOLFBUILD.
  • tests/swdev/Makefile: Add WOLFBUILD include path and prerequisite targeting build-tree generated headers.
  • support/gen-debug-trace-error-codes.sh: Read input headers from $srcdir and generate outputs into the build tree.
  • scripts/multi-msg-record.py: Locate the examples/client/client binary from the build-tree working directory for VPATH checks.
  • Makefile.am: Add VPATH make check support via a BUILT_SOURCES stamp that symlinks required test data into the build tree and cleans it on distclean.
  • configure.ac: Remove in-tree-only swdev guard; run debug-trace header generation from $srcdir.
  • .gitignore: Ignore the new stamp file and pooled build dirs (/build-*).
  • .github/workflows/wolfsm.yml: Convert to pooled parallel make-check runner with ccache + artifact logs.
  • .github/workflows/wolfCrypt-Wconversion.yml: Convert warning-check builds to pooled runner with JSON configs and ccache + artifact logs.
  • .github/workflows/trackmemory.yml: Convert to pooled runner with bwrap, ccache, and JSON config list.
  • .github/workflows/smoke-test.yml: Convert smoke matrix to a single pooled job using the new script, with ccache and bwrap notes.
  • .github/workflows/smallStackSize.yml: Convert to pooled runner with explicit run of testwolfcrypt per config.
  • .github/workflows/psk.yml: Convert to pooled runner with per-config JSON list and ccache + artifact logs.
  • .github/workflows/pq-all.yml: Convert to 2-shard pooled runner; shard-balanced by measured minutes; ccache + artifact logs per shard.
  • .github/workflows/os-check.yml: Convert Ubuntu configs to 4 pooled shards + single pooled macOS job; restore intended multi-arch coverage; add docs-only paths-ignore.
  • .github/workflows/opensslcoexist.yml: Convert to pooled runner with JSON configs and ccache + artifact logs.
  • .github/workflows/no-malloc.yml: Convert to pooled runner and preserve “build + testwolfcrypt only” behavior via check: false + run.
  • .github/workflows/multi-compiler.yml: Convert to pooled runner building multiple compilers in one job using per-config compiler selection.
  • .github/workflows/multi-arch.yml: Convert to pooled runner restoring intended multi-arch × opts coverage; run testwolfcrypt under qemu-user per config.
  • .github/workflows/macos-apple-native-cert-validation.yml: Remove standalone workflow; folded into os-check macOS pooled config list.
  • .github/workflows/libspdm.yml: Increase timeouts to reduce flaky cancellations on loaded runners.
  • .github/workflows/disable-pk-algs.yml: Convert to pooled runner with JSON configs and ccache + artifact logs.
  • .github/workflows/cryptocb-only.yml: Convert to pooled runner; inline former BASE_CONFIG into each JSON entry; add artifact logs.
  • .github/workflows/check-source-text.yml: Gate execution to wolfssl org to avoid fork CI spend.
  • .github/workflows/check-headers.yml: Gate execution to wolfssl org to avoid fork CI spend.
  • .github/workflows/async.yml: Convert to pooled runner with JSON configs and ccache + artifact logs.
  • .github/scripts/parallel-make-check.py: New pooled runner implementation: JSON schema, sharding, parallel workers, fail-fast killing, summaries, and log capture.
  • .github/actions/wait-for-smoke/action.yml: Increase default smoke wait timeout to 3600s.
  • .github/actions/ccache-setup/action.yml: Make ccache install cross-platform and set PATH for compiler wrappers on Linux/macOS.


Address the Copilot review:
- parallel-make-check.py: validate "configure" (list of strings) and
  cflags/ldflags (strings) so a malformed entry fails the load instead
  of exploding a string into per-character configure arguments; print
  a single line for passing configs instead of dumping their full
  make-check.log into the CI log (failure dumps unchanged; the logs
  remain in build-<name>/ for the failure artifacts).
- Makefile.am: use rm -rf for the certs/input/quit setup and distclean
  cleanup. A --private-dir run replaces the certs symlink with a
  private directory copy that rm -f cannot remove (verified: make
  distclean in a build dir with a privatized certs/ now succeeds and
  removes it).
- psk.yml, disable-pk-algs.yml: normalize the single-dash tokens
  (-disable-rsa, -disable-ecc, -disable-aescbc, -enable-cryptonly)
  carried verbatim from the old matrices to the canonical double-dash
  form. No coverage change: configure honors single-dash spellings
  (verified -disable-rsa sets NO_RSA with no unrecognized-option
  warning), so these were always in effect; both touched configs
  re-validated end-to-end.
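The load-time validation described above might look like the sketch below. Without it, an entry written as `"configure": "--enable-psk"` is iterated character by character (`list("--enable-psk")` starts `['-', '-', 'e', ...]`), yielding one bogus configure argument per character. Only the `configure`, `cflags` and `ldflags` fields and their expected types come from the text; the required `name` field and the function names are assumptions.

```python
import json


def validate_config(entry):
    """Reject a malformed config entry at load time (hypothetical schema)."""
    if not isinstance(entry, dict):
        raise ValueError(f"config entry must be an object, got {type(entry).__name__}")
    name = entry.get("name")  # assumed required field
    if not isinstance(name, str) or not name:
        raise ValueError(f"config entry needs a non-empty 'name': {entry!r}")
    configure = entry.get("configure", [])
    if not isinstance(configure, list) or not all(isinstance(a, str) for a in configure):
        raise ValueError(f"{name}: 'configure' must be a list of strings")
    for key in ("cflags", "ldflags"):
        if key in entry and not isinstance(entry[key], str):
            raise ValueError(f"{name}: '{key}' must be a string")
    return entry


def load_configs(text):
    """Parse a JSON config list and validate every entry before any build."""
    configs = json.loads(text)
    if not isinstance(configs, list):
        raise ValueError("top level must be a JSON list of config objects")
    return [validate_config(c) for c in configs]
```

Failing the whole load, rather than a single config at build time, surfaces a typo in the workflow's JSON before any runner time is spent.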

The --cc default stays "ccache gcc": ccache resolves the compiler
through its own masquerade symlinks (verified: no recursion and normal
cache hits with /usr/lib/ccache prepended to PATH), and the explicit
CC= also covers jobs that use ccache without the PATH masquerade.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 3 comments.

Two fixes from the second Copilot review round:

A process spawned between abort_others()' live_procs snapshot and its
registration escaped the kill sweep, leaving that build/check running
to completion after fail-fast had begun. Re-check stop_event right
after registering the process and SIGTERM its process group if the
abort already started: either the registration happened before the
sweep's snapshot (the sweep kills it) or it happened after stop_event
was set (the re-check sees it), so the window is closed.
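The argument depends on ordering: the abort path sets `stop_event` before it snapshots the registry, and the spawn path registers the process before it re-checks `stop_event`. A minimal sketch under those assumptions follows. `stop_event`, `live_procs` and `abort_others()` are named in the text; `spawn`, `_signal_group` and the lock are hypothetical, each build is assumed to run in its own session so its process group can be signalled as a unit, and POSIX is assumed. SIGKILL escalation is left out.

```python
import os
import signal
import subprocess
import threading

stop_event = threading.Event()
live_procs = set()
procs_lock = threading.Lock()


def _signal_group(proc, sig):
    try:
        os.killpg(proc.pid, sig)  # start_new_session=True makes pgid == pid
    except ProcessLookupError:
        pass  # the whole group has already exited


def spawn(cmd, **kwargs):
    proc = subprocess.Popen(cmd, start_new_session=True, **kwargs)
    with procs_lock:
        live_procs.add(proc)
    # Re-check AFTER registering: if the abort's snapshot was taken before
    # we registered, stop_event was already set, so this check catches it.
    if stop_event.is_set():
        _signal_group(proc, signal.SIGTERM)
    return proc


def abort_others():
    stop_event.set()  # must precede the snapshot for the argument to hold
    with procs_lock:
        snapshot = list(live_procs)
    for proc in snapshot:
        _signal_group(proc, signal.SIGTERM)
```

Either the registration lands before the snapshot (the sweep sees the process) or after it, in which case the event was set first and the re-check sees it; there is no interleaving in which both miss.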

Exceptions from callable steps (user_settings staging, private-dir
copies) used to escape the worker thread and crash the whole script
with no summary. They are now recorded as that config's step failure
with the exception written to its make-check.log, e.g. a bad
"user_settings" path reports FAIL (stage <path>) while the other
configs keep running; the fail-fast bookkeeping is shared with the
nonzero-exit path via record_failure().
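Turning an exception into an ordinary step failure can be sketched like this: the worker catches it, appends the traceback to the config's log, and goes through the same bookkeeping as a nonzero exit. `record_failure()` is named in the text, but its signature here, the `failures` dict and `run_callable_step` are assumptions.

```python
import threading
import traceback

stop_event = threading.Event()
failures = {}
_lock = threading.Lock()


def record_failure(name, step, fail_fast=True):
    """Bookkeeping shared by nonzero exits and raised exceptions."""
    with _lock:
        failures.setdefault(name, step)  # keep the first failing step
    if fail_fast:
        stop_event.set()


def run_callable_step(name, step, func, log_path, fail_fast=True):
    """Run an in-process step (e.g. staging a user_settings file) without
    letting an exception escape the worker thread."""
    try:
        func()
    except Exception:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"FAIL ({step})\n")
            traceback.print_exc(file=log)
        record_failure(name, step, fail_fast)
        return False
    return True
```

The other configs keep running (or are aborted by fail-fast), and the summary still gets written because nothing propagates out of the worker.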

julek-wolfssl requested a review from Copilot June 11, 2026 20:36

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 3 comments.

Third Copilot review round:

- Makefile.am: run the test-data stamp recipe body under set -e. A
  failed symlink mid-loop previously did not fail the compound command
  (only the last command's status counted), so a partially-populated
  build tree could be stamped complete. Now any failed setup command
  aborts the recipe and the stamp is not created.

- parallel-make-check.py: fail-fast sent SIGTERM only, so a test that
  traps or ignores SIGTERM could keep the job alive until the workflow
  timeout. abort_others() now polls the swept processes and SIGKILLs
  whatever is still alive after a 10 s grace period, and the
  post-registration race-window kill escalates the same way (bounded
  wait, then SIGKILL). Verified with a config running
  "trap '' TERM; sleep 300": the run completes in ~10 s with the
  stubborn config reported as aborted and no surviving processes.
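The SIGTERM-then-SIGKILL escalation can be sketched as follows, assuming each process leads its own process group (for example via `start_new_session=True`). `terminate_groups` and its parameters are hypothetical; the 10 s default matches the grace period quoted above. This version decides whether to escalate by polling each group leader, so a TERM-ignoring grandchild whose leader has already exited would need an additional unconditional group sweep.

```python
import os
import signal
import time


def terminate_groups(procs, grace=10.0, poll_interval=0.1):
    """SIGTERM each process group, wait up to `grace` seconds, then
    SIGKILL every group whose leader is still running."""
    def signal_group(proc, sig):
        try:
            os.killpg(proc.pid, sig)
        except ProcessLookupError:
            pass  # group already gone

    for proc in procs:
        signal_group(proc, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll_interval)
    for proc in procs:
        if proc.poll() is None:  # ignored or trapped SIGTERM
            signal_group(proc, signal.SIGKILL)
            proc.wait()
```

The bounded wait keeps a stubborn test from holding the job until the workflow timeout, while still giving well-behaved processes a chance to exit cleanly.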

The two jobs that manage their ccache cache manually rely on ccache's
XDG default (~/.cache/ccache) matching the actions/cache path. That
holds today, but nothing enforces it: a later change that sets
CCACHE_DIR (e.g. adopting the ccache-setup composite, which uses
~/.ccache) would silently decouple the build's cache from the
saved/restored directory. Pin CCACHE_DIR explicitly to the cached
path so the pairing is visible and cannot drift.

wolfSSL's configure enables make's jobserver by default
(AX_AM_JOBSERVER([yes]) -> AM_MAKEFLAGS += -j<nproc+1> in aminclude.am),
and automake passes that explicit -j to every recursive sub-make, where
it overrides the invoking make's job limit. The script's -j therefore
only ever scheduled the outermost recursion hop: --jobs was inert.

Measured on a 4-CPU host with 10 build-only configs oversaturating the
worker pool, the jobserver default is also the better policy: capping
sub-makes via --disable-jobserver and -j2 dropped CPU utilization from
96% to 89% and lengthened the wall time, because configs' serial
phases (configure, link) stopped being backfilled by other configs'
compile jobs. So make is now invoked with no -j at all - parallelism
within a config comes from the configure-default jobserver - and the
misleading knob is gone, including the macOS job's --jobs 3.

julek-wolfssl marked this pull request as ready for review June 11, 2026 23:43
@github-actions

retest this please

github-actions Bot commented Jun 12, 2026

MemBrowse Memory Report

gcc-arm-cortex-m3

  • FLASH: .text +112 B (+0.1%, 121,393 B / 262,144 B, total: 46% used)

gcc-arm-cortex-m4

  • FLASH: .rodata.wolfSSL_ERR_reason_error_string.str1.1 +34 B, .text +128 B (+0.1%, 199,056 B / 262,144 B, total: 76% used)

gcc-arm-cortex-m4-dtls13

  • FLASH: .text +64 B (+0.0%, 179,736 B / 1,048,576 B, total: 17% used)

gcc-arm-cortex-m4-openssl-compat

  • FLASH: .rodata +64 B, .text +576 B (+0.1%, 767,476 B / 1,048,576 B, total: 73% used)

gcc-arm-cortex-m4-pq

  • FLASH: .rodata +32 B, .text +256 B (+0.1%, 277,880 B / 1,048,576 B, total: 27% used)

gcc-arm-cortex-m4-rsa-only

  • FLASH: .rodata +32 B, .text +448 B (+0.1%, 323,480 B / 1,048,576 B, total: 31% used)

gcc-arm-cortex-m4-tls12

  • FLASH: .text +64 B (+0.1%, 122,125 B / 262,144 B, total: 47% used)

gcc-arm-cortex-m4-tls13

  • FLASH: .rodata.wolfSSL_ERR_reason_error_string.str1.1 +34 B, .text +128 B (+0.1%, 234,690 B / 262,144 B, total: 90% used)

gcc-arm-cortex-m7

  • FLASH: .rodata.wolfSSL_ERR_reason_error_string.str1.1 +34 B, .text +128 B (+0.1%, 199,056 B / 262,144 B, total: 76% used)

gcc-arm-cortex-m7-pq

  • FLASH: .rodata +32 B, .text +256 B (+0.1%, 278,456 B / 1,048,576 B, total: 27% used)

gcc-arm-cortex-m7-tls13

  • FLASH: .rodata.wolfSSL_ERR_reason_error_string.str1.1 +34 B, .text +128 B (+0.1%, 234,690 B / 262,144 B, total: 90% used)

linuxkm-pie

  • Data: __patchable_function_entries +8 B (+0.0%, 24,272 B)

linuxkm-standard

Labels

None yet

Projects

None yet

4 participants