ci(workflows): wire ccache and caches across SITL, ROS, macOS, Ubuntu #27036

Merged
mrpollo merged 8 commits into main from
mrpollo/ci-caches-everywhere
Apr 10, 2026

@mrpollo mrpollo commented Apr 9, 2026

Ports cache wiring from the CI orchestrator branch (mrpollo/ci_orchestration, #26257) into current mainline workflows. Each change matches the corresponding job in ci-orchestrator.yml one to one. This is the big one in the ongoing rollout, touching eight workflows.

clang-tidy.yml gets a single-line bump: ccache max-size goes from 120M to 150M to match CACHE_CLANG_TIDY.

itcm_check.yml gets wired to the setup-ccache / save-ccache composite actions with per-target cache prefixes (ccache-itcm-${{ matrix.target }}, 200M).

failsafe_sim.yml gets a dedicated cache for the emsdk clone at key emsdk-4.0.15 with the install gated on cache miss. Saves about 30s on every run.
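
The cache-miss gate described above could look roughly like the following. This is a minimal sketch, not the workflow's actual contents: the step names, the `emsdk-cache` id, the `_emscripten_sdk` directory, and the clone/install commands are assumptions for illustration; only the `emsdk-4.0.15` key comes from the description.

```yaml
# Hypothetical excerpt of a failsafe_sim.yml job.
steps:
  - name: Restore emsdk cache
    id: emsdk-cache                 # assumed step id
    uses: actions/cache@v4
    with:
      path: _emscripten_sdk         # assumed clone location
      key: emsdk-4.0.15

  # Runs only when the cache step above did not find an exact key match.
  - name: Install emsdk
    if: steps.emsdk-cache.outputs.cache-hit != 'true'
    run: |
      git clone https://github.com/emscripten-core/emsdk.git _emscripten_sdk
      cd _emscripten_sdk
      ./emsdk install 4.0.15
      ./emsdk activate 4.0.15
```

Because the key is a fixed version string, the cache only changes when the key itself is bumped; `actions/cache@v4` saves the path automatically at the end of a job that missed.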

compile_ubuntu.yml splits the install + build step so setup-ccache can run between ./Tools/setup/ubuntu.sh (which is what installs ccache in the plain ubuntu:22.04 / ubuntu:24.04 base images) and make quick_check. Per-matrix cache prefixes keyed on the container version (200M).

compile_macos.yml gets three caches: a Homebrew downloads cache keyed on the macos.sh hash, a pip downloads cache keyed on the requirements.txt hash, and the setup-ccache / save-ccache composites replacing the hand-rolled ccache block. ccache size goes from 40M to 200M.
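
As a rough illustration of the two download caches, the sketch below keys each cache on the hash of the file that drives it. The cache paths are the default macOS locations for Homebrew and pip; the key prefixes and the `Tools/setup/` file paths are assumptions rather than values taken from the workflow.

```yaml
# Hypothetical excerpt of a compile_macos.yml job. hashFiles() reads from the
# workspace, so these steps must come after actions/checkout.
steps:
  - name: Cache Homebrew downloads
    uses: actions/cache@v4
    with:
      path: ~/Library/Caches/Homebrew
      # Assumed script path; the key changes whenever macos.sh changes.
      key: brew-downloads-${{ hashFiles('Tools/setup/macos.sh') }}

  - name: Cache pip downloads
    uses: actions/cache@v4
    with:
      path: ~/Library/Caches/pip
      # Assumed requirements path; the key changes with the dependency list.
      key: pip-downloads-${{ hashFiles('Tools/setup/requirements.txt') }}
```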

sitl_tests.yml replaces the hand-rolled ccache block with setup-ccache / save-ccache (prefix ccache-sitl-gazebo-classic, max-size raised from 120M to 350M) and swaps the explicit make px4_sitl_default + make sitl_gazebo-classic invocations for the build-gazebo-sitl composite action. PX4_CMAKE_BUILD_TYPE and PX4_SBOM_DISABLE are hoisted to job-level env so they propagate into the composite's child steps. One deviation from the orchestrator: save-ccache is added here even though the orchestrator's sitl-tests job does not save. The orchestrator relies on an upstream build-sitl-gazebo-classic seeder job, but mainline has no such parent, so without save-ccache the cache would never populate.
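
A sketch of the job shape this paragraph describes, under several assumptions: the job id, the runner, the `./.github/actions/...` paths of the composite actions, the env values, and the `if: always()` on the save step are all illustrative. Only the input names `cache-key-prefix` and `max-size` and their values come from the PR text.

```yaml
jobs:
  sitl-tests:                          # hypothetical job id
    runs-on: ubuntu-latest             # placeholder runner
    env:
      # Declared at job level so every step, including the steps inside
      # composite actions, sees them. Values are assumed.
      PX4_CMAKE_BUILD_TYPE: RelWithDebInfo
      PX4_SBOM_DISABLE: "1"
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-ccache      # assumed path
        with:
          cache-key-prefix: ccache-sitl-gazebo-classic
          max-size: 350M
      - uses: ./.github/actions/build-gazebo-sitl # assumed path
      - uses: ./.github/actions/save-ccache       # assumed path, inputs unknown
        if: always()
```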

ros_integration_tests.yml gets the biggest structural change. The hand-rolled ccache is replaced with the composites (prefix ccache-ros-integration, max-size raised from 300M to 400M). A dedicated cache is added for the Micro-XRCE-DDS Agent build at key xrce-agent-v2.2.1-fastdds-2.8.2-galactic-2021-09-08 with the build gated on cache miss. Another dedicated cache covers the px4-ros2-interface-lib colcon workspace, keyed on the hash of the msg/*.msg, msg/versioned/*.msg, and srv/*.srv files so it rebuilds only when the uORB interface changes. The explicit PX4 and Gazebo make calls are swapped for the build-gazebo-sitl composite. PX4_SBOM_DISABLE is hoisted to job-level env.
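
The interface-keyed workspace cache could be expressed along these lines. The three glob patterns are the ones named above; the step id, key prefix, workspace path, and build command are assumptions made for the sketch.

```yaml
# Hypothetical excerpt of a ros_integration_tests.yml job (after checkout).
steps:
  - name: Restore px4-ros2-interface-lib workspace
    id: interface-lib-cache          # assumed step id
    uses: actions/cache@v4
    with:
      path: ros2_ws                  # assumed workspace location
      # hashFiles accepts several patterns; any change to a matched file
      # produces a new key and therefore a rebuild.
      key: px4-ros2-interface-lib-${{ hashFiles('msg/*.msg', 'msg/versioned/*.msg', 'srv/*.srv') }}

  - name: Build px4-ros2-interface-lib
    if: steps.interface-lib-cache.outputs.cache-hit != 'true'
    working-directory: ros2_ws
    run: colcon build                # assumed build invocation
```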

flash_analysis.yml gets a two-phase raw ccache wiring: the "current build" (PR head) is wrapped with actions/cache/restore@v4 + actions/cache/save@v4 keyed on ref_name + sha, and the "baseline build" (base branch or previous commit) gets a second restore/save pair keyed on the baseline sha. A ccache -C between the two builds ensures the baseline starts cold. This one cannot use the composite actions because the job needs two independent cache lifecycles in a single run. While touching this file, the indentation inside the PR_COMMENT_BODY_EOF heredoc is also fixed: the <details> children were indented two spaces, which GitHub markdown parsed as an indented code block, so the collapsible section rendered as literal text. Flushing them to the heredoc's left edge restores the intended rendering.
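
The two cache lifecycles might be laid out as below. Treat this as a sketch: the `ccache-flash` key prefixes, the `~/.ccache` path, the build target, and the `BASELINE_SHA` environment variable are hypothetical; the restore/save action pair, the ref_name + sha keying, and the `ccache -C` between builds follow the description.

```yaml
steps:
  # Lifecycle 1: current build (PR head)
  - uses: actions/cache/restore@v4
    with:
      path: ~/.ccache                               # assumed ccache dir
      key: ccache-flash-${{ github.ref_name }}-${{ github.sha }}
      restore-keys: |
        ccache-flash-${{ github.ref_name }}-
  - name: Build current
    run: make px4_fmu-v5x                           # placeholder target
  - uses: actions/cache/save@v4
    with:
      path: ~/.ccache
      key: ccache-flash-${{ github.ref_name }}-${{ github.sha }}

  # Wipe everything so the baseline does not reuse the current build's objects.
  - name: Clear ccache
    run: ccache -C

  # Lifecycle 2: baseline build (base branch or previous commit)
  - uses: actions/cache/restore@v4
    with:
      path: ~/.ccache
      key: ccache-flash-baseline-${{ env.BASELINE_SHA }}   # hypothetical variable
  - name: Build baseline
    run: make px4_fmu-v5x                           # placeholder target
  - uses: actions/cache/save@v4
    with:
      path: ~/.ccache
      key: ccache-flash-baseline-${{ env.BASELINE_SHA }}
```

Splitting restore and save into separate actions is what makes two lifecycles possible in one job: the combined `actions/cache` action saves only once, in its post-job hook.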

mrpollo added 7 commits April 9, 2026 15:51
Port the cache wiring from the CI orchestrator branch
(mrpollo/ci_orchestration, PR #26257) into current mainline workflows
without merging the orchestrator itself. Each change matches the
corresponding job in ci-orchestrator.yml one to one.

clang-tidy.yml:
- Bump setup-ccache max-size 120M to 150M to match CACHE_CLANG_TIDY.

itcm_check.yml:
- Wire setup-ccache / save-ccache (cache-key-prefix
  ccache-itcm-${{ matrix.target }}, max-size 200M).

failsafe_sim.yml:
- Cache the emsdk clone at key emsdk-4.0.15 and gate the install on
  cache miss. Saves about 30s per run.

compile_ubuntu.yml:
- Split the install + build step so setup-ccache can run between
  ubuntu.sh and make quick_check (ubuntu.sh is what installs ccache
  in the plain ubuntu:22.04 and ubuntu:24.04 base images).
- Wire setup-ccache / save-ccache (cache-key-prefix
  ccache-ubuntu-${{ matrix.version }}, max-size 200M).

compile_macos.yml:
- Add Homebrew downloads cache keyed on the macos.sh hash.
- Add pip downloads cache keyed on the requirements.txt hash.
- Replace the hand-rolled ccache block with setup-ccache /
  save-ccache (cache-key-prefix ccache-macos-${{ matrix.config }},
  max-size 40M to 200M).

sitl_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache
  (cache-key-prefix ccache-sitl-gazebo-classic, max-size 120M to
  350M).
- Replace the explicit make px4_sitl_default + make sitl_gazebo-classic
  invocations with the build-gazebo-sitl composite action.
- Hoist PX4_CMAKE_BUILD_TYPE and PX4_SBOM_DISABLE to job-level env
  so they propagate into composite action steps.
- save-ccache is added here even though the orchestrator does not
  save in sitl-tests. The orchestrator relies on an upstream
  build-sitl-gazebo-classic seeder job; mainline has no such parent,
  so without save-ccache the cache would never populate.

ros_integration_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache
  (cache-key-prefix ccache-ros-integration, max-size 300M to 400M).
- Add a dedicated cache for the Micro-XRCE-DDS Agent build at key
  xrce-agent-v2.2.1-fastdds-2.8.2-galactic-2021-09-08, gating the
  build on cache miss.
- Add a dedicated cache for the px4-ros2-interface-lib colcon
  workspace keyed on the hash of msg/*.msg, msg/versioned/*.msg, and
  srv/*.srv files so it rebuilds only when the interface changes.
- Replace the explicit make invocations with the build-gazebo-sitl
  composite action.
- Hoist PX4_SBOM_DISABLE to job-level env for composite propagation.

flash_analysis.yml:
- Wrap the "current build" (PR head) with a restore/save pair of raw
  actions/cache/restore@v4 and actions/cache/save@v4 actions, keyed
  on ref_name + sha with a ref_name fallback. Uses the same ccache
  configuration (base_dir, compression, compression_level, max_size
  200M, hash_dir false, compiler_check content) the composite
  setup-ccache action uses.
- Wrap the "baseline build" (base branch or previous commit) with a
  second restore/save pair keyed on the baseline sha. ccache -C runs
  between the two builds to ensure a cold cache for the baseline.
- This cannot use the composite actions because the job needs two
  independent cache lifecycles in a single run; setup-ccache is
  single-lifecycle.
- Fix the markdown indentation in the PR comment body heredoc. The
  <details> children were indented two spaces, which GitHub markdown
  parses as an indented code block and renders the collapsible
  section as literal text. Flushing the children to the left edge of
  the heredoc makes the <details> render as intended.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
Replace the two-entry matrix (px4_fmu-v5_default, px4_sitl) with a
single job that builds both targets sequentially. This matches the
orchestrator's macos-build job which runs both targets in sequence to
share a single ccache key (ccache-macos, max-size 200M).

The matrix approach ran two parallel jobs that raced on cache save
because both used the same setup-ccache / save-ccache lifecycle. With
the per-matrix suffix workaround (ccache-macos-${{ matrix.config }})
each target had its own cache but could not share compilation units
across targets. The sequential approach lets the second target (the
NuttX cross-compile) benefit from headers and shared objects already
cached by the first (the native SITL build).

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
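
A minimal sketch of the single-job macOS layout from the commit above, assuming the composite actions live under `./.github/actions/` and that save-ccache needs no inputs. The targets, their order, the `ccache-macos` prefix, and the 200M size are taken from the commit message.

```yaml
jobs:
  build:                               # hypothetical job id
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-ccache   # assumed path
        with:
          cache-key-prefix: ccache-macos
          max-size: 200M
      # Both targets build in one job, so there is a single cache save
      # and no race between parallel matrix entries.
      - name: Build native SITL
        run: make px4_sitl
      - name: Build NuttX target
        run: make px4_fmu-v5_default
      - uses: ./.github/actions/save-ccache    # assumed path, inputs unknown
```
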
…ed jobs

extras=s3-cache on a RunsOn runner label only takes effect when the
runs-on/action step is also present in the job. The action hooks into
the cache backend and routes actions/cache calls through RunsOn's S3
proxy instead of the default GitHub Actions cache backend. Without
the action step the label extra is a no-op and cache transfers
still go through the slower default backend.

Six workflows were missing either the extras flag, the action step,
or both:

- sitl_tests.yml: add runs-on/action@v2 step (label already had
  extras=s3-cache from PR #27034).
- ros_integration_tests.yml: add runs-on/action@v2 step (label
  already had extras=s3-cache from PR #27034).
- itcm_check.yml: add extras=s3-cache to the runner label and add
  runs-on/action@v2 step.
- flash_analysis.yml: add extras=s3-cache to the analyze_flash job
  label and add runs-on/action@v2 step. The post_pr_comment utility
  job (1cpu) is left alone to match the orchestrator's rule that
  utility runners skip s3-cache.
- compile_ubuntu.yml: add extras=s3-cache to the runner label and
  add runs-on/action@v2 step as the first step, before the
  in-container git install.
- ros_translation_node.yml: add extras=s3-cache to the runner label
  and add runs-on/action@v2 step before the setup-ros action.

build_all_targets.yml is deliberately left alone. Its matrix builds
already have runs-on/action@v2 but no ccache today, so adding
extras=s3-cache there would be a no-op until a later PR wires ccache
into those builds.

Matches the orchestrator's RUNNER_SMALL/RUNNER_MEDIUM/RUNNER_LARGE
label definitions on mrpollo/ci_orchestration, which all include
extras=s3-cache, and matches the orchestrator convention of adding
runs-on/action@v2 as the first step of every RunsOn job.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
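
The pairing that the commit above describes (label extra plus action step) could be sketched as follows. The job id, the cached path, and the cache key are hypothetical, and the array form of the runner label is an assumption about how the repository writes its RunsOn labels; the `extras=s3-cache` flag and the `runs-on/action@v2` step are the two pieces named in the commit message.

```yaml
jobs:
  build:                               # hypothetical job id
    # Assumed label syntax; the extras=s3-cache entry is the relevant part.
    runs-on: [runs-on, "runner=8cpu-linux-x64", "run-id=${{ github.run_id }}", "extras=s3-cache"]
    steps:
      # Without this step the s3-cache extra is a no-op.
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      # An unmodified actions/cache call; with both pieces in place its
      # traffic goes through the RunsOn S3 proxy.
      - uses: actions/cache@v4
        with:
          path: ~/.ccache              # assumed path
          key: example-${{ github.sha }}   # hypothetical key
```
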
Move four GitHub-hosted jobs to RunsOn runners, matching the
orchestrator's label assignments on mrpollo/ci_orchestration.

- failsafe_sim.yml: ubuntu-latest to 4cpu-linux-x64 with
  extras=s3-cache (matches orchestrator failsafe-sim on
  runner_small).
- mavros_mission_tests.yml: ubuntu-latest to 4cpu-linux-x64 with
  extras=s3-cache (matches orchestrator mavros-tests on
  runner_small).
- mavros_offboard_tests.yml: ubuntu-latest to 4cpu-linux-x64 with
  extras=s3-cache (matches orchestrator mavros-tests on
  runner_small).
- python_checks.yml: ubuntu-24.04 to 1cpu-linux-x64 (matches
  orchestrator mavsdk-python-checks on runner_utility, which per
  orchestrator convention does not include extras=s3-cache since
  utility runners transfer no meaningful data).

Every job also gets the runs-on/action@v2 step as the first step of
the job, which is required for the extras=s3-cache flag to actually
route actions/cache traffic through the RunsOn S3 proxy. python_checks
also gets runs-on/action@v2 even without s3-cache to match the
orchestrator's convention of putting it on every RunsOn job.

The Checks, EKF Change Indicator, EKF Update Change Indicator,
Commit Quality, Labeler, CodeQL, and MacOS build workflows are left
alone. Checks is a mixed-bag matrix the plan explicitly forbade
splitting. EKF indicators and CodeQL have no direct orchestrator
equivalent. Commit Quality and Labeler are tiny metadata jobs where
migration adds no value. MacOS build still uses macos-latest in the
orchestrator too.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
Bump the four remaining runs-on/action@v1 references to @v2 in
docs_deploy.yml (1) and docs-orchestrator.yml (3). Every other
workflow in the repo already tracks @v2, which currently resolves
to v2.1.0. There is no v3 or newer; @v2 is the latest major.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
Split the single 11-entry checks build job (now 6 entries after a
recent trim) into two jobs matched to their cost profile.

gate_checks (5 cheap entries) runs on 2cpu-linux-x64 with
extras=s3-cache and fail-fast: true. The entries are check_format,
check_newlines, validate_module_configs, shellcheck_all, and
module_documentation. Each runs in under 2 minutes, none build
firmware, none benefit from ccache. fail-fast: true is retained for
these since the 99% success rate holds and a failing gate check
should cancel the siblings to save runner time.

tests (the 1 expensive entry, make tests) runs on 8cpu-linux-x64
with extras=s3-cache as a single non-matrix job. It rebuilds
px4_sitl_default and runs the unit test suite, previously 9m41s on
free ubuntu-latest at 2 vCPUs. 8cpu was chosen to match the SITL
tests runner size landed in PR #27034 since this is the same
workload (SITL firmware build). Wired to the setup-ccache /
save-ccache composite actions with cache-key-prefix ccache-sitl
and max-size 300M, matching the orchestrator basic-tests job.

Both jobs use runs-on/action@v2 as the first step, which is
required for extras=s3-cache to route actions/cache traffic
through the RunsOn S3 proxy.

This is a deliberate, limited split of the checks.yml matrix. It
does not split every check into its own job; it groups by cost
tier (cheap gate checks vs expensive unit tests) so each group can
sit on an appropriately sized runner. The orchestrator splits
these into gate-checks T1 (runner_utility), shellcheck T1
(runner_utility), and basic-tests T2 (runner_small) across three
separate jobs; the two-job approach keeps the surface area smaller.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
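
The two-tier split in the commit above might look like this in outline. The job names, matrix entries, runner sizes, `fail-fast` setting, and ccache inputs come from the commit message; the runner label syntax, the composite action paths, the matrix variable name, and the assumption that each entry is a make target are illustrative. Any container configuration is omitted.

```yaml
jobs:
  gate_checks:
    runs-on: [runs-on, "runner=2cpu-linux-x64", "run-id=${{ github.run_id }}", "extras=s3-cache"]
    strategy:
      fail-fast: true                  # one failing check cancels its siblings
      matrix:
        check:                         # assumed variable name
          - check_format
          - check_newlines
          - validate_module_configs
          - shellcheck_all
          - module_documentation
    steps:
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      - run: make ${{ matrix.check }}  # assumes each entry is a make target

  tests:
    runs-on: [runs-on, "runner=8cpu-linux-x64", "run-id=${{ github.run_id }}", "extras=s3-cache"]
    steps:
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-ccache   # assumed path
        with:
          cache-key-prefix: ccache-sitl
          max-size: 300M
      - run: make tests
      - uses: ./.github/actions/save-ccache    # assumed path, inputs unknown
```
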
Port the orchestrator's ros-translation-node job (lines 1277-1340 of
ci-orchestrator.yml) to mainline.

- Switch container from rostooling/setup-ros-docker to the official
  ros:<distro>-ros-base-<ubuntu> image. Drop the setup-ros@v0.7
  action since the official image already has the ROS distribution
  installed.
- Wire setup-ccache / save-ccache with cache-key-prefix
  ccache-ros-translation-${{ matrix.config.ros_version }}, max-size
  150M, base-dir /ros_ws, install-ccache true (the official ROS base
  image does not ship ccache).
- Add -DCMAKE_CXX_COMPILER_LAUNCHER=ccache and
  -DCMAKE_C_COMPILER_LAUNCHER=ccache to the colcon build invocation
  so cmake routes compilations through ccache.
- Split the combined build-and-test step into separate build and test
  steps for clearer log output.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
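
A rough sketch of the build and test steps the commit above describes. The setup-ccache inputs and the two `-DCMAKE_*_COMPILER_LAUNCHER` flags are from the commit message; the composite action paths, the `/opt/ros/.../setup.bash` sourcing line, and the test commands are assumptions.

```yaml
steps:
  - uses: ./.github/actions/setup-ccache       # assumed path
    with:
      cache-key-prefix: ccache-ros-translation-${{ matrix.config.ros_version }}
      max-size: 150M
      base-dir: /ros_ws
      install-ccache: true

  - name: Build
    shell: bash
    working-directory: /ros_ws
    run: |
      source /opt/ros/${{ matrix.config.ros_version }}/setup.bash
      # The launcher variables make CMake prefix every compile with ccache.
      colcon build --cmake-args \
        -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
        -DCMAKE_C_COMPILER_LAUNCHER=ccache

  - name: Test
    shell: bash
    working-directory: /ros_ws
    run: |
      source /opt/ros/${{ matrix.config.ros_version }}/setup.bash
      colcon test
      colcon test-result --verbose

  - uses: ./.github/actions/save-ccache        # assumed path, inputs unknown
```
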
@github-actions

No broken links found in changed files.

…slation_node

Add `permissions: contents: read` at the workflow level in checks.yml
and ros_translation_node.yml. CodeQL flagged both for not limiting
the GITHUB_TOKEN scope.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
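
For reference, a workflow-level permissions block of the kind the commit above adds sits beside `on:` and `jobs:` at the top of the file. The workflow name and triggers below are placeholders.

```yaml
name: Checks                 # placeholder name
on:
  pull_request:              # placeholder trigger
permissions:
  contents: read             # GITHUB_TOKEN is limited to reading repository contents
jobs: {}                     # jobs omitted in this sketch
```

Declaring any permission at this level sets every unlisted scope to none for all jobs in the workflow, unless a job overrides it with its own `permissions:` block.
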
@mrpollo mrpollo merged commit 5d5d9e3 into main Apr 10, 2026
76 checks passed
@mrpollo mrpollo deleted the mrpollo/ci-caches-everywhere branch April 10, 2026 04:51