ci(workflows): wire ccache and caches across SITL, ROS, macOS, Ubuntu#27036
Merged
Port the cache wiring from the CI orchestrator branch (mrpollo/ci_orchestration, PR #26257) into current mainline workflows without merging the orchestrator itself. Each change matches the corresponding job in ci-orchestrator.yml one to one.

clang-tidy.yml:
- Bump setup-ccache max-size 120M to 150M to match CACHE_CLANG_TIDY.

itcm_check.yml:
- Wire setup-ccache / save-ccache (cache-key-prefix ccache-itcm-${{ matrix.target }}, max-size 200M).

failsafe_sim.yml:
- Cache the emsdk clone at key emsdk-4.0.15 and gate the install on cache miss. Saves about 30s per run.

compile_ubuntu.yml:
- Split the install + build step so setup-ccache can run between ubuntu.sh and make quick_check (ubuntu.sh is what installs ccache in the plain ubuntu:22.04 and ubuntu:24.04 base images).
- Wire setup-ccache / save-ccache (cache-key-prefix ccache-ubuntu-${{ matrix.version }}, max-size 200M).

compile_macos.yml:
- Add Homebrew downloads cache keyed on the macos.sh hash.
- Add pip downloads cache keyed on the requirements.txt hash.
- Replace the hand-rolled ccache block with setup-ccache / save-ccache (cache-key-prefix ccache-macos-${{ matrix.config }}, max-size 40M to 200M).

sitl_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache (cache-key-prefix ccache-sitl-gazebo-classic, max-size 120M to 350M).
- Replace the explicit make px4_sitl_default + make sitl_gazebo-classic invocations with the build-gazebo-sitl composite action.
- Hoist PX4_CMAKE_BUILD_TYPE and PX4_SBOM_DISABLE to job-level env so they propagate into composite action steps.
- save-ccache is added here even though the orchestrator does not save in sitl-tests. The orchestrator relies on an upstream build-sitl-gazebo-classic seeder job; mainline has no such parent, so without save-ccache the cache would never populate.

ros_integration_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache (cache-key-prefix ccache-ros-integration, max-size 300M to 400M).
- Add a dedicated cache for the Micro-XRCE-DDS Agent build at key xrce-agent-v2.2.1-fastdds-2.8.2-galactic-2021-09-08, gating the build on cache miss.
- Add a dedicated cache for the px4-ros2-interface-lib colcon workspace keyed on the hash of msg/*.msg, msg/versioned/*.msg, and srv/*.srv files so it rebuilds only when the interface changes.
- Replace the explicit make invocations with the build-gazebo-sitl composite action.
- Hoist PX4_SBOM_DISABLE to job-level env for composite propagation.

flash_analysis.yml:
- Wrap the "current build" (PR head) with a restore/save pair of raw actions/cache/restore@v4 and actions/cache/save@v4 actions, keyed on ref_name + sha with a ref_name fallback. Uses the same ccache configuration (base_dir, compression, compression_level, max_size 200M, hash_dir false, compiler_check content) the composite setup-ccache action uses.
- Wrap the "baseline build" (base branch or previous commit) with a second restore/save pair keyed on the baseline sha. ccache -C runs between the two builds to ensure a cold cache for the baseline.
- This cannot use the composite actions because the job needs two independent cache lifecycles in a single run; setup-ccache is single-lifecycle.
- Fix the markdown indentation in the PR comment body heredoc. The <details> children were indented two spaces, which GitHub markdown parses as an indented code block and renders the collapsible section as literal text. Flushing the children to the left edge of the heredoc makes the <details> render as intended.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
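For readers who want to picture the two cache lifecycles in flash_analysis.yml, a rough sketch follows. It is not the workflow's actual content: the step ids, the `~/.ccache` path, the `ccache-flash-` key prefix, the `BASELINE_SHA` variable, and the build target are all assumptions made for illustration. Only the restore/save actions, the key shape (ref_name + sha with a ref_name fallback), the listed ccache settings, and the `ccache -C` between builds come from the commit message. The base_dir and compression_level values are not stated, so they are left out.

```yaml
# Hypothetical excerpt: ids, paths, key prefixes, and targets are placeholders.
env:
  BASELINE_SHA: ${{ github.event.pull_request.base.sha }}  # assumed source of the baseline sha
steps:
  - name: Configure ccache (settings named in the commit message)
    run: |
      ccache --set-config=max_size=200M
      ccache --set-config=compression=true
      ccache --set-config=hash_dir=false
      ccache --set-config=compiler_check=content

  # Lifecycle 1: current build (PR head)
  - name: Restore ccache (current)
    id: ccache-current
    uses: actions/cache/restore@v4
    with:
      path: ~/.ccache   # assumes ccache stores its data here
      key: ccache-flash-current-${{ github.ref_name }}-${{ github.sha }}
      restore-keys: |
        ccache-flash-current-${{ github.ref_name }}-
  - name: Build current
    run: make px4_fmu-v5_default   # placeholder target
  - name: Save ccache (current)
    uses: actions/cache/save@v4
    with:
      path: ~/.ccache
      key: ${{ steps.ccache-current.outputs.cache-primary-key }}

  # Clear the cache so the baseline build starts cold
  - name: Clear ccache
    run: ccache -C

  # Lifecycle 2: baseline build
  - name: Restore ccache (baseline)
    id: ccache-baseline
    uses: actions/cache/restore@v4
    with:
      path: ~/.ccache
      key: ccache-flash-baseline-${{ env.BASELINE_SHA }}
  - name: Build baseline
    run: |
      git checkout "$BASELINE_SHA"
      make px4_fmu-v5_default   # placeholder target
  - name: Save ccache (baseline)
    uses: actions/cache/save@v4
    with:
      path: ~/.ccache
      key: ${{ steps.ccache-baseline.outputs.cache-primary-key }}
```

Splitting restore and save into separate steps is what allows two independent keys in one job; a single combined cache step would tie both builds to one key and one post-job save.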
Replace the two-entry matrix (px4_fmu-v5_default, px4_sitl) with a
single job that builds both targets sequentially. This matches the
orchestrator's macos-build job which runs both targets in sequence to
share a single ccache key (ccache-macos, max-size 200M).
The matrix approach ran two parallel jobs that raced on cache save
because both used the same setup-ccache / save-ccache lifecycle. With
the per-matrix suffix workaround (ccache-macos-${{ matrix.config }})
each target had its own cache but could not share compilation units
across targets. The sequential approach lets the second target (the
NuttX cross-compile) benefit from headers and shared objects already
cached by the first (the native SITL build).
Signed-off-by: Ramon Roche <mrpollo@gmail.com>
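A minimal sketch of the sequential layout described above, assuming the composite actions live under `./.github/actions/` and that `make px4_sitl` is the native build command (neither is confirmed by this commit message). Any inputs that save-ccache requires are not stated here and are omitted.

```yaml
# Hypothetical excerpt: action paths and make targets are assumptions.
jobs:
  build:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-ccache   # assumed location of the composite
        with:
          cache-key-prefix: ccache-macos        # one key shared by both targets
          max-size: 200M
      - name: Build native SITL first
        run: make px4_sitl
      - name: Build NuttX target second, reusing the same ccache
        run: make px4_fmu-v5_default
      - uses: ./.github/actions/save-ccache     # assumed location; inputs omitted
```

With a single job there is only one save at the end, so the race between two parallel matrix jobs writing the same key no longer arises.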
…ed jobs

extras=s3-cache on a RunsOn runner label only takes effect when the runs-on/action step is also present in the job. It hooks into the cache backend and routes actions/cache calls through RunsOn's S3 proxy instead of the default GitHub Actions cache backend. Without the action step the label extra is a no-op and the cache transfers still go through the slower default backend.

Six workflows were missing either the extras flag, the action step, or both:

- sitl_tests.yml: add runs-on/action@v2 step (label already had extras=s3-cache from PR #27034).
- ros_integration_tests.yml: add runs-on/action@v2 step (label already had extras=s3-cache from PR #27034).
- itcm_check.yml: add extras=s3-cache to the runner label and add runs-on/action@v2 step.
- flash_analysis.yml: add extras=s3-cache to the analyze_flash job label and add runs-on/action@v2 step. The post_pr_comment utility job (1cpu) is left alone to match the orchestrator's rule that utility runners skip s3-cache.
- compile_ubuntu.yml: add extras=s3-cache to the runner label and add runs-on/action@v2 step as the first step before the in-container git install.
- ros_translation_node.yml: add extras=s3-cache to the runner label and add runs-on/action@v2 step before the setup-ros action.

build_all_targets.yml is deliberately left alone. Its matrix builds already have runs-on/action@v2 but no ccache today, so adding extras=s3-cache there would be a no-op until a later PR wires ccache into those builds.

Matches the orchestrator's RUNNER_SMALL/RUNNER_MEDIUM/RUNNER_LARGE label definitions on mrpollo/ci_orchestration, which all include extras=s3-cache, and matches the orchestrator convention of adding runs-on/action@v2 as the first step of every RunsOn job.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
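The pairing of label extra and action step can be illustrated as below. This is a sketch only: the job name, cache path, and cache key are placeholders, and the label list uses RunsOn's array form, which may differ from the exact labels in the affected workflows.

```yaml
# Illustrative job: name, cache path, and key are placeholders.
jobs:
  build:
    # The extras=s3-cache label by itself does nothing for actions/cache traffic.
    runs-on: [runs-on, "runner=4cpu-linux-x64", "run-id=${{ github.run_id }}", "extras=s3-cache"]
    steps:
      # First step of the job: enables routing of cache calls through the S3 proxy.
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.ccache
          key: ccache-example-${{ github.sha }}
          restore-keys: |
            ccache-example-
      - run: make px4_sitl_default
```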
Move four GitHub-hosted jobs to RunsOn runners, matching the orchestrator's label assignments on mrpollo/ci_orchestration.

- failsafe_sim.yml: ubuntu-latest to 4cpu-linux-x64 with extras=s3-cache (matches orchestrator failsafe-sim on runner_small).
- mavros_mission_tests.yml: ubuntu-latest to 4cpu-linux-x64 with extras=s3-cache (matches orchestrator mavros-tests on runner_small).
- mavros_offboard_tests.yml: ubuntu-latest to 4cpu-linux-x64 with extras=s3-cache (matches orchestrator mavros-tests on runner_small).
- python_checks.yml: ubuntu-24.04 to 1cpu-linux-x64 (matches orchestrator mavsdk-python-checks on runner_utility, which per orchestrator convention does not include extras=s3-cache since utility runners transfer no meaningful data).

Every job also gets the runs-on/action@v2 step as the first step of the job, which is required for the extras=s3-cache flag to actually route actions/cache traffic through the RunsOn S3 proxy. python_checks also gets runs-on/action@v2 even without s3-cache to match the orchestrator's convention of putting it on every RunsOn job.

The Checks, EKF Change Indicator, EKF Update Change Indicator, Commit Quality, Labeler, CodeQL, and MacOS build workflows are left alone. Checks is a mixed-bag matrix the plan explicitly forbade splitting. EKF indicators and CodeQL have no direct orchestrator equivalent. Commit Quality and Labeler are tiny metadata jobs where migration adds no value. MacOS build still uses macos-latest in the orchestrator too.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
Split the single 11-entry checks build job (now 6 entries after recent trim) into two jobs matched to their cost profile.

gate_checks (5 cheap entries) runs on 2cpu-linux-x64 with extras=s3-cache and fail-fast: true. The entries are check_format, check_newlines, validate_module_configs, shellcheck_all, and module_documentation. Each runs in under 2 minutes, none build firmware, none benefit from ccache. fail-fast: true is retained for these since the 99% success rate holds and a failing gate check should cancel the siblings to save runner time.

tests (the 1 expensive entry, make tests) runs on 8cpu-linux-x64 with extras=s3-cache as a single non-matrix job. It rebuilds px4_sitl_default and runs the unit test suite, previously 9m41s on free ubuntu-latest at 2 vCPUs. 8cpu was chosen to match the SITL tests runner size landed in PR #27034 since this is the same workload (SITL firmware build). Wired to the setup-ccache / save-ccache composite actions with cache-key-prefix ccache-sitl and max-size 300M, matching the orchestrator basic-tests job.

Both jobs use runs-on/action@v2 as the first step, which is required for extras=s3-cache to route actions/cache traffic through the RunsOn S3 proxy.

This is a deliberate, limited split of the checks.yml matrix. It does not split every check into its own job; it groups by cost tier (cheap gate checks vs expensive unit tests) so each group can sit on an appropriately sized runner. The orchestrator splits these into gate-checks T1 (runner_utility), shellcheck T1 (runner_utility), and basic-tests T2 (runner_small) across three separate jobs; the two-job approach keeps the surface area smaller.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
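Roughly, the two-job shape could look like the sketch below. The matrix key name `check`, the assumption that each gate entry is invoked as a make target, the composite action paths, and the RunsOn label array form are all illustrative. Container settings and any inputs save-ccache needs are omitted.

```yaml
# Hypothetical excerpt of the split; names and paths are assumptions.
jobs:
  gate_checks:
    runs-on: [runs-on, "runner=2cpu-linux-x64", "run-id=${{ github.run_id }}", "extras=s3-cache"]
    strategy:
      fail-fast: true   # a failing gate check cancels its siblings
      matrix:
        check:
          - check_format
          - check_newlines
          - validate_module_configs
          - shellcheck_all
          - module_documentation
    steps:
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      - run: make ${{ matrix.check }}   # assumes each entry is a make target

  tests:
    runs-on: [runs-on, "runner=8cpu-linux-x64", "run-id=${{ github.run_id }}", "extras=s3-cache"]
    steps:
      - uses: runs-on/action@v2
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-ccache   # assumed location of the composite
        with:
          cache-key-prefix: ccache-sitl
          max-size: 300M
      - run: make tests
      - uses: ./.github/actions/save-ccache    # assumed location; inputs omitted
```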
Port the orchestrator's ros-translation-node job (L1277-1340 of
ci-orchestrator.yml) to mainline.
- Switch container from rostooling/setup-ros-docker to the official
ros:<distro>-ros-base-<ubuntu> image. Drop the setup-ros@v0.7
action since the official image already has the ROS distribution
installed.
- Wire setup-ccache / save-ccache with cache-key-prefix
ccache-ros-translation-${{ matrix.config.ros_version }}, max-size
150M, base-dir /ros_ws, install-ccache true (the official ROS base
image does not ship ccache).
- Add -DCMAKE_CXX_COMPILER_LAUNCHER=ccache and
-DCMAKE_C_COMPILER_LAUNCHER=ccache to the colcon build invocation
so cmake routes compilations through ccache.
- Split the combined build-and-test step into separate build and test
steps for clearer log output.
Signed-off-by: Ramon Roche <mrpollo@gmail.com>
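As an illustration of the ccache wiring for the colcon build, a sketch under some assumptions: the composite action path, the `/ros_ws` working directory for the build step, and the idea that `matrix.config.ros_version` holds the distro name used under `/opt/ros/` are guesses, not taken from the workflow. Inputs for save-ccache are not stated and are left out.

```yaml
# Hypothetical excerpt: paths and the meaning of ros_version are assumptions.
steps:
  - uses: ./.github/actions/setup-ccache   # assumed location of the composite
    with:
      cache-key-prefix: ccache-ros-translation-${{ matrix.config.ros_version }}
      max-size: 150M
      base-dir: /ros_ws
      install-ccache: true   # the official ROS base image does not ship ccache
  - name: Build
    working-directory: /ros_ws
    run: |
      . /opt/ros/${{ matrix.config.ros_version }}/setup.sh
      colcon build --cmake-args \
        -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
        -DCMAKE_C_COMPILER_LAUNCHER=ccache
  - name: Test
    working-directory: /ros_ws
    run: |
      . /opt/ros/${{ matrix.config.ros_version }}/setup.sh
      colcon test
      colcon test-result --verbose
  - uses: ./.github/actions/save-ccache    # assumed location; inputs omitted
```

Passing the launcher variables through `--cmake-args` means every package in the workspace gets its compiler invocations prefixed with ccache, without editing any CMakeLists.txt.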
No broken links found in changed files.
…slation_node

Add `permissions: contents: read` at the workflow level in checks.yml and ros_translation_node.yml. CodeQL flagged both for not limiting the GITHUB_TOKEN scope.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
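For reference, a workflow-level permissions block of this kind sits alongside `on:` and `jobs:`. The workflow name, triggers, and job below are placeholders, not copied from checks.yml.

```yaml
# Placeholder workflow showing where the block goes.
name: Example
on: [push, pull_request]

# Applies to every job unless a job overrides it; GITHUB_TOKEN gets read-only repo contents.
permissions:
  contents: read

jobs:
  example:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```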
Ports cache wiring from the CI orchestrator branch (mrpollo/ci_orchestration, #26257) into current mainline workflows. Each change matches the corresponding job in `ci-orchestrator.yml` one to one. This is the big one in the ongoing rollout, touching eight workflows.

`clang-tidy.yml` gets a single-line bump: ccache max-size goes from 120M to 150M to match `CACHE_CLANG_TIDY`.

`itcm_check.yml` gets wired to the `setup-ccache` / `save-ccache` composite actions with per-target cache prefixes (`ccache-itcm-${{ matrix.target }}`, 200M).

`failsafe_sim.yml` gets a dedicated cache for the emsdk clone at key `emsdk-4.0.15` with the install gated on cache miss. Saves about 30s on every run.

`compile_ubuntu.yml` splits the install + build step so `setup-ccache` can run between `./Tools/setup/ubuntu.sh` (which is what installs ccache in the plain `ubuntu:22.04` / `ubuntu:24.04` base images) and `make quick_check`. Per-matrix cache prefixes keyed on the container version (200M).

`compile_macos.yml` gets three caches: a Homebrew downloads cache keyed on the `macos.sh` hash, a pip downloads cache keyed on the `requirements.txt` hash, and the `setup-ccache` / `save-ccache` composites replacing the hand-rolled ccache block. ccache size goes from 40M to 200M.

`sitl_tests.yml` replaces the hand-rolled ccache block with `setup-ccache` / `save-ccache` (prefix `ccache-sitl-gazebo-classic`, 120M to 350M) and swaps the explicit `make px4_sitl_default` + `make sitl_gazebo-classic` invocations for the `build-gazebo-sitl` composite action. `PX4_CMAKE_BUILD_TYPE` and `PX4_SBOM_DISABLE` are hoisted to job-level env so they propagate into the composite's child steps. One deviation from the orchestrator: `save-ccache` is added here even though orchestrator's `sitl-tests` job does not save. Orchestrator relies on an upstream `build-sitl-gazebo-classic` seeder job, but mainline has no such parent, so without `save-ccache` the cache would never populate.

`ros_integration_tests.yml` gets the biggest structural change. The hand-rolled ccache is replaced with the composites (prefix `ccache-ros-integration`, 300M to 400M). A dedicated cache is added for the Micro-XRCE-DDS Agent build at key `xrce-agent-v2.2.1-fastdds-2.8.2-galactic-2021-09-08` with the build gated on cache miss. Another dedicated cache covers the `px4-ros2-interface-lib` colcon workspace keyed on the hash of `msg/*.msg`, `msg/versioned/*.msg`, and `srv/*.srv` files so it rebuilds only when the uORB interface changes. The explicit PX4 and Gazebo make calls swap to the `build-gazebo-sitl` composite. `PX4_SBOM_DISABLE` is hoisted to job-level env.

`flash_analysis.yml` gets a two-phase raw ccache wiring: the "current build" (PR head) is wrapped with `actions/cache/restore@v4` + `actions/cache/save@v4` keyed on `ref_name + sha`, and the "baseline build" (base branch or previous commit) gets a second restore/save pair keyed on the baseline sha. A `ccache -C` between the two builds ensures the baseline starts cold. This one cannot use the composite actions because the job needs two independent cache lifecycles in a single run. While touching this file, the indentation inside the `PR_COMMENT_BODY_EOF` heredoc is also fixed: the `<details>` children were indented two spaces, which GitHub markdown parses as an indented code block and renders the collapsible section as literal text. Flushing them to the heredoc's left edge restores the intended rendering.
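The gate-on-cache-miss pattern used for emsdk and for the interface library workspace can be sketched as follows. The cache paths, step ids, the `px4-ros2-interface-lib-` key prefix, and the build command are assumptions for illustration. The `emsdk-4.0.15` key and the three hashed glob patterns come from the description above.

```yaml
# Hypothetical excerpt: paths, ids, and the build command are placeholders.
steps:
  # Fixed key: the cache is invalidated only by changing the version in the key.
  - name: Cache emsdk
    id: cache-emsdk
    uses: actions/cache@v4
    with:
      path: emsdk
      key: emsdk-4.0.15
  - name: Install emsdk
    if: steps.cache-emsdk.outputs.cache-hit != 'true'   # skipped on a cache hit
    run: |
      git clone https://github.com/emscripten-core/emsdk.git
      cd emsdk
      ./emsdk install 4.0.15
      ./emsdk activate 4.0.15

  # Content-derived key: the workspace rebuilds only when an interface file changes.
  - name: Cache px4-ros2-interface-lib workspace
    id: cache-interface-lib
    uses: actions/cache@v4
    with:
      path: /opt/px4_ros2_ws
      key: px4-ros2-interface-lib-${{ hashFiles('msg/*.msg', 'msg/versioned/*.msg', 'srv/*.srv') }}
  - name: Build px4-ros2-interface-lib
    if: steps.cache-interface-lib.outputs.cache-hit != 'true'
    working-directory: /opt/px4_ros2_ws
    run: colcon build
```

The two keys show the trade-off: a pinned version string is simplest when the input is a tagged release, while `hashFiles` suits inputs that change with the repository.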