ci: orchestration by tiers by mrpollo · Pull Request #26257 · PX4/PX4-Autopilot

mrpollo · 2026-01-12T21:07:35Z

Replaces 14 CI workflows with a single ci-orchestrator.yml that runs jobs in a 4-tier waterfall. Tiers gate each other sequentially: if formatting fails in 2 minutes, nothing else runs. No more burning 60 minutes of AWS compute on a PR that has a style error.
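
As a rough illustration of the gating described above, sequential tiers can be expressed in GitHub Actions with `needs:`. This is a minimal sketch, not the contents of ci-orchestrator.yml: the workflow name, job names, and placeholder commands are all assumptions made for the example.

```yaml
# Hypothetical sketch of tier gating. Job names and commands are placeholders,
# not taken from the actual ci-orchestrator.yml.
name: tier-gating-sketch
on: [pull_request]

jobs:
  # Tier 1: a fast gate. If it fails, jobs that need it are skipped.
  t1-format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "format check would run here"

  # Tier 2: starts only after tier 1 succeeds.
  t2-basic-tests:
    needs: [t1-format]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "unit tests would run here"

  # Tier 3: starts only after tier 2 succeeds; later tiers chain the same way.
  t3-sitl-tests:
    needs: [t2-basic-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "long-running simulation tests would run here"
```

Because a job with `needs:` is skipped by default when a job it depends on fails, a failure in the first tier prevents the more expensive tiers from starting.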

Every job carried over from the old workflows was optimized along the way. Jobs use native container: blocks instead of the old addnab/docker-run-action wrapper, cache scopes were split and tuned (hit rates went from ~48% to 99%+), SITL tests run at 20x speed on 8cpu runners, clang-tidy got a dedicated 16cpu runner and cache, the failsafe sim caches its emsdk, and flash analysis posts sticky PR comments.
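
For readers unfamiliar with native `container:` blocks, a minimal sketch follows. The job name and the command are placeholders, and ubuntu:24.04 is used only as a stand-in image; the images the orchestrator actually uses are not listed here.

```yaml
# Hypothetical sketch: the job name, image, and command are assumptions.
name: container-block-sketch
on: [pull_request]

jobs:
  build-in-container:
    runs-on: ubuntu-latest
    # Every step in this job runs inside the container, so checkout, cache,
    # and run steps share one environment without a docker-run wrapper step.
    container:
      image: ubuntu:24.04
    steps:
      - uses: actions/checkout@v4
      # Prints the container's OS release, showing where the step executes.
      - run: cat /etc/os-release
```

With a wrapper action, the build commands run in a separate `docker run` invocation inside a single step; with a job-level container, regular actions such as caching operate on the same filesystem the build uses.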

Forks can use this without AWS infrastructure. Copy .github/ci-config.yml.example to .github/ci-config.yml to customize runner labels, job toggles, and cache sizes -- all tiers can point to ubuntu-latest (validated here). Alternatively, rename .github/workflows/ci-simple.yml.example to ci-simple.yml for a single-job workflow (SITL + FMU-v5 + tests + format) that finishes in under 15 minutes with no external dependencies.

.github/workflows/ci-orchestrator.yml

github-actions · 2026-01-13T20:46:13Z

No broken links found in changed files.

DronecodeBot · 2026-01-13T22:57:35Z

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-jan-14-2026-team-sync-and-community-q-a/48289/2

.github/workflows/build_all_targets.yml

.github/workflows/ci-orchestrator.yml

DronecodeBot · 2026-02-11T06:38:33Z

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-feb-11-2026-team-sync-and-community-q-a/48479/2

github-actions · 2026-02-13T13:26:50Z

🔎 FLASH Analysis

px4_fmu-v5x [Total VM Diff: -8 byte (-0 %)]

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0%     +55  [ = ]       0    .debug_abbrev
  -0.0%      -2  [ = ]       0    .debug_info
  -0.0%      -5  [ = ]       0    .debug_line
     +40%      +2  [ = ]       0    [Unmapped]
    -0.0%      -7  [ = ]       0    [section .debug_line]
  +0.1%      +8  [ = ]       0    [Unmapped]
  -0.0%      -8  -0.0%      -8    .text
     +44%      +4   +44%      +4    g_nullstring
    -0.0%     -12  -0.0%     -12    [section .text]
  +0.0%     +48  -0.0%      -8    TOTAL

px4_fmu-v6x [Total VM Diff: 0 byte (0 %)]

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0%     +55  [ = ]       0    .debug_abbrev
  -0.0%      -2  [ = ]       0    .debug_info
  -0.0%      -5  [ = ]       0    .debug_line
    +200%      +2  [ = ]       0    [Unmapped]
    -0.0%      -7  [ = ]       0    [section .debug_line]
  +0.0%     +48  [ = ]       0    TOTAL

Updated: 2026-03-06T05:43:52

MaEtUgR · 2026-02-13T14:16:47Z

I can't help but notice that the total CI time has doubled. While the longest task currently takes ~28 minutes, the checks on this PR run for ~1 hour. Also, it's one file instead of split checks, which in my eyes makes it significantly harder to maintain forks that don't have all checks.

How much do we actually save by running for double the time? Does this design require having all workflows in one file?

DronecodeBot · 2026-02-17T19:42:10Z

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-feb-18-2026-team-sync-and-community-q-a/48516/2

github-actions

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

src/modules/navigator/loiter.cpp

DronecodeBot · 2026-02-25T00:37:02Z

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-feb-25-2026-team-sync-and-community-q-a/48547/1

Replaces 14 CI workflows with a single ci-orchestrator.yml that runs jobs in a 4-tier waterfall. Tiers gate each other sequentially: if formatting fails in 2 minutes, nothing else runs.

Every job carried over from the old workflows was optimized along the way. Jobs use native container: blocks instead of the old addnab/docker-run-action wrapper, cache scopes were split and tuned (hit rates went from ~48% to 99%+), SITL tests run at 20x speed on 8cpu runners, clang-tidy got a dedicated 16cpu runner and cache, the failsafe sim caches its emsdk, and flash analysis posts sticky PR comments.

Forks can use this without AWS infrastructure. Copy .github/ci-config.yml.example to .github/ci-config.yml to customize runner labels, job toggles, and cache sizes. Alternatively, rename .github/workflows/ci-simple.yml.example to ci-simple.yml for a single-job workflow that finishes in under 15 minutes on ubuntu-latest.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>

Delete the nuttx_env_config workflow. It validated the PX4_EXTRA_NUTTX_CONFIG env var handling in platforms/nuttx/NuttX/CMakeLists.txt by building px4_fmu-v5_default with CONFIG_NSH_LOGIN_PASSWORD injected at configure time. The CI orchestrator rewrite (mrpollo/ci_orchestration, PR #26257) drops this workflow entirely. The cmake feature itself remains; only the CI gate is removed.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>

Port the checks.yml and python_checks.yml improvements from the CI orchestrator branch (mrpollo/ci_orchestration, PR #26257) without doing the full T1/T2 split.

checks.yml:
- Drop 5 matrix entries the orchestrator removed: tests_coverage, px4_fmu-v2_default stack_check, NO_NINJA_BUILD=1 px4_fmu-v5_default, NO_NINJA_BUILD=1 px4_sitl_default, px4_sitl_allyes.
- Remove the codecov/codecov-action@v1 step (deprecated, only ran for the dropped tests_coverage entry).
- Wire the setup-ccache / save-ccache composite actions around make tests (cache-key-prefix ccache-sitl, max-size 300M) so repeat runs reuse the SITL build tree. Matches the orchestrator basic-tests job 1:1.

python_checks.yml:
- Replace the apt-get install python3 + pip install --break-system-packages + hardcoded $HOME/.local/bin paths with actions/setup-python@v5 pinned to 3.10 and plain pip install.
- Linters now run from PATH instead of $HOME/.local/bin.

Stacks on top of mrpollo/ci-checkout-hygiene (#27032) which shipped fail-fast: true, fetch-depth: 1, and the safe.directory step extraction.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
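
The python_checks.yml change described in this commit might look roughly like the sketch below. It is illustrative only: the workflow and job names are invented, and flake8 stands in for whichever linters the real workflow installs.

```yaml
# Hypothetical sketch: names and the linter choice are assumptions.
name: python-checks-sketch
on: [pull_request]

jobs:
  python-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          # Quote the version: an unquoted 3.10 is read by YAML as the number 3.1.
          python-version: "3.10"
      # setup-python puts the interpreter and its scripts directory on PATH,
      # so pip-installed tools can be called by name.
      - run: pip install flake8
      - run: flake8 --version
```

Since the interpreter comes from setup-python rather than the system package manager, neither `--break-system-packages` nor a hardcoded `$HOME/.local/bin` path should be needed.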

Port the cache wiring from the CI orchestrator branch (mrpollo/ci_orchestration, PR #26257) into current mainline workflows without merging the orchestrator itself. Each change matches the corresponding job in ci-orchestrator.yml one to one.

clang-tidy.yml:
- Bump setup-ccache max-size 120M to 150M to match CACHE_CLANG_TIDY.

itcm_check.yml:
- Wire setup-ccache / save-ccache (cache-key-prefix ccache-itcm-${{ matrix.target }}, max-size 200M).

failsafe_sim.yml:
- Cache the emsdk clone at key emsdk-4.0.15 and gate the install on cache miss. Saves about 30s per run.

compile_ubuntu.yml:
- Split the install + build step so setup-ccache can run between ubuntu.sh and make quick_check (ubuntu.sh is what installs ccache in the plain ubuntu:22.04 and ubuntu:24.04 base images).
- Wire setup-ccache / save-ccache (cache-key-prefix ccache-ubuntu-${{ matrix.version }}, max-size 200M).

compile_macos.yml:
- Add Homebrew downloads cache keyed on the macos.sh hash.
- Add pip downloads cache keyed on the requirements.txt hash.
- Replace the hand-rolled ccache block with setup-ccache / save-ccache (cache-key-prefix ccache-macos-${{ matrix.config }}, max-size 40M to 200M).

sitl_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache (cache-key-prefix ccache-sitl-gazebo-classic, max-size 120M to 350M).
- Replace the explicit make px4_sitl_default + make sitl_gazebo-classic invocations with the build-gazebo-sitl composite action.
- Hoist PX4_CMAKE_BUILD_TYPE and PX4_SBOM_DISABLE to job-level env so they propagate into composite action steps.
- save-ccache is added here even though the orchestrator does not save in sitl-tests. The orchestrator relies on an upstream build-sitl-gazebo-classic seeder job; mainline has no such parent, so without save-ccache the cache would never populate.

ros_integration_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache (cache-key-prefix ccache-ros-integration, max-size 300M to 400M).
- Add a dedicated cache for the Micro-XRCE-DDS Agent build at key xrce-agent-v2.2.1-fastdds-2.8.2-galactic-2021-09-08, gating the build on cache miss.
- Add a dedicated cache for the px4-ros2-interface-lib colcon workspace keyed on the hash of msg/*.msg, msg/versioned/*.msg, and srv/*.srv files so it rebuilds only when the interface changes.
- Replace the explicit make invocations with the build-gazebo-sitl composite action.
- Hoist PX4_SBOM_DISABLE to job-level env for composite propagation.

flash_analysis.yml:
- Wrap the "current build" (PR head) with a restore/save pair of raw actions/cache/restore@v4 and actions/cache/save@v4 actions, keyed on ref_name + sha with a ref_name fallback. Uses the same ccache configuration (base_dir, compression, compression_level, max_size 200M, hash_dir false, compiler_check content) the composite setup-ccache action uses.
- Wrap the "baseline build" (base branch or previous commit) with a second restore/save pair keyed on the baseline sha. ccache -C runs between the two builds to ensure a cold cache for the baseline.
- This cannot use the composite actions because the job needs two independent cache lifecycles in a single run; setup-ccache is single-lifecycle.
- Fix the markdown indentation in the PR comment body heredoc. The <details> children were indented two spaces, which GitHub markdown parses as an indented code block and renders the collapsible section as literal text. Flushing the children to the left edge of the heredoc makes the <details> render as intended.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
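
The "gate on cache miss" pattern used for the emsdk clone in this commit can be sketched as follows. Only the cache key emsdk-4.0.15 comes from the commit message; the workflow name, job name, step ids, and the emsdk clone path are assumptions for illustration.

```yaml
# Hypothetical sketch: everything except the cache key is an assumption.
name: emsdk-cache-sketch
on: [pull_request]

jobs:
  failsafe-sim-sketch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Restores the directory if the key exists; on a miss, the post-job
      # step saves it under the same key once the job succeeds.
      - name: Cache emsdk
        id: emsdk-cache
        uses: actions/cache@v4
        with:
          path: emsdk
          key: emsdk-4.0.15

      # Runs only when the exact key was not found in the cache.
      - name: Install emsdk
        if: steps.emsdk-cache.outputs.cache-hit != 'true'
        run: |
          git clone https://github.com/emscripten-core/emsdk.git emsdk
          ./emsdk/emsdk install 4.0.15
          ./emsdk/emsdk activate 4.0.15
```

Putting the tool version in the key means a version bump naturally produces a miss and a fresh install, while unchanged versions skip the clone and install steps.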

mrpollo · 2026-04-11T16:02:16Z

Closing this in favor of the incremental rollout that ported the orchestrator's improvements to mainline without merging the tiered architecture. The individual enhancements have been shipped as targeted PRs:

Composite actions and infrastructure:

ci(actions): add composite actions and clang-tidy PR helper #27005 — composite actions (setup-ccache, save-ccache, build-gazebo-sitl)
ci(workflows): bump all action versions to latest majors #27039 — bump all action versions to latest majors (Node.js 20 deprecation)

Checkout and workflow hygiene:

ci(workflows): shallow checkout and fail-fast in checks #27032 — shallow checkout and fail-fast in checks
ci(workflows): remove nuttx_env_config #27033 — remove nuttx_env_config (dropped by orchestrator)
ci(checks): trim matrix, ccache tests, modernize python_checks #27035 — trim checks matrix, wire ccache, modernize python_checks

Runner upgrades and migrations:

ci(workflows): upgrade SITL and ROS integration runners to 8cpu #27034 — upgrade SITL and ROS integration runners to 8cpu
ci(workflows): wire ccache and caches across SITL, ROS, macOS, Ubuntu #27036 — wire ccache and s3-cache across all workflows, migrate ubuntu-latest jobs to RunsOn, split checks into gate_checks + tests, merge macOS matrix into sequential job

MAVROS and ROS:

ci(mavros): merge mission+offboard into one workflow, migrate to noetic and Python 3 #27038 — merge MAVROS mission+offboard into one workflow, migrate to noetic and Python 3
ROS translation node switched to official ros: images with ccache

Fuzzing:

ci(fuzzing): migrate to RunsOn with ccache and bump container #27048 — migrate fuzzing to RunsOn with ccache

EKF consolidation:

ci(checks): merge EKF change indicators into tests job #27047 — merge EKF change indicators into checks:tests

build_all_targets overhaul:

ci(build-all): MCU-based grouping, cache seeders, and build infrastructure overhaul #27050 — MCU-based board grouping, cache seeders, per-chip cache sizes, externalized config, container bumps, 4cpu matrix / 8cpu seeders

compile_ubuntu trim:

ci(compile-ubuntu): replace quick_check with targeted builds #27058 — replace quick_check with targeted SITL + NuttX builds

The tiered gating architecture (the orchestrator's defining feature) is deferred. The current standalone workflows with the improvements above achieve the same cache hit rates and build times without forcing contributors into a new workflow structure.

github-actions bot added the Documentation 📑 label Jan 12, 2026

github-advanced-security bot found potential problems Jan 12, 2026

mrpollo marked this pull request as ready for review January 16, 2026 01:04

mrpollo force-pushed the mrpollo/ci_orchestration branch from d135be1 to a6066a4 Compare January 16, 2026 01:04

mrpollo changed the title from "ci: Orchestration by Tiers" to "ci: orchestration by tiers" Jan 16, 2026

farhangnaderi reviewed Jan 16, 2026

haumarco self-requested a review January 16, 2026 13:25

mrpollo mentioned this pull request Jan 28, 2026

CI: fix clang-tidy #26367

Merged

mrpollo force-pushed the mrpollo/ci_orchestration branch 2 times, most recently from 79c4591 to eb3f07f Compare February 13, 2026 06:42

mrpollo mentioned this pull request Feb 13, 2026

CI: replace all usage of addnab/docker-run-action #26478

Merged

MaEtUgR force-pushed the mrpollo/ci_orchestration branch from eb3f07f to 865b546 Compare February 13, 2026 12:54

mrpollo force-pushed the mrpollo/ci_orchestration branch 4 times, most recently from a06ad9d to f4594e0 Compare February 17, 2026 03:27

mrpollo mentioned this pull request Feb 17, 2026

CI: disable VTOL and tailsitter SITL tests #26510

Merged

mrpollo force-pushed the mrpollo/ci_orchestration branch from e4fb17f to f263309 Compare February 17, 2026 17:22

mrpollo force-pushed the mrpollo/ci_orchestration branch 3 times, most recently from 0b0813e to c671cce Compare February 19, 2026 04:11

github-actions bot reviewed Feb 19, 2026

src/modules/navigator/loiter.cpp (outdated)

mrpollo force-pushed the mrpollo/ci_orchestration branch from c671cce to 078796b Compare February 19, 2026 15:28

mrpollo force-pushed the mrpollo/ci_orchestration branch 6 times, most recently from 56184f2 to 008c154 Compare February 21, 2026 04:54

mrpollo force-pushed the mrpollo/ci_orchestration branch 4 times, most recently from fa50863 to 260a1d9 Compare March 6, 2026 05:24

mrpollo force-pushed the mrpollo/ci_orchestration branch from 260a1d9 to 885af94 Compare March 6, 2026 05:28

This was referenced Apr 9, 2026

ci(workflows): remove nuttx_env_config #27033

Merged

ci(workflows): upgrade SITL and ROS integration runners to 8cpu #27034

Merged

mrpollo mentioned this pull request Apr 9, 2026

ci(checks): trim matrix, ccache tests, modernize python_checks #27035

Merged

This was referenced Apr 9, 2026

ci(workflows): wire ccache and caches across SITL, ROS, macOS, Ubuntu #27036

Merged

ci(mavros): merge mission+offboard into one workflow, migrate to noetic and Python 3 #27038

Merged

mrpollo closed this Apr 11, 2026


5 participants
