ci: orchestration by tiers#26257

Closed
mrpollo wants to merge 1 commit into main from mrpollo/ci_orchestration

mrpollo (Contributor) commented Jan 12, 2026

Replaces 14 CI workflows with a single ci-orchestrator.yml that runs jobs in a 4-tier waterfall. Tiers gate each other sequentially: if formatting fails in 2 minutes, nothing else runs. No more burning 60 minutes of AWS compute on a PR that has a style error.
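
The gating described above maps onto GitHub Actions `needs:` chains. The following is a minimal sketch, not the contents of ci-orchestrator.yml: the job names, the split of work across tiers, and the placeholder `echo` commands are all assumptions made for illustration.

```yaml
# Hypothetical 4-tier waterfall. Job names and commands are placeholders,
# not taken from ci-orchestrator.yml.
name: tier-waterfall-sketch
on: [pull_request]

jobs:
  # Tier 1: fast gate. If it fails, every downstream job is skipped.
  t1-format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "format check goes here"

  # Tier 2: starts only after tier 1 succeeds.
  t2-basic-tests:
    needs: [t1-format]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "unit tests go here"

  # Tier 3: starts only after tier 2 succeeds.
  t3-sitl-tests:
    needs: [t2-basic-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "simulation tests go here"

  # Tier 4: starts only after tier 3 succeeds.
  t4-board-builds:
    needs: [t3-sitl-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "board builds go here"
```

By default a job is skipped when any job listed in its `needs:` fails, which is what stops a failed format check from starting the longer jobs. The cost is that tiers run back to back, so wall-clock time is roughly the sum of the slowest job in each tier.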

Every job carried over from the old workflows was optimized along the way. Jobs use native container: blocks instead of the old addnab/docker-run-action wrapper, cache scopes were split and tuned (hit rates went from ~48% to 99%+), SITL tests run at 20x speed on 8cpu runners, clang-tidy got a dedicated 16cpu runner and cache, the failsafe sim caches its emsdk, and flash analysis posts sticky PR comments.
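
As a rough illustration of the `container:` change, the sketch below shows the native form. The image and command are placeholders chosen so the example runs on a stock runner; the actual jobs presumably use the project's build images, which this description does not name. The commented-out step approximates the older wrapper style and is likewise an assumption about how it was used.

```yaml
# Placeholder image and command; not taken from the PR.
name: native-container-sketch
on: [pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    # Native form: every step in this job runs inside the container.
    container:
      image: ubuntu:24.04
    steps:
      - run: cat /etc/os-release

      # Older wrapper style, roughly (one container per wrapped step):
      # - uses: addnab/docker-run-action@v3
      #   with:
      #     image: ubuntu:24.04
      #     options: -v ${{ github.workspace }}:/workspace
      #     run: cat /etc/os-release
```

With the native form, ordinary `uses:` steps such as caching actions run in the same environment as the build, without mounting the workspace by hand.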

Forks can use this without AWS infrastructure. Copy .github/ci-config.yml.example to .github/ci-config.yml to customize runner labels, job toggles, and cache sizes -- all tiers can point to ubuntu-latest (validated here). Alternatively, rename .github/workflows/ci-simple.yml.example to ci-simple.yml for a single-job workflow (SITL + FMU-v5 + tests + format) that finishes in under 15 minutes with no external dependencies.

github-actions bot commented Jan 13, 2026

No broken links found in changed files.

@DronecodeBot

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-jan-14-2026-team-sync-and-community-q-a/48289/2

@mrpollo mrpollo marked this pull request as ready for review January 16, 2026 01:04
@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch from d135be1 to a6066a4 Compare January 16, 2026 01:04
@mrpollo mrpollo changed the title ci: Orchestration by Tiers ci: orchestration by tiers Jan 16, 2026
@haumarco haumarco self-requested a review January 16, 2026 13:25
@mrpollo mrpollo mentioned this pull request Jan 28, 2026
@DronecodeBot

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-feb-11-2026-team-sync-and-community-q-a/48479/2

@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch 2 times, most recently from 79c4591 to eb3f07f Compare February 13, 2026 06:42
@MaEtUgR MaEtUgR force-pushed the mrpollo/ci_orchestration branch from eb3f07f to 865b546 Compare February 13, 2026 12:54
github-actions bot commented Feb 13, 2026

🔎 FLASH Analysis

px4_fmu-v5x [Total VM Diff: -8 byte (-0 %)]
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0%     +55  [ = ]       0    .debug_abbrev
  -0.0%      -2  [ = ]       0    .debug_info
  -0.0%      -5  [ = ]       0    .debug_line
     +40%      +2  [ = ]       0    [Unmapped]
    -0.0%      -7  [ = ]       0    [section .debug_line]
  +0.1%      +8  [ = ]       0    [Unmapped]
  -0.0%      -8  -0.0%      -8    .text
     +44%      +4   +44%      +4    g_nullstring
    -0.0%     -12  -0.0%     -12    [section .text]
  +0.0%     +48  -0.0%      -8    TOTAL

px4_fmu-v6x [Total VM Diff: 0 byte (0 %)]
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0%     +55  [ = ]       0    .debug_abbrev
  -0.0%      -2  [ = ]       0    .debug_info
  -0.0%      -5  [ = ]       0    .debug_line
    +200%      +2  [ = ]       0    [Unmapped]
    -0.0%      -7  [ = ]       0    [section .debug_line]
  +0.0%     +48  [ = ]       0    TOTAL

Updated: 2026-03-06T05:43:52

MaEtUgR (Member) commented Feb 13, 2026

I can't help but notice that the total CI time has doubled. While the longest task currently takes ~28 minutes, the checks on this PR run for ~1 hour. Also, it's one file instead of split checks, which in my eyes makes it significantly harder to maintain forks that don't have all checks.

How much do we actually save by running for double the time? Is having all workflows in one file a must for this design?

@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch 4 times, most recently from a06ad9d to f4594e0 Compare February 17, 2026 03:27
@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch from e4fb17f to f263309 Compare February 17, 2026 17:22
@DronecodeBot

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-feb-18-2026-team-sync-and-community-q-a/48516/2

@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch 3 times, most recently from 0b0813e to c671cce Compare February 19, 2026 04:11
github-actions bot left a comment

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch from c671cce to 078796b Compare February 19, 2026 15:28
@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch 6 times, most recently from 56184f2 to 008c154 Compare February 21, 2026 04:54
@DronecodeBot

This pull request has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:

https://discuss.px4.io/t/px4-dev-call-feb-25-2026-team-sync-and-community-q-a/48547/1

@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch 4 times, most recently from fa50863 to 260a1d9 Compare March 6, 2026 05:24
Replaces 14 CI workflows with a single ci-orchestrator.yml that runs
jobs in a 4-tier waterfall. Tiers gate each other sequentially: if
formatting fails in 2 minutes, nothing else runs.

Every job carried over from the old workflows was optimized along the
way. Jobs use native container: blocks instead of the old
addnab/docker-run-action wrapper, cache scopes were split and tuned
(hit rates went from ~48% to 99%+), SITL tests run at 20x speed on
8cpu runners, clang-tidy got a dedicated 16cpu runner and cache, the
failsafe sim caches its emsdk, and flash analysis posts sticky PR
comments.

Forks can use this without AWS infrastructure. Copy
.github/ci-config.yml.example to .github/ci-config.yml to customize
runner labels, job toggles, and cache sizes. Alternatively, rename
.github/workflows/ci-simple.yml.example to ci-simple.yml for a
single-job workflow that finishes in under 15 minutes on ubuntu-latest.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
@mrpollo mrpollo force-pushed the mrpollo/ci_orchestration branch from 260a1d9 to 885af94 Compare March 6, 2026 05:28
mrpollo added a commit that referenced this pull request Apr 9, 2026
Delete the nuttx_env_config workflow. It validated the
PX4_EXTRA_NUTTX_CONFIG env var handling in
platforms/nuttx/NuttX/CMakeLists.txt by building px4_fmu-v5_default
with CONFIG_NSH_LOGIN_PASSWORD injected at configure time.

The CI orchestrator rewrite (mrpollo/ci_orchestration, PR #26257) drops
this workflow entirely. The cmake feature itself remains; only the CI
gate is removed.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
mrpollo added a commit that referenced this pull request Apr 9, 2026
Port the checks.yml and python_checks.yml improvements from the CI
orchestrator branch (mrpollo/ci_orchestration, PR #26257) without
doing the full T1/T2 split.

checks.yml:
- Drop 5 matrix entries the orchestrator removed:
  tests_coverage, px4_fmu-v2_default stack_check,
  NO_NINJA_BUILD=1 px4_fmu-v5_default,
  NO_NINJA_BUILD=1 px4_sitl_default, px4_sitl_allyes.
- Remove the codecov/codecov-action@v1 step (deprecated, only ran
  for the dropped tests_coverage entry).
- Wire the setup-ccache / save-ccache composite actions around
  make tests (cache-key-prefix ccache-sitl, max-size 300M) so
  repeat runs reuse the SITL build tree. Matches the orchestrator
  basic-tests job 1:1.

python_checks.yml:
- Replace the apt-get install python3 + pip install
  --break-system-packages + hardcoded $HOME/.local/bin paths with
  actions/setup-python@v5 pinned to 3.10 and plain pip install.
- Linters now run from PATH instead of $HOME/.local/bin.
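
The python_checks.yml change above could look roughly like the sketch below. The linter shown is a hypothetical choice, since the message does not name the tools.

```yaml
# Sketch only; the linter is a placeholder.
name: python-checks-sketch
on: [pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          # Quoted so YAML does not parse 3.10 as the float 3.1.
          python-version: "3.10"
      # setup-python puts the selected interpreter and its scripts on PATH,
      # so no --break-system-packages or $HOME/.local/bin is needed.
      - run: pip install flake8
      - run: flake8 --version
```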

Stacks on top of mrpollo/ci-checkout-hygiene (#27032) which shipped
fail-fast: true, fetch-depth: 1, and the safe.directory step
extraction.

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
mrpollo added a commit that referenced this pull request Apr 9, 2026
Port the cache wiring from the CI orchestrator branch
(mrpollo/ci_orchestration, PR #26257) into current mainline workflows
without merging the orchestrator itself. Each change matches the
corresponding job in ci-orchestrator.yml one to one.

clang-tidy.yml:
- Bump setup-ccache max-size 120M to 150M to match CACHE_CLANG_TIDY.

itcm_check.yml:
- Wire setup-ccache / save-ccache (cache-key-prefix
  ccache-itcm-${{ matrix.target }}, max-size 200M).

failsafe_sim.yml:
- Cache the emsdk clone at key emsdk-4.0.15 and gate the install on
  cache miss. Saves about 30s per run.
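
A minimal sketch of the emsdk caching pattern described above, assuming the SDK is cloned into an `emsdk` directory in the workspace (the path and step names are not from the source):

```yaml
# Sketch only; the clone path and step names are assumptions.
name: emsdk-cache-sketch
on: [pull_request]

jobs:
  failsafe-sim:
    runs-on: ubuntu-latest
    steps:
      - name: Restore emsdk
        id: emsdk-cache
        uses: actions/cache@v4
        with:
          path: emsdk
          key: emsdk-4.0.15

      # Runs only on a cache miss; actions/cache saves the directory
      # under the same key when the job finishes successfully.
      - name: Install emsdk
        if: steps.emsdk-cache.outputs.cache-hit != 'true'
        run: |
          git clone https://github.com/emscripten-core/emsdk.git emsdk
          ./emsdk/emsdk install 4.0.15
          ./emsdk/emsdk activate 4.0.15
```

Because the key contains only the SDK version, the cache is reused until the version string is bumped.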

compile_ubuntu.yml:
- Split the install + build step so setup-ccache can run between
  ubuntu.sh and make quick_check (ubuntu.sh is what installs ccache
  in the plain ubuntu:22.04 and ubuntu:24.04 base images).
- Wire setup-ccache / save-ccache (cache-key-prefix
  ccache-ubuntu-${{ matrix.version }}, max-size 200M).

compile_macos.yml:
- Add Homebrew downloads cache keyed on the macos.sh hash.
- Add pip downloads cache keyed on the requirements.txt hash.
- Replace the hand-rolled ccache block with setup-ccache /
  save-ccache (cache-key-prefix ccache-macos-${{ matrix.config }},
  max-size 40M to 200M).

sitl_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache
  (cache-key-prefix ccache-sitl-gazebo-classic, max-size 120M to
  350M).
- Replace the explicit make px4_sitl_default + make sitl_gazebo-classic
  invocations with the build-gazebo-sitl composite action.
- Hoist PX4_CMAKE_BUILD_TYPE and PX4_SBOM_DISABLE to job-level env
  so they propagate into composite action steps.
- save-ccache is added here even though the orchestrator does not
  save in sitl-tests. The orchestrator relies on an upstream
  build-sitl-gazebo-classic seeder job; mainline has no such parent,
  so without save-ccache the cache would never populate.

ros_integration_tests.yml:
- Replace hand-rolled ccache with setup-ccache / save-ccache
  (cache-key-prefix ccache-ros-integration, max-size 300M to 400M).
- Add a dedicated cache for the Micro-XRCE-DDS Agent build at key
  xrce-agent-v2.2.1-fastdds-2.8.2-galactic-2021-09-08, gating the
  build on cache miss.
- Add a dedicated cache for the px4-ros2-interface-lib colcon
  workspace keyed on the hash of msg/*.msg, msg/versioned/*.msg, and
  srv/*.srv files so it rebuilds only when the interface changes.
- Replace the explicit make invocations with the build-gazebo-sitl
  composite action.
- Hoist PX4_SBOM_DISABLE to job-level env for composite propagation.
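
The content-keyed cache for the px4-ros2-interface-lib workspace can be sketched with the `hashFiles` expression. The key prefix, workspace path, and placeholder build command are assumptions; only the three glob patterns come from the description above.

```yaml
# Sketch only; key prefix and workspace path are hypothetical.
name: interface-cache-sketch
on: [pull_request]

jobs:
  ros-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Restore interface-lib workspace
        id: ws-cache
        uses: actions/cache@v4
        with:
          path: ros2_ws
          # The hash changes only when a message or service definition changes.
          key: ros2-interface-lib-${{ hashFiles('msg/*.msg', 'msg/versioned/*.msg', 'srv/*.srv') }}

      - name: Build workspace
        if: steps.ws-cache.outputs.cache-hit != 'true'
        run: echo "colcon build would run here"
```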

flash_analysis.yml:
- Wrap the "current build" (PR head) with a restore/save pair of raw
  actions/cache/restore@v4 and actions/cache/save@v4 actions, keyed
  on ref_name + sha with a ref_name fallback. Uses the same ccache
  configuration (base_dir, compression, compression_level, max_size
  200M, hash_dir false, compiler_check content) the composite
  setup-ccache action uses.
- Wrap the "baseline build" (base branch or previous commit) with a
  second restore/save pair keyed on the baseline sha. ccache -C runs
  between the two builds to ensure a cold cache for the baseline.
- This cannot use the composite actions because the job needs two
  independent cache lifecycles in a single run; setup-ccache is
  single-lifecycle.
- Fix the markdown indentation in the PR comment body heredoc. The
  <details> children were indented two spaces, which GitHub markdown
  parses as an indented code block and renders the collapsible
  section as literal text. Flushing the children to the left edge of
  the heredoc makes the <details> render as intended.
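
The two cache lifecycles described for flash_analysis.yml can be sketched as follows. This is an illustration under assumptions, not the shipped workflow: the key prefixes, the `CCACHE_DIR` location, the placeholder build steps, and the placement of `ccache -C` before the baseline restore are all guesses.

```yaml
# Sketch of two independent cache lifecycles in one job.
# Key prefixes, CCACHE_DIR, and the echo build steps are placeholders.
name: flash-analysis-cache-sketch
on: [pull_request]

jobs:
  analyze:
    runs-on: ubuntu-latest
    env:
      CCACHE_DIR: ${{ github.workspace }}/.ccache
    steps:
      - run: |
          sudo apt-get update && sudo apt-get install -y ccache
          mkdir -p "$CCACHE_DIR"

      # Lifecycle 1: current build (PR head), ref_name + sha with a
      # ref_name-only fallback.
      - id: current
        uses: actions/cache/restore@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          key: ccache-flash-current-${{ github.ref_name }}-${{ github.sha }}
          restore-keys: |
            ccache-flash-current-${{ github.ref_name }}-
      - run: echo "build the PR head here"
      - if: steps.current.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          key: ${{ steps.current.outputs.cache-primary-key }}

      # Clear ccache so the baseline does not reuse PR-head objects.
      - run: ccache -C

      # Lifecycle 2: baseline build, keyed on the baseline sha.
      - id: baseline
        uses: actions/cache/restore@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          key: ccache-flash-baseline-${{ github.event.pull_request.base.sha }}
      - run: echo "build the baseline here"
      - if: steps.baseline.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: ${{ env.CCACHE_DIR }}
          key: ${{ steps.baseline.outputs.cache-primary-key }}
```

Splitting restore and save into separate steps is what allows two lifecycles in one job; the combined `actions/cache` action saves only once, at the end of the job.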

Signed-off-by: Ramon Roche <mrpollo@gmail.com>
mrpollo (Contributor, Author) commented Apr 11, 2026

Closing this in favor of the incremental rollout that ported the orchestrator's improvements to mainline without merging the tiered architecture. The individual enhancements have been shipped as targeted PRs:

Composite actions and infrastructure:

Checkout and workflow hygiene:

Runner upgrades and migrations:

MAVROS and ROS:

Fuzzing:

EKF consolidation:

build_all_targets overhaul:

compile_ubuntu trim:

The tiered gating architecture (the orchestrator's defining feature) is deferred. The current standalone workflows with the improvements above achieve the same cache hit rates and build times without forcing contributors into a new workflow structure.

@mrpollo mrpollo closed this Apr 11, 2026