ci(health-check): run on push to main + split into pr / main / reusable #7030

Merged
xmfcx merged 2 commits into main from
feat/split-03-healthcheck-pr-split
Apr 17, 2026

xmfcx commented Apr 17, 2026

  • Parent Issue: Simplify the docker images and workflows #6852
  • Add a push: main trigger to health-check.yaml so merges populate the :health-check-*-main registry cache — previously it only ran on PR/schedule/dispatch, starving subsequent PRs of warm layers.
  • Split the three trigger modes out of the monolithic workflow:
    • health-check-pr.yaml (new): pull_request with paths: at the trigger level + require-label.yaml@v1 gate, calls the reusable on success. paths: replaces the per-step changed-files conditional; the hard label gate turns a watched-path PR without run:health-check red instead of silently skipping.
    • health-check-reusable.yaml (new): workflow_call with the matrix docker-build job; step-level if:s reduced to orthogonal matrix filters (build-type != 'main', platform == 'arm64', build-type == 'nightly').
    • health-check.yaml (trimmed): push: main, schedule, workflow_dispatch, calls the reusable.
  • Drop the scenario-test downstream job from the PR path along with the docker save + upload-artifact steps that fed it and load: true on the bake step (only needed for the save). scenario-test-reusable.yaml is deleted in the same PR since nothing else referenced it. Merge-time scenario coverage is unaffected: the standalone scenario-test.yaml (cron + push: main + workflow_dispatch) still runs and does not use the reusable.
  • ~10 repeated if: conditionals gone; the changed-files step gone; each trigger mode expressed in its own file.
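
A minimal sketch of what health-check-pr.yaml could look like under this split. The trigger `types:`, the overall job layout, the `label` input name, and the full `uses:` path of the label workflow are assumptions for illustration. Only the trigger-level `paths:` filter on docker-new/**, the require-label.yaml@v1 gate, the run:health-check label, and the call into health-check-reusable.yaml come from this description.

```yaml
# Hypothetical sketch of health-check-pr.yaml, not copied from the PR.
name: health-check-pr

on:
  pull_request:
    # Assumption: "labeled" is listed so that adding the label re-runs the gate.
    types: [opened, synchronize, reopened, labeled]
    # Trigger-level filter: the workflow does not start at all for PRs that
    # touch nothing under these paths (replaces the per-step changed-files check).
    paths:
      - "docker-new/**"

jobs:
  require-label:
    # Assumed location and input name of require-label.yaml@v1. It is expected
    # to fail (not skip) when the label is missing, which turns the check red.
    uses: autowarefoundation/autoware-github-actions/.github/workflows/require-label.yaml@v1
    with:
      label: "run:health-check"

  health-check:
    # Runs only after the label gate succeeds; otherwise this job is skipped.
    needs: require-label
    uses: ./.github/workflows/health-check-reusable.yaml
```

Because the path filter sits on the trigger, an unrelated PR never creates a run, while a watched-path PR always produces a visible require-label result.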

Why

The previous single-file workflow mixed pull_request / schedule / workflow_dispatch behind identical per-step guards. Untangling them into trigger-specific files removes the conditional thicket, makes the label gate hard-fail (red) instead of silent-skip, and lets push: main runs keep warming the shared BuildKit registry cache that PR builds read as their fallback. Scenario-test is dropped from the PR path (and its now-orphaned reusable deleted) so that restructuring doesn't have to carry the artifact save/upload round-trip; the standalone merge-time scenario workflow still exercises the same coverage, and a PR-gated form will be brought back in a follow-up once its shape is reworked.
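
For comparison, the trimmed health-check.yaml might reduce to something like the sketch below. The cron expression is a placeholder and the job name is inferred from the "health-check / health-check" check name in the test plan; the three triggers and the direct call into the reusable workflow follow the description above.

```yaml
# Hypothetical sketch of the trimmed health-check.yaml.
name: health-check

on:
  push:
    branches:
      - main            # every merge re-runs the build and refreshes the *-main cache tags
  schedule:
    - cron: "0 0 * * *" # placeholder schedule, not taken from the PR
  workflow_dispatch:

jobs:
  health-check:
    # No label gate and no paths filter: all three triggers call the
    # reusable workflow directly.
    uses: ./.github/workflows/health-check-reusable.yaml
```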


Test plan

  • Open a PR touching docker-new/** without the run:health-check label: expect health-check-pr / require-label red and the reusable job skipped.
  • Add the label to that PR: expect require-label green, the reusable workflow (docker-build matrix) running; confirm no scenario-test job appears in the run.
  • Push this branch to main: expect health-check / health-check to run with no label gate, populating ghcr.io/${{ github.repository }}-buildcache-new:*-main tags, and scenario-test to run independently as before.
  • Dispatch health-check via the Actions UI (workflow_dispatch): expect the reusable job to run without a label.
  • grep -r scenario-test-reusable .github/ returns nothing.

Disentangle the three-mode gating in a single workflow file by
splitting it into dedicated workflows per trigger plus a shared
reusable job, and add a `push: main` trigger so the shared registry
cache gets populated on merges.

Previously `health-check.yaml` mixed `pull_request`, `schedule`, and
`workflow_dispatch` behind ~10 repeated per-step conditionals
(`steps.changed-files.outputs.any_changed == 'true' ||
github.event_name == 'schedule' ||
github.event_name == 'workflow_dispatch'`). The conditional thicket
made the file hard to read and silently skipped label-gated PRs when
`run:health-check` was absent.

Also: the docker-bake step reads from `:health-check-*-main` as a
fallback cache ref, but that ref was only written on PR/schedule/
dispatch runs — never on `push: main` — so merges never warmed the
cache for subsequent PRs.

- `health-check-reusable.yaml` (new): `workflow_call` with the matrix
  docker build job. Step-level `if:`s reduced to the genuinely
  orthogonal matrix filters (`build-type != 'main'`,
  `platform == 'arm64'`, `build-type == 'nightly'`). No label or
  paths logic.

- `health-check-pr.yaml` (new): `pull_request` trigger with `paths:`
  filter at the trigger level (replaces the per-step `changed-files`
  gate). Uses `require-label.yaml@v1` which exits 1 on missing label,
  so a watched-path PR without `run:health-check` turns red instead
  of silently skipping. Calls the reusable on success.

- `health-check.yaml`: trimmed to `push: main`, `schedule`, and
  `workflow_dispatch`. No label check, no paths filter; directly
  calls the reusable. Purpose is to keep populating the shared
  `:health-check-*-main` BuildKit registry cache for PRs to read.

Net: ~10 repeated `if:` conditionals removed, the `changed-files`
step removed, each trigger mode expressed in its own file, and the
main cache gets written on every merge.
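
As a rough illustration of the "orthogonal matrix filters" mentioned above, the reusable workflow could be shaped like this. The matrix values, the runner, and the step bodies are placeholders; only the `workflow_call` trigger, the docker-build job name, and the three `if:` expressions follow the text.

```yaml
# Hypothetical sketch of health-check-reusable.yaml.
name: health-check-reusable

on:
  workflow_call:

jobs:
  docker-build:
    runs-on: ubuntu-24.04            # placeholder runner
    strategy:
      fail-fast: false
      matrix:
        build-type: [main, nightly]  # assumed values
        platform: [amd64, arm64]     # assumed values
    steps:
      # Each condition depends only on the matrix cell, never on the event
      # type, the label, or the changed paths.
      - name: Step skipped for the main build type
        if: matrix.build-type != 'main'
        run: echo "runs for build-type=${{ matrix.build-type }}"

      - name: arm64-only step
        if: matrix.platform == 'arm64'
        run: echo "runs for platform=${{ matrix.platform }}"

      - name: nightly-only step
        if: matrix.build-type == 'nightly'
        run: echo "runs for build-type=${{ matrix.build-type }}"
```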

Signed-off-by: Mete Fatih Cırıt <[email protected]>
xmfcx self-assigned this Apr 17, 2026

github-actions Bot commented Apr 17, 2026

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

xmfcx added the run:health-check label Apr 17, 2026
xmfcx requested a review from mitsudome-r April 17, 2026 17:14

xmfcx commented Apr 17, 2026

[image]

@mitsudome-r I also updated the required workflow accordingly 👍

The previous commit dropped the `scenario-test` job from
`health-check.yaml`, which was the sole caller of
`scenario-test-reusable.yaml`. Nothing else in the repo references it
(grep confirms only the `name:` line remains). Delete the file so the
workflows directory reflects the live pipeline.

Signed-off-by: Mete Fatih Cırıt <[email protected]>

mitsudome-r left a comment

I can create a follow-up PR to put back the scenario test workflow after merging.

xmfcx commented Apr 17, 2026

What scenario-test.yaml does

Scheduled build-and-run of a scenario test against Autoware, independent of the docker-new pipeline. Triggers:

  • schedule: cron 0 12 * * * (daily 12:00 UTC)
  • push: branches: [main]
  • workflow_dispatch

Runs on GHA-hosted ubuntu-24.04, inside the ghcr.io/autowarefoundation/autoware:universe-devel container. Steps:

  1. apt-get install build deps; pip install --upgrade gdown vcs2l.
  2. Fresh clone: git clone https://github.com/autowarefoundation/autoware.git ~/autoware_ws — ignores the checked-out ref, always tracks main tip.
  3. vcs import src < repositories/simulator.repos to pull simulator packages.
  4. rosdep install --from-paths src --ignore-src -r -y.
  5. pip3 install --user xmlschema==3.4.5 (hard-pinned compat workaround).
  6. colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release --executor sequential with CMAKE_DISABLE_PRECOMPILE_HEADERS=ON (to limit memory use).
  7. gdown --id <FILE_ID> for a sample scenario yaml and sample map zip.
  8. sed-rewrite /home/user/ to $HOME/ in the scenario yaml.
  9. ros2 launch scenario_test_runner scenario_test_runner.launch.py ... with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp, 300s initialize_duration, no rviz, 1 fps.
  10. Parse ./results/scenario_test_runner/result.junit.xml — fail if <failure or <error tags present.
  11. Upload ./results/, scenario_output.log, ~/.ros/log/.
  12. pkill -9 ros2 / scenario_test_runner / openscenario_interpreter_node / python.
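
Step 10 above can be sketched as a single workflow step. This is an illustrative fragment rather than the workflow's actual code: the step name is made up, and treating a missing report as a failure is an assumption.

```yaml
# Hypothetical step mirroring step 10 of the list above.
- name: Check scenario test result
  shell: bash
  run: |
    result=./results/scenario_test_runner/result.junit.xml
    # Assumption: a missing report counts as a failure too.
    if [ ! -f "$result" ]; then
      echo "::error::$result not found"
      exit 1
    fi
    # Any <failure ...> or <error ...> element marks the run as failed.
    if grep -qE '<(failure|error)' "$result"; then
      echo "::error::scenario test reported a failure or error"
      exit 1
    fi
```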

Actual health

Last 15 runs: every single one red. Latest failures (incl. the 2026-04-17 cron run at 12:00 UTC) all die at the same step:

gdown: error: unrecognized arguments: --id

Newer gdown versions (≥5.x) dropped --id; the file ID is now a positional arg. Because the workflow unconditionally runs pip install --upgrade gdown, the build has been broken ever since the gdown release that removed --id, and nobody has fixed it. The scenario never even launches: everything from step 9 onward is dead code in current CI.
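
One possible shape for a fix, as a sketch only: pin gdown to a single major version and pass the ID positionally. `<FILE_ID>` stays a placeholder, and the step names, the env variable name, and the output file name are illustrative.

```yaml
# Hypothetical replacement for the install and download steps.
- name: Install a pinned gdown
  run: pip install "gdown>=5,<6"   # pin the major version instead of --upgrade

- name: Download sample scenario
  env:
    SCENARIO_FILE_ID: "<FILE_ID>"
  run: |
    # gdown 5.x takes the file ID (or URL) as a positional argument.
    gdown "$SCENARIO_FILE_ID" -O sample_scenario.yaml
```

Pinning keeps a future CLI change in gdown from silently breaking the step again.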

Other latent issues

  • Step 2 ignores the checked-out repo. Every run is against main tip via a fresh git clone, which means the workflow cannot meaningfully be tested from a feature branch — workflow_dispatch on a branch still builds main.
  • xmlschema pin at 3.4.5 is a fragile compat hack with no comment explaining why.
  • No job-level timeout-minutes — a hanging scenario_test_runner would eat the default 360 min.
  • --executor sequential build is very slow; every run recompiles the whole universe tree from scratch (no cache).
  • actions/checkout@v4 and actions/upload-artifact@v4 flagged for Node 20 deprecation (removal Sep 2026).
  • pip install --user inside a container works here but is non-hermetic.
  • No concurrency cancel for cron-triggered runs beyond the default group (group includes event_name + ref, so overlapping schedule + push-to-main can both run).
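
A few of these points could be addressed with a fragment along the following lines. The group name, the 120-minute limit, the job name, and the choice to cancel in-progress runs are illustrative assumptions; the runner and container image are the ones named earlier in this comment.

```yaml
# Hypothetical fragment for scenario-test.yaml.
concurrency:
  # event_name is left out of the group so schedule and push-to-main runs
  # on the same ref share one group.
  group: scenario-test-${{ github.ref }}
  cancel-in-progress: true

jobs:
  scenario-test:
    runs-on: ubuntu-24.04
    container: ghcr.io/autowarefoundation/autoware:universe-devel
    timeout-minutes: 120   # job-level cap instead of the 360-minute default
    steps:
      # Build the ref that triggered the run rather than a fresh clone of main.
      - uses: actions/checkout@v5
```

Whether overlapping runs should cancel each other or queue is a policy choice; setting `cancel-in-progress: false` would queue them instead.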

Bottom line

The workflow is architecturally a merge-time scenario gate that is separate from the monolithic health-check.yaml. It has been silently failing on gdown --id since gdown removed that flag, so the merge-time "scenario coverage" that still nominally exists is not producing signal; red runs just get ignored. If this is to be reinstated as a real gate, at minimum: fix the gdown call, pin gdown to a working major version (or migrate to a non-Drive host), add timeout-minutes, and decide whether the workflow should test the ref under test or always main tip.

xmfcx commented Apr 17, 2026

@mitsudome-r also it will work faster with the new images because cyclonedds and other dds optimizations are done properly in the new images 🐇 as long as you run through the /entrypoint.sh

xmfcx merged commit 6969ed2 into main Apr 17, 2026
21 checks passed
xmfcx deleted the feat/split-03-healthcheck-pr-split branch April 17, 2026 18:04