ci(docker-new): split base-cuda layer and restructure CI pipelines #7033
Merged
Restructures the docker-new image graph and CI topology:

- Add base-cuda-runtime / base-cuda-devel stages as a dedicated CUDA base, and rebase universe-cuda off them instead of universe.
- Drop universe-runtime-dependencies; fold it into universe.
- Split the rmw and nvidia ansible roles into standalone playbooks so Dockerfiles bind-mount only what they need.
- Switch every BuildKit RUN cache mount to a named ID (apt/ccache/pip/pipx, ROS_DISTRO-scoped where relevant).
- Split per-distro CI into per-arch jobs; extract multi-arch manifest stitching into docker-manifest-new.yaml.
- Persist BuildKit mount caches across runs via actions/cache + buildkit-cache-dance, with a size guard and lineage pruning.

Signed-off-by: Mete Fatih Cırıt <[email protected]>
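The per-arch split can be pictured as a small job graph. The sketch below is illustrative only, not the workflow YAML: job names come from the test plan further down, the dependency of `build-base-cuda` on `build-base` is an assumption read off the image graph (`base` feeds `base-cuda-runtime`), and each manifest job is modeled as needing the two leaf jobs of both arches. Python's `graphlib` is used only to produce one valid run order.

```python
"""Sketch of the per-arch CI fan-out and the manifest jobs that wait on it.

Job names follow the PR's test plan. Making build-base-cuda depend on
build-base is an assumption taken from the image graph (base -> base-cuda-runtime).
"""

from graphlib import TopologicalSorter
from itertools import product

DISTROS = ["humble", "jazzy"]
PLATFORMS = ["amd64", "arm64"]

# Jobs inside one per-arch pipeline: job -> jobs it needs.
PIPELINE = {
    "build-base": [],
    "build-core": ["build-base"],
    "build-universe": ["build-core"],
    "build-base-cuda": ["build-base"],  # assumption, see module docstring
    "build-universe-cuda": ["build-base-cuda"],
}


def ci_graph() -> dict[str, set[str]]:
    """Expand distros x platforms into one job graph, plus one manifest job per distro."""
    graph: dict[str, set[str]] = {}
    for distro, platform in product(DISTROS, PLATFORMS):
        prefix = f"{distro}-{platform}"
        for job, needs in PIPELINE.items():
            graph[f"{prefix}/{job}"] = {f"{prefix}/{n}" for n in needs}
    for distro in DISTROS:
        # The manifest stitches both arches, so it needs the leaf jobs of both.
        graph[f"{distro}-manifest"] = {
            f"{distro}-{platform}/{leaf}"
            for platform in PLATFORMS
            for leaf in ("build-universe", "build-universe-cuda")
        }
    return graph


if __name__ == "__main__":
    for job in TopologicalSorter(ci_graph()).static_order():
        print(job)
```

Because the four `<distro>-<platform>` prefixes share no edges, the sorter is free to interleave them, which is the parallelism the per-arch split is after; only the two manifest jobs join the arches back together.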
Contributor
Author
Docker layer graph update

Before

```mermaid
graph TD
base(["base"])
base --> core-dependencies(["core-dependencies"])
core-dependencies --> core-devel(["core-devel"])
core-devel --> universe-dependencies(["universe-dependencies"])
universe-dependencies --> universe-dependencies-cuda(["universe-dependencies-cuda"])
universe-dependencies --> universe-devel(["universe-devel"])
universe-dependencies-cuda --> universe-devel-cuda(["universe-devel-cuda"])
base --> core(["core"])
core-devel -- " COPY /opt/autoware " --> core
core --> universe-runtime-dependencies(["universe-runtime-dependencies"])
universe-runtime-dependencies --> universe(["universe"])
universe-runtime-dependencies --> universe-cuda(["universe-cuda"])
universe-devel -- " COPY /opt/autoware " --> universe
universe-devel-cuda -- " COPY /opt/autoware " --> universe-cuda
classDef base fill: #e8e8e8, color: #333
classDef devel fill: #bbdefb, color: #333
classDef runtime fill: #c8e6c9, color: #333
classDef cuda fill: #e1bee7, color: #333
class base base
class core-dependencies,core-devel,universe-dependencies,universe-devel devel
class core,universe-runtime-dependencies,universe runtime
class universe-dependencies-cuda,universe-devel-cuda,universe-cuda cuda
```
After

```mermaid
graph TB
base(["base"]) --> core-dependencies(["core-dependencies"]) & core(["core"]) & base-cuda-runtime(["base-cuda-runtime"])
core-dependencies --> core-devel(["core-devel"])
core-devel --> universe-dependencies(["universe-dependencies"])
universe-dependencies --> universe-devel(["universe-devel"])
universe-dependencies-cuda(["universe-dependencies-cuda"]) --> universe-devel-cuda(["universe-devel-cuda"])
core-devel -- " COPY /opt/autoware " --> core
core --> universe(["universe"])
universe-devel -- " COPY /opt/autoware " --> universe
universe-devel-cuda -- " COPY /opt/autoware " --> universe-cuda(["universe-cuda"])
core-devel -- " COPY /opt/autoware " --> universe-dependencies-cuda
base-cuda-devel(["base-cuda-devel"]) --> universe-dependencies-cuda
base-cuda-runtime --> universe-cuda & base-cuda-devel
classDef base fill: #e8e8e8, color: #333
classDef devel fill: #bbdefb, color: #333
classDef runtime fill: #c8e6c9, color: #333
classDef cuda fill: #e1bee7, color: #333
class base,base-cuda-runtime,base-cuda-devel base
class core-dependencies,core-devel,universe-dependencies,universe-devel devel
class core,universe runtime
class universe-dependencies-cuda,universe-devel-cuda,universe-cuda cuda
```
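One way to check what the rewire changes for `universe-cuda` is to transcribe both graphs into parent maps and walk them. This is a hand transcription of the Mermaid edges (COPY edges counted as dependencies, since the target stage pulls files from the source), not something generated from `docker-bake.hcl`. It shows that `universe-cuda` no longer has `universe-dependencies` among its ancestors, while it still shares `base` and `core-devel` with the non-CUDA branch.

```python
"""Compare which stages universe-cuda builds on before and after the rewire.

Edges are transcribed by hand from the two Mermaid graphs; COPY edges count
as dependencies because the target stage pulls /opt/autoware from the source.
"""

from collections import deque

# child -> parents, "Before" graph
BEFORE = {
    "core-dependencies": ["base"],
    "core-devel": ["core-dependencies"],
    "universe-dependencies": ["core-devel"],
    "universe-dependencies-cuda": ["universe-dependencies"],
    "universe-devel": ["universe-dependencies"],
    "universe-devel-cuda": ["universe-dependencies-cuda"],
    "core": ["base", "core-devel"],
    "universe-runtime-dependencies": ["core"],
    "universe": ["universe-runtime-dependencies", "universe-devel"],
    "universe-cuda": ["universe-runtime-dependencies", "universe-devel-cuda"],
}

# child -> parents, "After" graph
AFTER = {
    "core-dependencies": ["base"],
    "core": ["base", "core-devel"],
    "base-cuda-runtime": ["base"],
    "base-cuda-devel": ["base-cuda-runtime"],
    "core-devel": ["core-dependencies"],
    "universe-dependencies": ["core-devel"],
    "universe-devel": ["universe-dependencies"],
    "universe": ["core", "universe-devel"],
    "universe-dependencies-cuda": ["core-devel", "base-cuda-devel"],
    "universe-devel-cuda": ["universe-dependencies-cuda"],
    "universe-cuda": ["universe-devel-cuda", "base-cuda-runtime"],
}


def ancestors(graph: dict[str, list[str]], stage: str) -> set[str]:
    """Return every stage that `stage` transitively builds on (BFS over parents)."""
    seen: set[str] = set()
    queue = deque(graph.get(stage, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(graph.get(parent, []))
    return seen


if __name__ == "__main__":
    for name, graph in (("before", BEFORE), ("after", AFTER)):
        deps = ancestors(graph, "universe-cuda")
        print(name, "universe-dependencies" in deps, sorted(deps))
```

Stage ancestry is only a proxy for rebuild behavior: BuildKit also invalidates on changed build context and arguments, which this sketch does not model.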
Contributor
Author
I rewired the graph so that:
This way:
The vibe-coded visualizer script and these HTML files can be found here.
Contributor
Author
It's tested on my fork's main: https://github.com/xmfcx/autoware/actions/runs/24570637486. The workflows are building and the cache is functional! All the images are created: https://github.com/xmfcx/autoware/pkgs/container/autoware-new/versions
mitsudome-r
approved these changes
Apr 20, 2026
This was referenced Apr 20, 2026
- Add `base-cuda-runtime` / `base-cuda-devel` stages in `docker-new/base-cuda.Dockerfile` and rebase `universe-cuda` off them (was stacked on `universe`). Drop `universe-runtime-dependencies`; `universe` copies directly from `universe-devel`. New bake group `ci-base-cuda`.
- Split the rmw and nvidia roles into standalone playbooks, `ansible/playbooks/rmw.yaml` and `nvidia.yaml`; Dockerfiles bind-mount only the role and playbook files they actually invoke (`ansible/roles/rmw_implementation`, `ansible/roles/nvidia_*`). This shrinks the bind-mount tree and the cache-invalidation surface.
- Every `RUN --mount=type=cache,...` now carries an explicit `id=` (`apt-cache-${ROS_DISTRO}`, `apt-lists-${ROS_DISTRO}`, `ccache-${ROS_DISTRO}`, `pip-cache`, `pipx-cache`). This is required for `buildkit-cache-dance` to round-trip mount state between runs.
- `docker-build-pipeline-new.yaml` takes a `platform` input; `docker-build-and-push-new.yaml` fans out to `humble-amd64` / `humble-arm64` / `jazzy-amd64` / `jazzy-arm64`. Multi-arch manifest stitching is extracted into a dedicated `docker-manifest-new.yaml`, which runs after both arches succeed.
- `docker-build-new.yaml` saves/restores BuildKit mount tarballs via `actions/cache` + `buildkit-cache-dance`, with a size guard and lineage pruning (the largest entry per lineage is kept on `main` pushes; stale entries are discarded). This is the save side that #7032 (ci(health-check): rewire cache to match docker-new restructure) reads from.
- `ENV RMW_IMPLEMENTATION=rmw_cyclonedds_cpp` is now set in `base`, so every downstream stage inherits it without relying on a runtime-env override.

Why
Previously, `universe-cuda` stacked on `universe-dependencies`, so a change to either branch forced the other to rebuild, and both branches shared one apt cache key, which meant the huge CUDA / cuDNN / TensorRT payload poisoned the non-CUDA tarballs. Splitting CUDA into its own base, plus per-arch CI jobs, lets each branch cache independently, enables parallel amd64/arm64 builds, and matches the docker-new "Ansible-first" convention (version pins live in each role's `defaults/main.yaml`). Persisting BuildKit mount caches across runs turns cold builds into near-hot ones, and it is what lets health-check (#7032) reuse apt/ccache/pip/pipx state without recomputing it.

`docker-new/examples/`) follows this one.

Test plan
- `docker-build-and-push-new` against this branch: https://github.com/autowarefoundation/autoware/actions/runs/24580553216. Expect all four per-arch pipelines and both manifest jobs to succeed.
- `default`, `ci-base`, `ci-core`, `ci-base-cuda`, `ci-universe`, `ci-universe-cuda`. Expect new targets `base-cuda-runtime` and `base-cuda-devel`. Expect `universe-runtime-dependencies` to be absent.
- `humble-amd64`, `humble-arm64`, `jazzy-amd64`, `jazzy-arm64`) each executing `build-base` → `build-core` → `build-universe` plus `build-base-cuda` → `build-universe-cuda`; then `humble-manifest` / `jazzy-manifest` stitching the multi-arch tags after both arches succeed.
- `ROS_DISTRO=jazzy REGISTRY=autoware PLATFORM=amd64 TAG_DATE=$(date +%Y%m%d) TAG_VERSION= TAG_REF= docker buildx bake -f docker-new/docker-bake.hcl ci-base-cuda`
- `autoware:base-cuda-runtime-jazzy-amd64-<date>` and `autoware:base-cuda-devel-jazzy-amd64-<date>` visible in `docker images`.
- `main`, confirm the persisted mount cache: the next `ci-universe` run's `Restore BuildKit cache mounts` step reports `Cache restored from key: buildkit-mounts-ci-universe-<distro>-<platform>-<sha>`, and the prune step logs `keep: buildkit-mounts-...` for exactly one entry per lineage.
- `health-check` run (#7032's pipeline), confirm its read-only mount restore now hits instead of missing, via the `Cache restored from key: buildkit-mounts-ci-universe-humble-<platform>-...` log line.
- `docker-new/README.md` against the new graph in `docker-new/docker-bake.hcl`.
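The prune expectation in the test plan (exactly one `keep:` line per lineage) can be illustrated with a minimal sketch. It is not the logic in `docker-build-new.yaml`; it assumes a lineage is the cache key minus its trailing `-<sha>`, that "largest" means tarball size in bytes, and that `MAX_BYTES` is a hypothetical size-guard threshold. Only the `keep:` log prefix and the key shape come from the test plan; the `delete:` prefix is made up for the example.

```python
"""Minimal sketch of lineage pruning for persisted BuildKit mount caches.

Hypothetical assumptions (not taken from docker-build-new.yaml):
- a cache key is "<lineage>-<sha>", so the lineage is the key minus its last segment;
- "largest" means tarball size in bytes;
- MAX_BYTES is an illustrative size-guard threshold.
"""

from collections import Counter
from dataclasses import dataclass

MAX_BYTES = 8 * 1024**3  # hypothetical size guard: 8 GiB per tarball


@dataclass(frozen=True)
class CacheEntry:
    key: str  # e.g. "buildkit-mounts-ci-universe-humble-amd64-<sha>"
    size_bytes: int


def lineage(key: str) -> str:
    """Strip the trailing "-<sha>" so entries saved by different commits group together."""
    return key.rsplit("-", 1)[0]


def should_save(size_bytes: int) -> bool:
    """Size guard: skip uploading a tarball that exceeds the budget."""
    return size_bytes <= MAX_BYTES


def prune(entries: list[CacheEntry]) -> tuple[list[CacheEntry], list[CacheEntry]]:
    """Split entries into (keep, delete), keeping only the largest entry per lineage."""
    best: dict[str, CacheEntry] = {}
    for entry in entries:
        name = lineage(entry.key)
        if name not in best or entry.size_bytes > best[name].size_bytes:
            best[name] = entry
    keep = list(best.values())
    delete = [entry for entry in entries if entry not in keep]
    return keep, delete


def one_keep_per_lineage(log_lines: list[str]) -> bool:
    """True if at least one "keep: <key>" line exists and no lineage is kept twice."""
    kept = [line.removeprefix("keep: ") for line in log_lines if line.startswith("keep: ")]
    counts = Counter(lineage(key) for key in kept)
    return bool(counts) and all(n == 1 for n in counts.values())


if __name__ == "__main__":
    entries = [
        CacheEntry("buildkit-mounts-ci-universe-humble-amd64-aaa111", 3_000_000_000),
        CacheEntry("buildkit-mounts-ci-universe-humble-amd64-bbb222", 3_500_000_000),
        CacheEntry("buildkit-mounts-ci-universe-jazzy-arm64-ccc333", 2_000_000_000),
    ]
    keep, delete = prune(entries)
    log = [f"keep: {e.key}" for e in keep] + [f"delete: {e.key}" for e in delete]
    print("\n".join(log))
    print("one keep per lineage:", one_keep_per_lineage(log))
```

Grouping by the key minus its commit SHA is what makes "per lineage" well defined here: every `main` push for the same group, distro, and platform lands in one bucket, so pruning can never leave two competing entries for the next restore to choose between.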