ci(docker): remove artifacts from base CUDA image#5830

Merged
youtalk merged 1 commit into autowarefoundation:main from amadeuszsz:ci/remove-artifacts
Mar 6, 2025

Conversation

@amadeuszsz
Contributor

Description

Frees up about 1.7 GB of space.

How was this PR tested?

Notes for reviewers

Due to the type of CI runner, there are no unit tests that can use the ML artifacts.

Effects on system behavior

None.

Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
@github-actions

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@amadeuszsz amadeuszsz self-assigned this Feb 28, 2025
@amadeuszsz amadeuszsz added the type:containers Docker containers, containerization of components, or container orchestration. label Feb 28, 2025
@amadeuszsz
Contributor Author

@YoshiRi @kminoda
Please confirm if we can remove artifacts from our Docker images.

@xmfcx
Contributor

xmfcx commented Mar 5, 2025

Wow I didn't even know they were a part of the images :o

@amadeuszsz amadeuszsz marked this pull request as ready for review March 5, 2025 08:36
@amadeuszsz amadeuszsz added the run:health-check Run health-check label Mar 5, 2025
Member

@youtalk youtalk left a comment


Not only do the learning models disappear from the development containers, but also from the runtime containers. Are we sure that's okay?

@xmfcx https://github.com/orgs/autowarefoundation/discussions/5007#discussioncomment-10086717

@xmfcx
Contributor

xmfcx commented Mar 5, 2025

@youtalk -san, I remember that discussion, but I thought they were already removed since then 😕

Not only do the learning models disappear from the development containers, but also from the runtime containers. Are we sure that's okay?

I'm OK with adding them back to the runtime containers, either in this PR or a new PR.

I'm not sure if this would fix it but currently the docker-build workflow is not working so we need a solution soon:

@amadeuszsz
Contributor Author

We discussed it in [TIER IV INTERNAL LINK] and it seems there is no need to deploy OSS images with artifacts. I would like to keep the artifacts for unit-test purposes, but AFAIK we don't plan to use CI runners with CUDA runtime support.

Member

@youtalk youtalk left a comment


Then LGTM

@youtalk youtalk enabled auto-merge (squash) March 6, 2025 00:31
@youtalk
Member

youtalk commented Mar 6, 2025

We need to update autoware-base first, so I’m working on it now.
https://github.com/autowarefoundation/autoware/actions/runs/13689030397
After that, once the health-check passes, we’ll merge this PR.

@youtalk
Member

youtalk commented Mar 6, 2025

@youtalk youtalk merged commit a95f45f into autowarefoundation:main Mar 6, 2025
@mitsudome-r
Member

@amadeuszsz Did you check where the artifacts were installed in the old image? I couldn't find any.

@amadeuszsz
Contributor Author

@amadeuszsz Did you check where the artifacts were installed in the old image? I couldn't find any.

@mitsudome-r
Please check /root/autoware_data in the universe-cuda image.

@xmfcx
Contributor

xmfcx commented Mar 6, 2025

To check what occupies the most space, you can run the following in the Docker container:

```shell
sudo apt update && sudo apt install ncdu -y
cd /
ncdu
```

You can navigate with the arrow keys, Enter, and Backspace. It sorts all folders from largest to smallest, which should make it easier to find big blobs.
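If installing tools inside the container is not an option, a rough alternative is to inspect per-layer sizes from the host with `docker history`. The image tag below is illustrative, not taken from this PR; substitute whichever image you want to inspect:

```shell
# Show per-layer sizes of an image from the host; unusually large layers
# hint at big blobs such as baked-in ML artifacts.
# The tag is an example only.
docker history --no-trunc \
  --format "{{.Size}}\t{{.CreatedBy}}" \
  ghcr.io/autowarefoundation/autoware:universe-cuda | head -20
```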

@xmfcx
Contributor

xmfcx commented Mar 7, 2025

So there have been a lot of misunderstandings on my part and some miscommunication.

Until yesterday, I thought:

  • We had artifacts in both the devel and runtime images.
  • This PR removes them from the devel image.

But in reality:

  • We had artifacts only in the runtime images.
  • This PR removes them from the runtime image.

So this PR doesn't change anything from the autoware core/universe CI perspective. It only affects the size requirements of the docker image building workflow.

So this could be reverted. But I don't mind either way.

@amadeuszsz
Contributor Author

amadeuszsz commented Mar 7, 2025

@xmfcx
For me, it could be reverted if unit tests were possible. Other than that, I don't see any reason to store artifacts in Docker images, but correct me if I'm wrong. We can only expect the size of the artifacts to grow.

@xmfcx
Contributor

xmfcx commented Mar 7, 2025

The thing is, the image you've modified, universe-base-cuda (which influences universe-cuda), was not meant for CI in the first place. It was a runtime image, just for users to pull and run.

Universe CI uses universe-devel-cuda, which doesn't have the artifacts anyway.

@amadeuszsz
Contributor Author

I see. I also considered that each artifacts update forces an autoware-base:cuda-latest update, which consumes CI time and gains us nothing. If:

  • we still have a relatively large space margin for artifacts,
  • frequent artifact updates are welcome,

we can revert this PR.

@xmfcx
Contributor

xmfcx commented Mar 7, 2025

we still have a relatively large space margin for artifacts

We have about 6-7GB to spare for cache.

I've marked some summary here on the dockerfile.svg:

(image: annotated dockerfile.svg)

There are multiple solutions we could consider:

  • Separate out the building of the final image with artifacts to another job/machine?
    • autoware-base:cuda-latest, universe-sensing-perception-cuda, and universe-cuda are the only images that contain the artifacts.
    • Maybe these could be built in a separate action.
    • This would add a lot of work and complexity just to overcome a couple of gigabytes of storage shortage.
  • Just revert the PR and add a mechanism to keep the cache size low (still, we don't have much margin, just 6-7 GB).
  • Rent proper machines for this job and stop worrying about storage concerns. (@mitsudome-r should we consider OVH again? The machine we got is working well so far.)
  • Keep this PR for now until someone complains? I don't know who is actually using Autoware's runtime images.

This could be also discussed in the OpenADKit Working Group meeting next week.
Related: https://discord.com/channels/953808765935816715/953877118415151205/1347463585474805801 (from autoware discord)

@amadeuszsz
Contributor Author

Keep this PR for now until someone complains? I don't know who is actually using Autoware's runtime images.

We already mount autoware_map. From now on, users will do the same with autoware_data.
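As a minimal sketch of what that could look like, assuming the in-container data path mentioned above (/root/autoware_data), a matching convention for maps, and an illustrative image tag (the registry path and host directories are assumptions, not taken from this PR):

```shell
# Hypothetical run command: bind-mount locally downloaded maps and ML
# artifacts into the runtime container instead of baking them into the image.
docker run -it --rm \
  -v "$HOME/autoware_map:/root/autoware_map" \
  -v "$HOME/autoware_data:/root/autoware_data" \
  ghcr.io/autowarefoundation/autoware:universe-cuda
```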

@xmfcx
Contributor

xmfcx commented Mar 7, 2025

The difference is that the autoware_map folder is a simple manual download.
But for autoware_data, you need to clone autoware, install Ansible, and run the download-artifacts playbook to get it.

And you can only download the latest artifacts. If an artifact is removed in a new version, old images cannot be used anymore.
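For reference, the download flow described above looks roughly like this. The collection requirements file and playbook name follow the public autoware repository's layout as I understand it, but they are assumptions here and may change between releases:

```shell
# Sketch of the artifact download steps (file and playbook names are
# assumptions based on the autoware repo layout; verify against the docs).
git clone https://github.com/autowarefoundation/autoware.git
cd autoware
ansible-galaxy collection install -f -r ansible-galaxy-requirements.yaml
ansible-playbook autoware.dev_env.download_artifacts \
  -e "data_dir=$HOME/autoware_data" --ask-become-pass
```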

@amadeuszsz
Contributor Author

Good point, that kind of link between the software and the artifacts has now disappeared from our image.
By the way, could you clarify how we deploy the image (universe-cuda before this PR) to the GitHub registry with respect to repository versioning? Do we use:

  1. latest autoware and nightly repos,
  2. latest autoware and base repos?

If it is option 1, it will be beneficial to keep the artifacts in the Docker images.
If it is option 2, there might be a discrepancy when updating artifacts that require updated software as well. That being said, we can't ensure compatibility between the stored artifacts and the built libraries.

@xmfcx
Contributor

xmfcx commented Mar 13, 2025

The current Dockerfile pulls autoware.repos, not nightly.

there might be a discrepancy when updating artifacts that require updated software as well.

I've noticed this too now.
For now, we need to manually make sure to release a new version when the artifacts are updated.

Truth is, I'm not sure who the actual users of these containers are, and that makes it harder for me to make a decision.

@xmfcx
Contributor

xmfcx commented Mar 13, 2025

@amadeuszsz in today's OpenADKit meeting, we concluded that this change should be rolled back.

Reasons:

Could you revert this change?

amadeuszsz added a commit to amadeuszsz/autoware that referenced this pull request Mar 14, 2025
…undation#5830)"

This reverts commit a95f45f.

Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
@xmfcx
Contributor

xmfcx commented Mar 14, 2025

The following were created to bring the artifacts back into the runtime image:

xmfcx pushed a commit that referenced this pull request Mar 19, 2025
Revert "ci(docker): remove artifacts from base CUDA image (#5830)"

This reverts commit a95f45f.

Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
sykwer pushed a commit to sykwer/autoware that referenced this pull request Mar 22, 2025
…ndation#5876)

Revert "ci(docker): remove artifacts from base CUDA image (autowarefoundation#5830)"

This reverts commit a95f45f.

Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: sykwer <sykwer@gmail.com>

Labels

run:health-check Run health-check type:containers Docker containers, containerization of components, or container orchestration.


4 participants