
fix(ci): cache heavy CUDA wheels in install-test (#1796)#1798

Open
KevinSailema wants to merge 2 commits into NVIDIA-NeMo:main from
KevinSailema:fix/issue-1796-optimize-cuda-install-ci
Conversation

@KevinSailema

Reduce Python+CUDA installation time in CI by prebuilding and caching heavyweight CUDA dependency wheels, then reusing those wheels in install jobs.

Changelog

  • Add a dedicated CUDA wheelhouse job to build wheels for heavy dependencies used by CUDA extras.
  • Add GitHub Actions cache for the wheelhouse, keyed by OS, Python version, dependency lock inputs, and workflow file hash.
  • Add fail-fast validation so the workflow errors early if expected heavyweight wheels are not produced.
  • Upload built wheels as an artifact for downstream jobs in the same workflow run.
  • Update the CUDA pip install job to depend on wheelhouse generation and download wheel artifacts before install.
  • Update pip install behavior to prefer local prebuilt wheels first, while preserving fallback behavior when cache/artifact is cold.
  • Update the install summary job dependencies to include the new wheelhouse job so gating stays consistent.
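The flow described above could look roughly like the following workflow sketch. This is a minimal illustration only: the job names, cache-key inputs, paths, and extras name are assumptions, not the PR's actual workflow; the package names come from the linked issue (causal-conv1d, transformer-engine).

```yaml
# Illustrative sketch only; job names, paths, and the [cuda] extra are assumptions.
jobs:
  cuda-wheelhouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      # Cache keyed by OS, Python version, dependency inputs, and workflow file hash.
      - name: Restore wheelhouse cache
        id: cache
        uses: actions/cache@v4
        with:
          path: wheelhouse
          key: wheels-${{ runner.os }}-py3.10-${{ hashFiles('pyproject.toml', '.github/workflows/install-test.yml') }}
      # Build the heavy wheels only when the cache is cold.
      - name: Build heavy CUDA wheels
        if: steps.cache.outputs.cache-hit != 'true'
        run: pip wheel --wheel-dir wheelhouse causal-conv1d transformer-engine
      # Fail fast if the expected heavyweight wheels were not produced.
      - name: Validate wheelhouse
        run: ls wheelhouse/*.whl
      - uses: actions/upload-artifact@v4
        with:
          name: cuda-wheelhouse
          path: wheelhouse

  install-cuda:
    needs: cuda-wheelhouse
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: cuda-wheelhouse
          path: wheelhouse
      # Prefer local prebuilt wheels; PyPI stays available as the fallback
      # when the cache/artifact is cold.
      - run: pip install --find-links wheelhouse ".[cuda]"
```

`--find-links` makes pip consider the local wheel directory first without cutting off the index, which is what preserves the fallback behavior the changelog mentions.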

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

Notes:

  • This is a CI workflow optimization change; no application/runtime logic was modified.
  • Validation is based on workflow execution success and install-time reduction in CI runs.

Additional Information

Signed-off-by: Kevin Sailema <108644636+KevinSailema@users.noreply.github.com>
@KevinSailema KevinSailema requested a review from a team as a code owner April 13, 2026 04:57
@copy-pr-bot

copy-pr-bot bot commented Apr 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa
Contributor

akoumpa commented Apr 13, 2026

Thanks a lot for making these @KevinSailema! I'm not an area expert, so I wanted to ask: can this also be used when building the Docker container that runs on the GHA CI workers? Initially the workflow tests installation on a few targets [20 min], then builds the Docker container [20 min], so I was wondering if your fix could be reused there.

Hi, @thomasdhc can you review as the domain expert? Thank you.

@thomasdhc
Contributor

/ok to test cc8cb2d

@thomasdhc
Contributor

@KevinSailema Please review failures

@akoumpa akoumpa linked an issue Apr 14, 2026 that may be closed by this pull request
@chtruong814 chtruong814 added waiting-for-customer Waiting for response from the original author and removed waiting-for-customer Waiting for response from the original author labels Apr 14, 2026
Signed-off-by: Kevin Sailema <108644636+KevinSailema@users.noreply.github.com>
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Apr 18, 2026
@chtruong814 chtruong814 added waiting-on-customer Waiting on the original author to respond and removed needs-follow-up Issue needs follow-up labels Apr 18, 2026
@akoumpa
Contributor

akoumpa commented Apr 19, 2026

/ok to test 5368547


Labels

community-request waiting-on-customer Waiting on the original author to respond

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize causal-conv1d/transformer engine installation in CI

5 participants