Enable github-first GPU CI for nightly runs + PRs. #1325
Merged
24 commits
1899a2a Add workflows that will eventually target nvidia gpu runners. (coreyjadams)
b141429 Add temp test to ensure docstring tests don't break on cpu or missing… (coreyjadams)
8b29940 Reduce coverage requirement temporarily (coreyjadams)
56791a7 Update github-nightly.yml (coreyjadams)
8b9a9f7 Only test on ubuntu. Use a trigger for PRs. (coreyjadams)
b097785 Update doc tests to be more flexible with device selection. (coreyjadams)
83d1a76 Ensure cache is cleared and refreshed for testmon data. (coreyjadams)
460d8b5 Fix (coreyjadams)
1c3c10d Update conftest for pre-commit compliance. (coreyjadams)
476dd1a Actually generate the test db ... (coreyjadams)
3a72d6f Make sure to force-select all tests for testmon (coreyjadams)
5a3086e Separate testmon and coverage runs (coreyjadams)
507d4e9 Isolate into separate CI jobs to refresh /tmp, etc (coreyjadams)
5ad43f2 Reduce size of cached objects, and free space before caching (coreyjadams)
d533ae3 Use a 'latest' tag on uv venvs for now. WHen the lock file updates, … (coreyjadams)
f1da5ef Merge branch 'NVIDIA:main' into main (coreyjadams)
a30e4d3 Use pytest testmon selection for coverage (coreyjadams)
bc4b019 Use cache, not artifacts, for coverage pull (coreyjadams)
0294da3 Remove uv_preview env variable. Simplify coverage snapshot (coreyjadams)
dbd274a Prepare for nvidia runners. (coreyjadams)
7b30a36 Update lock after adding new dev dep (coreyjadams)
d55a5a2 Switch to nvidia cpu runners for build. (coreyjadams)
fd21908 Update github-pr.yml (coreyjadams)
71bde79 Merge branch 'NVIDIA:main' into main (coreyjadams)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,206 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES. | ||
| # SPDX-FileCopyrightText: All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| # This CI runs nightly to generate the coverage report and testmon database. | ||
| # It runs ALL tests and caches the testmon database for use by PR workflows. | ||
| # The tests run here will only use UV. This is meant to be nightly functionality | ||
| # testing AND a baseline dependency graph for PRs. | ||
| | ||
| | ||
| # TO DO: THE COVERAGE LIMIT IS VERY LOW, BECAUSE THIS IS NOT USING GPU TESTS OR | ||
| # THE DATA-DRIVEN TESTS. RAISE THIS UP AGAIN EVENTUALLY. | ||
| | ||
| | ||
| name: Nightly Github Workflow | ||
| on: | ||
| schedule: | ||
| # Run nightly at 2 AM UTC | ||
| - cron: '0 2 * * *' | ||
| workflow_dispatch: | ||
| # Allow manual triggering | ||
| | ||
| # Container image used across all jobs - update this single value to change everywhere | ||
| # Note: env context not available in container.image, so we hardcode the value | ||
| jobs: | ||
| # Stage 1: Build and cache the environment | ||
| build-environment: | ||
| name: Build Environment | ||
| runs-on: linux-amd64-cpu8 | ||
| container: | ||
| image: nvcr.io/nvidia/pytorch:25.01-py3 | ||
| | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - name: Install uv | ||
| uses: nick-fields/retry@v3 | ||
| with: | ||
| timeout_minutes: 5 | ||
| max_attempts: 3 | ||
| command: | | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh | ||
| echo "$HOME/.cargo/bin" >> $GITHUB_PATH | ||
| | ||
| - name: Restore uv cache | ||
| id: cache-uv-restore | ||
| uses: actions/cache/restore@v4 | ||
| with: | ||
| path: .venv | ||
| key: uv-env-nightly-latest | ||
| | ||
| - name: Install package with uv | ||
| if: steps.cache-uv-restore.outputs.cache-hit != 'true' | ||
| run: | | ||
| # Install core dependencies and development group | ||
| uv sync --group dev --preview-features extra-build-dependencies | ||
| | ||
| - name: Free disk space before caching | ||
| if: steps.cache-uv-restore.outputs.cache-hit != 'true' | ||
| run: | | ||
| rm -rf ~/.cache/uv | ||
| df -h | ||
| | ||
| - name: Delete old environment cache | ||
| if: steps.cache-uv-restore.outputs.cache-hit != 'true' | ||
| run: | | ||
| gh cache delete "uv-env-nightly-latest" || true | ||
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| | ||
| - name: Save environment to cache | ||
| if: steps.cache-uv-restore.outputs.cache-hit != 'true' | ||
| uses: actions/cache/save@v4 | ||
| with: | ||
| path: .venv | ||
| key: uv-env-nightly-latest | ||
| | ||
| # Stage 2: Run testmon tests and cache the database | ||
| testmon: | ||
| name: Testmon | ||
| needs: build-environment | ||
| runs-on: linux-amd64-gpu-h100-latest-1 | ||
| container: | ||
| image: nvcr.io/nvidia/pytorch:25.01-py3 | ||
| | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - name: Install uv | ||
| uses: nick-fields/retry@v3 | ||
| with: | ||
| timeout_minutes: 5 | ||
| max_attempts: 3 | ||
| command: | | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh | ||
| echo "$HOME/.cargo/bin" >> $GITHUB_PATH | ||
| | ||
| - name: Restore environment from cache | ||
| uses: actions/cache/restore@v4 | ||
| with: | ||
| path: .venv | ||
| key: uv-env-nightly-latest | ||
| fail-on-cache-miss: true | ||
| | ||
| - name: Run core tests (collect all for testmon) | ||
| run: | | ||
| # This populates the testmon database for PR workflows | ||
| uv run python -m pytest --testmon --ignore-glob="*docs*" --ignore-glob="*examples*" | ||
| | ||
| - name: Delete old testmon cache | ||
| run: | | ||
| gh cache delete "testmon-nightly-latest" || true | ||
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| | ||
| - name: Save testmon database to cache | ||
| uses: actions/cache/save@v4 | ||
| with: | ||
| path: | | ||
| .testmondata | ||
| .testmondata-shm | ||
| .testmondata-wal | ||
| key: testmon-nightly-latest | ||
| | ||
| # Stage 3: Run coverage tests and upload artifacts | ||
| coverage: | ||
| name: Coverage | ||
| needs: build-environment | ||
| runs-on: nv-gpu-amd64-h100-1gpu | ||
| container: | ||
| image: nvcr.io/nvidia/pytorch:25.01-py3 | ||
| | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - name: Install uv | ||
| uses: nick-fields/retry@v3 | ||
| with: | ||
| timeout_minutes: 5 | ||
| max_attempts: 3 | ||
| command: | | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh | ||
| echo "$HOME/.cargo/bin" >> $GITHUB_PATH | ||
| | ||
| - name: Restore environment from cache | ||
| uses: actions/cache/restore@v4 | ||
| with: | ||
| path: .venv | ||
| key: uv-env-nightly-latest | ||
| fail-on-cache-miss: true | ||
| | ||
| - name: Run core tests for coverage report | ||
| run: | | ||
| uv run coverage run --rcfile='test/coverage.pytest.rc' -m pytest --ignore-glob="*docs*" --ignore-glob="*examples*" | ||
| | ||
| - name: Run doc tests (testmon not supported for doctests) | ||
| run: | | ||
| uv run coverage run --rcfile='test/coverage.docstring.rc' -m pytest --doctest-modules physicsnemo/ --ignore-glob="*internal*" --ignore-glob="*experimental*" | ||
| | ||
| - name: Delete old coverage cache | ||
| run: | | ||
| gh cache delete "coverage-nightly-latest" || true | ||
| env: | ||
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| | ||
| - name: Save coverage files to cache | ||
| uses: actions/cache/save@v4 | ||
| with: | ||
| path: .coverage* | ||
| key: coverage-nightly-latest | ||
| | ||
| - name: Merge coverage reports | ||
| run: | | ||
| uv run coverage combine | ||
| uv run coverage report --show-missing --omit="*test*" --omit="*internal*" --omit="*experimental*" --fail-under=45 | ||
| uv run coverage html | ||
| # Also create an XML report for potential CI integrations | ||
| uv run coverage xml -o coverage.xml | ||
| | ||
| - name: Upload coverage HTML report | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: coverage-report-nightly | ||
| path: htmlcov/ | ||
| retention-days: 7 | ||
| | ||
| - name: Upload combined coverage data | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: coverage-data-nightly | ||
| path: | | ||
| .coverage | ||
| coverage.xml | ||
| retention-days: 30 | ||
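
The header comment of this workflow says the testmon database is cached for use by PR workflows. A minimal sketch of how a PR job could consume those caches is given here; it is not the github-pr.yml from this pull request, whose contents are not shown. The workflow name, the job id `testmon-pr`, and the trigger are hypothetical. The runner label, container image, cache keys, and pytest flags are copied from the nightly workflow. Adding `$HOME/.local/bin` to the path is an assumption about where newer uv installers place the binary.

```yaml
# Hypothetical PR workflow: a sketch only, not the file added in this PR.
name: PR Testmon Sketch
on:
  pull_request:

jobs:
  testmon-pr:
    # Runner label and image copied from the nightly testmon job.
    runs-on: linux-amd64-gpu-h100-latest-1
    container:
      image: nvcr.io/nvidia/pytorch:25.01-py3

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          # The install location is assumed to vary by uv release;
          # adding a directory that does not exist is harmless.
          echo "$HOME/.cargo/bin" >> "$GITHUB_PATH"
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"

      - name: Restore nightly environment
        uses: actions/cache/restore@v4
        with:
          path: .venv
          key: uv-env-nightly-latest
          fail-on-cache-miss: true

      - name: Restore nightly testmon database
        uses: actions/cache/restore@v4
        with:
          # Paths mirror the nightly save step so the cache entry matches.
          path: |
            .testmondata
            .testmondata-shm
            .testmondata-wal
          key: testmon-nightly-latest

      - name: Run tests selected by testmon
        run: |
          # With a restored database, --testmon should select only the tests
          # affected by the changes in the PR.
          uv run python -m pytest --testmon --ignore-glob="*docs*" --ignore-glob="*examples*"
```

Because the nightly run is scheduled on the default branch, its caches should be restorable from pull requests that target that branch. If the testmon cache is missing, this sketch falls back to running the full suite, since `fail-on-cache-miss` is only set for the environment restore.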
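The three `Delete old ... cache` steps in the workflow call `gh cache delete` with `GITHUB_TOKEN` and swallow any failure with `|| true`. A sketch of the settings that such a step is assumed to need is shown here: an `actions: write` permission on the token and an explicit `GH_REPO`, so that `gh` does not have to infer the repository from the checkout inside the container. Whether the repository's default token permissions already cover this, and whether `gh` is present in the container image, is not confirmed by the source.

```yaml
# Workflow-level fragment (sketch under the assumptions stated above).
permissions:
  contents: read
  actions: write   # assumed to be required for deleting Actions caches

jobs:
  testmon:
    steps:
      - name: Delete old testmon cache
        run: |
          # '|| true' keeps the job green when no cache exists yet,
          # but it also hides permission errors.
          gh cache delete "testmon-nightly-latest" || true
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GH_REPO: ${{ github.repository }}
```

Only the `permissions` block and the `GH_REPO` line differ from the step in the diff; the other job keys are omitted for brevity.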
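The `Merge coverage reports` step passes `--omit` three times. The coverage documentation describes `--omit` as taking a comma-separated list of patterns, so a sketch of the same step written with a single flag is given here. It assumes the intent is to exclude all three pattern groups; whether the repeated flags in the diff combine or override one another is not verified here.

```yaml
      - name: Merge coverage reports
        run: |
          uv run coverage combine
          # One --omit with comma-separated patterns (assumed intent:
          # exclude tests, internal, and experimental code together).
          uv run coverage report --show-missing --omit="*test*,*internal*,*experimental*" --fail-under=45
          uv run coverage html
          # Also create an XML report for potential CI integrations
          uv run coverage xml -o coverage.xml
```

The `--fail-under=45` threshold is unchanged from the diff, which notes that the limit is deliberately low for now.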