Update nightly uv pipeline #1493
Open: coreyjadams wants to merge 34 commits into main from update-nightly-uv-pipeline
Commits
All 34 commits are by coreyjadams.
- b4becee Updating the github nightly build with uv to get optional deps better.
- aba655f Ensure transformer engine is skipped in CI build until cuda13 fix com…
- f5135a8 Include hidden files in ci venv uploads
- 606c00b Use a custom action for the install steps of the package.
- 669c87a Try adding an env variable to use headless pyvista off screen
- b9066e0 Skip pv pplotter errors
- 85d4e9a Update CI
- 026e35c Attempt to fix torch scatter build
- 2569422 Try again with pyg
- 0ad2569 Add cmake. try again
- f0a5a64 Testing another way
- 8f0c973 Add debuging options
- 245fcd8 Add a dockerfile build action. Switch to cuda 12
- bd9dfb0 make the cache pull robust
- e76e3af Trying again
- 2ee7db7 Trying again again
- f2b63bf trying again again again
- a259a52 Trying again again again again
- d3f341e Trying again again again again again
- 0fa114b try more agains
- 9b787ea who knows
- 135542f Increase test tolerance. Upload test report as artifact
- 6c43f5d Turn off 3D convnd test, it's not numerically stable
- 92444fb upload better report.
- ef6e077 fix workspace permissions
- 81a9d2b revert workspace changes, upload reports for coverage path and genera…
- ad31587 Merge branch 'main' into update-nightly-uv-pipeline
- 007705d Restore container pipeline against main.
- a001a5e rmove container build action from this pr
- ad0082d reintroduce pytorch-g deps on torch.
- 9ec0086 Merge branch 'main' into update-nightly-uv-pipeline
- 8053f8d updates in this pr:
- 66c9e2a Merge branch 'main' into update-nightly-uv-pipeline
- 9de9f1f add explicit git repo
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| name: Bootstrap cuDNN CI container | ||
| description: Install OS dependencies and uv in CUDA cuDNN container jobs | ||
| inputs: | ||
| python-version: | ||
| description: Python major.minor expected in the container | ||
| required: false | ||
| default: "3.12" | ||
| runs: | ||
| using: composite | ||
| steps: | ||
| - name: Install system dependencies | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| export DEBIAN_FRONTEND=noninteractive | ||
| apt-get update | ||
| apt-get install -y --no-install-recommends \ | ||
| ca-certificates \ | ||
| curl \ | ||
| git \ | ||
| gh \ | ||
| build-essential \ | ||
| cmake \ | ||
| pkg-config \ | ||
| python3 \ | ||
| python3-dev \ | ||
| python3-venv \ | ||
| python3-pip | ||
| ln -sf /usr/bin/python3 /usr/bin/python | ||
| rm -rf /var/lib/apt/lists/* | ||
| | ||
| - name: Install uv | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh | ||
| echo "$HOME/.local/bin" >> "$GITHUB_PATH" | ||
| echo "$HOME/.cargo/bin" >> "$GITHUB_PATH" | ||
| | ||
| - name: Print toolchain versions | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| python3 --version | ||
| uv --version | ||
| gcc --version | head -n 1 | ||
| cmake --version | head -n 1 | ||
| | ||
| if command -v nvcc >/dev/null 2>&1; then | ||
| nvcc --version | ||
| else | ||
| echo "nvcc not found on PATH" | ||
| fi | ||
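The `Install uv` step has to know where the standalone installer put the `uv` binary before it adds that directory to `$GITHUB_PATH`. To my understanding, recent uv installer releases default to `$XDG_BIN_HOME` or `~/.local/bin`, and older releases used `~/.cargo/bin`. The helper below probes those candidates instead of hard-coding one directory. It is a minimal sketch: `find_uv_bin_dir` is a hypothetical name, and the candidate list is an assumption to check against the installer docs for the uv version you pin.

```shell
#!/usr/bin/env bash
# Sketch: locate the directory holding the uv binary after running the
# standalone installer. The candidate directories are an assumption based on
# the installer's defaults; adjust them for the uv version you pin.
set -euo pipefail

# Prints the first candidate directory that contains an executable "uv".
# Returns 1 if none is found.
find_uv_bin_dir() {
  local dir
  for dir in "${XDG_BIN_HOME:-}" "$HOME/.local/bin" "$HOME/.cargo/bin"; do
    if [[ -n "$dir" && -x "$dir/uv" ]]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}
```

Inside the step, run `find_uv_bin_dir >> "$GITHUB_PATH"` after the `curl ... | sh` line instead of hard-coding the directory. Because the function returns 1 when nothing is found, `set -euo pipefail` makes the step fail right there, not later at `uv --version`.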
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| name: Build CI container image | ||
| description: Build the repository CI container image from Dockerfile | ||
| inputs: | ||
| image-tag: | ||
| description: Fully qualified local image tag to build | ||
| required: false | ||
| default: physicsnemo-ci:nightly | ||
| dockerfile: | ||
| description: Dockerfile path | ||
| required: false | ||
| default: Dockerfile | ||
| context: | ||
| description: Docker build context path | ||
| required: false | ||
| default: . | ||
| target: | ||
| description: Docker build target stage | ||
| required: false | ||
| default: ci | ||
| platform: | ||
| description: Docker target platform | ||
| required: false | ||
| default: linux/amd64 | ||
| outputs: | ||
| image-tag: | ||
| description: The built image tag | ||
| value: ${{ inputs.image-tag }} | ||
| runs: | ||
| using: composite | ||
| steps: | ||
| - name: Verify Docker CLI availability | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| docker version | ||
| | ||
| - name: Build CI container image | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| echo "::group::docker build" | ||
| echo "dockerfile=${{ inputs.dockerfile }}" | ||
| echo "context=${{ inputs.context }}" | ||
| echo "target=${{ inputs.target }}" | ||
| echo "platform=${{ inputs.platform }}" | ||
| echo "tag=${{ inputs.image-tag }}" | ||
| docker build \ | ||
| --platform "${{ inputs.platform }}" \ | ||
| --file "${{ inputs.dockerfile }}" \ | ||
| --target "${{ inputs.target }}" \ | ||
| --tag "${{ inputs.image-tag }}" \ | ||
| "${{ inputs.context }}" | ||
| echo "::endgroup::" | ||
| | ||
| - name: Show built image details | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| docker image inspect "${{ inputs.image-tag }}" --format='id={{.Id}} size={{.Size}}' | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| name: Setup uv environment | ||
| description: Restore uv and venv caches, and run uv sync on cache miss | ||
| inputs: | ||
| uv-cache-key-prefix: | ||
| description: Prefix for uv package cache key | ||
| required: true | ||
| venv-cache-key-prefix: | ||
| description: Prefix for virtual environment cache key | ||
| required: true | ||
| cache-key-suffix: | ||
| description: Deterministic suffix appended to cache key prefixes | ||
| required: true | ||
| outputs: | ||
| uv_cache_hit: | ||
| description: Whether uv package cache had an exact key hit | ||
| value: ${{ steps.restore-uv-cache.outputs.cache-hit }} | ||
| venv_cache_hit: | ||
| description: Whether venv cache had an exact key hit | ||
| value: ${{ steps.restore-venv-cache.outputs.cache-hit }} | ||
| runs: | ||
| using: composite | ||
| steps: | ||
| - name: Restore uv package cache | ||
| id: restore-uv-cache | ||
| uses: actions/cache/restore@v4 | ||
| with: | ||
| path: ~/.cache/uv | ||
| key: ${{ inputs.uv-cache-key-prefix }}-${{ inputs.cache-key-suffix }} | ||
| fail-on-cache-miss: false | ||
| restore-keys: | | ||
| ${{ inputs.uv-cache-key-prefix }}- | ||
| | ||
| - name: Restore venv cache | ||
| id: restore-venv-cache | ||
| uses: actions/cache/restore@v4 | ||
| with: | ||
| path: .venv | ||
| key: ${{ inputs.venv-cache-key-prefix }}-${{ inputs.cache-key-suffix }} | ||
| fail-on-cache-miss: false | ||
| restore-keys: | | ||
| ${{ inputs.venv-cache-key-prefix }}- | ||
| | ||
| - name: Debug cache and environment context | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| echo "::group::setup-uv-env debug context" | ||
| echo "uv cache key: ${{ inputs.uv-cache-key-prefix }}-${{ inputs.cache-key-suffix }}" | ||
| echo "venv cache key: ${{ inputs.venv-cache-key-prefix }}-${{ inputs.cache-key-suffix }}" | ||
| echo "uv cache exact hit: ${{ steps.restore-uv-cache.outputs.cache-hit }}" | ||
| echo "venv cache exact hit: ${{ steps.restore-venv-cache.outputs.cache-hit }}" | ||
| echo "workspace: $GITHUB_WORKSPACE" | ||
| df -h | ||
| echo "::endgroup::" | ||
| | ||
| - name: Install dependencies with uv (dev + cu12) | ||
| if: steps.restore-venv-cache.outputs.cache-hit != 'true' | ||
| shell: bash | ||
| run: | | ||
| set -euo pipefail | ||
| export UV_LINK_MODE=copy | ||
| echo "::group::uv sync (dev + cu12)" | ||
| uv sync \ | ||
| --frozen \ | ||
| --group dev \ | ||
| --extra cu12 | ||
| | ||
| echo "::endgroup::" | ||
| uv run python -c "import torch; print(f'torch={torch.__version__} cuda={torch.version.cuda}')" | ||
| | ||
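When the lock file changes, the exact keys built from `<prefix>-<suffix>` miss. `restore-keys` then falls back to the most recently created cache whose key starts with `<prefix>-`, and `cache-hit` stays false, because it only reports an exact match. The result differs by cache:

- For the venv cache, the `uv sync` step still runs, on top of an older restored venv.
- For the uv package cache, wheels built on previous nights are reused.

Below is a simplified bash model of that lookup. It is a sketch, not the action's implementation: `resolve_cache` is a hypothetical helper, and it ignores branch scoping and multiple restore keys tried in order.

```shell
#!/usr/bin/env bash
# Simplified model (not the real implementation) of how actions/cache/restore
# picks a cache: exact key first, then the newest cache whose key starts with
# the restore-key prefix. Each candidate is passed as "<created_epoch> <key>".
set -euo pipefail

# resolve_cache KEY RESTORE_PREFIX CANDIDATE...
# Prints "<matched-key> <cache-hit>", or "none false" on a miss.
resolve_cache() {
  local exact="$1" prefix="$2"
  shift 2
  local entry ts key best_key="" best_ts=-1

  # 1. Exact key match: the only case reported as cache-hit=true.
  for entry in "$@"; do
    key="${entry#* }"
    if [[ "$key" == "$exact" ]]; then
      echo "$key true"
      return 0
    fi
  done

  # 2. Prefix match: pick the most recently created matching cache.
  for entry in "$@"; do
    ts="${entry%% *}"
    key="${entry#* }"
    if [[ "$key" == "$prefix"* ]] && (( ts > best_ts )); then
      best_ts=$ts
      best_key=$key
    fi
  done

  if [[ -n "$best_key" ]]; then
    echo "$best_key false"
  else
    echo "none false"
  fi
}
```

Example: `resolve_cache venv-cu12-newlock venv-cu12- "100 venv-cu12-oldlock" "200 venv-cu12-otherlock"` prints `venv-cu12-otherlock false`. The newer venv is restored, and because `cache-hit` is not `'true'`, the "Install dependencies with uv" step still runs.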
General question: does this test trigger any packages to be built from source, or is it just core deps where we have binaries for everything?
For context, I'm just curious whether these are all the deps needed to install everything (and to build deps from source when needed).
This does sometimes trigger deps to build from source: torch_geometric and family, and natten, both come to mind. transformer_engine will be in the pile eventually too.
It doesn't happen every time, though. uv keeps the wheels it builds in its own cache, so once the uv cache is restored, last night's pre-built wheel is available tonight. The same goes for the next night, and the next, until the cache is invalidated or the lock file requires a new build. So not everything gets rebuilt on every run.
The first build took forever though.
And no, we're not catching everything yet. I definitely need to get TE, and I think a few others are still missing, though I've got a lot of the deps. I was hoping to roll out incrementally from here. Part of the point of the reporting stage is to help identify which tests are skipped due to missing software deps, and then fix that.
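One way the reporting stage could surface skips caused by missing dependencies is to tally skip reasons from pytest's short test summary, which `pytest -rs` prints. The sketch below assumes summary lines of the form `SKIPPED [N] path:line: reason`, where `N` is the number of skips grouped at that location. The helper name `summarize_skips` is hypothetical, and the exact reason text (for example from `pytest.importorskip`) may vary between pytest versions.

```shell
#!/usr/bin/env bash
# Sketch: tally skip reasons from pytest's short test summary (pytest -rs).
# Assumed line format: "SKIPPED [N] tests/test_foo.py:12: <reason>", where N
# is the number of skips grouped at that location.
set -euo pipefail

# Reads pytest output on stdin; prints "<total> <reason>", most frequent first.
summarize_skips() {
  awk '
    /^SKIPPED \[[0-9]+\] / {
      n = substr($2, 2, length($2) - 2)          # "[3]" -> "3"
      reason = $0
      sub(/^SKIPPED \[[0-9]+\] [^ ]*: /, "", reason)
      total[reason] += n
    }
    END { for (r in total) print total[r], r }
  ' | sort -rn
}
```

For example, `pytest -rs | summarize_skips` would list reasons such as a failed import of `natten` with their total skip counts, so the most common missing dependency appears first.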
Yeah, it's the builds of packages from source that prompted my earlier questions about cache invalidation / refreshes.
This is where leaning hard on caching and using it aggressively (and not refreshing it all the time) is very, very useful. Initial cache builds for e2s, with e2grid, natten, flash-attn, torch-harmonics, etc., can take hours... something that, practically, is worth doing at most once per week imo.
Managing the cache is uv's problem, right? It doesn't need a fresh cache all the time. Creating a new cache really just checks that PyPI is still alive (I hope so) and that source builds still work as intended. It could be nightly... but when things go wrong, you don't want every new PR to get stuck building new caches as well, just because the lock file changed and now you're having an issue with TE or another source build.
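The "rebuild at most once a week" idea can be expressed through the cache keys alone. The sketch below puts the ISO week into the uv cache key prefix and a hash of `uv.lock` into the suffix. With the setup-uv-env action's `restore-keys: <prefix>-` fallback, a mid-week lock-file change would then restore that week's cache, so only the changed packages rebuild. The first run of a new week would start from an empty uv cache and pay the full source-build cost once.

This is a minimal sketch under stated assumptions:

- It assumes GNU `date` and `sha256sum`, as on Ubuntu-based runners.
- The helper names `iso_week`, `lock_hash` and `emit_cache_keys` are hypothetical.
- GitHub's built-in `hashFiles('uv.lock')` expression would work just as well for the suffix.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): cache-key inputs that rotate the uv cache at
# most once per ISO week and otherwise follow uv.lock.
# Assumes GNU coreutils (date -d, sha256sum), as on Ubuntu-based runners.
set -euo pipefail

# ISO week-based year and week of the given date (default: today, UTC).
iso_week() {
  date -u -d "${1:-now}" +%G-W%V
}

# First 16 hex characters of the lock file's SHA-256; changes only with uv.lock.
lock_hash() {
  sha256sum "$1" | cut -c1-16
}

# Print key=value lines suitable for appending to "$GITHUB_OUTPUT".
# The week goes into the prefix, so the "<prefix>-" restore key only falls
# back to caches saved earlier in the same week.
emit_cache_keys() {
  local lockfile="$1" base="$2"
  echo "uv-cache-key-prefix=${base}-$(iso_week)"
  echo "cache-key-suffix=$(lock_hash "$lockfile")"
}
```

A step running `emit_cache_keys uv.lock uv-cu12 >> "$GITHUB_OUTPUT"` would produce the two values to pass to the action's `uv-cache-key-prefix` and `cache-key-suffix` inputs. The trade-off is the one raised above: one slow job per week, in exchange for PRs not rebuilding caches from scratch on every lock-file change.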