Commits
b4becee
Updating the github nightly build with uv to get optional deps better.
coreyjadams Feb 19, 2026
aba655f
Ensure transformer engine is skipped in CI build until cuda13 fix com…
coreyjadams Feb 19, 2026
f5135a8
Include hidden files in ci venv uploads
coreyjadams Feb 19, 2026
606c00b
Use a custom action for the install steps of the package.
coreyjadams Feb 19, 2026
669c87a
Try adding an env variable to use headless pyvista off screen
coreyjadams Feb 19, 2026
b9066e0
Skip pv pplotter errors
coreyjadams Feb 19, 2026
85d4e9a
Update CI
coreyjadams Feb 20, 2026
026e35c
Attempt to fix torch scatter build
coreyjadams Feb 20, 2026
2569422
Try again with pyg
coreyjadams Feb 20, 2026
0ad2569
Add cmake. try again
coreyjadams Feb 20, 2026
f0a5a64
Testing another way
coreyjadams Feb 20, 2026
8f0c973
Add debuging options
coreyjadams Feb 20, 2026
245fcd8
Add a dockerfile build action. Switch to cuda 12
coreyjadams Feb 20, 2026
bd9dfb0
make the cache pull robust
coreyjadams Feb 20, 2026
e76e3af
Trying again
coreyjadams Feb 20, 2026
2ee7db7
Trying again again
coreyjadams Feb 20, 2026
f2b63bf
trying again again again
coreyjadams Feb 20, 2026
a259a52
Trying again again again again
coreyjadams Feb 20, 2026
d3f341e
Trying again again again again again
coreyjadams Feb 21, 2026
0fa114b
try more agains
coreyjadams Feb 21, 2026
9b787ea
who knows
coreyjadams Feb 21, 2026
135542f
Increase test tolerance. Upload test report as artifact
coreyjadams Feb 23, 2026
6c43f5d
Turn off 3D convnd test, it's not numerically stable
coreyjadams Feb 23, 2026
92444fb
upload better report.
coreyjadams Feb 24, 2026
ef6e077
fix workspace permissions
coreyjadams Feb 24, 2026
81a9d2b
revert workspace changes, upload reports for coverage path and genera…
coreyjadams Feb 24, 2026
ad31587
Merge branch 'main' into update-nightly-uv-pipeline
coreyjadams Mar 12, 2026
007705d
Restore container pipeline against main.
coreyjadams Mar 12, 2026
a001a5e
rmove container build action from this pr
coreyjadams Mar 12, 2026
ad0082d
reintroduce pytorch-g deps on torch.
coreyjadams Mar 12, 2026
9ec0086
Merge branch 'main' into update-nightly-uv-pipeline
coreyjadams Mar 12, 2026
8053f8d
updates in this pr:
coreyjadams Mar 17, 2026
66c9e2a
Merge branch 'main' into update-nightly-uv-pipeline
coreyjadams Mar 17, 2026
9de9f1f
add explicit git repo
coreyjadams Mar 17, 2026
51 changes: 51 additions & 0 deletions .github/actions/bootstrap-cudnn-ci/action.yml
@@ -0,0 +1,51 @@
name: Bootstrap cuDNN CI container
Collaborator

General question, does this test trigger any packages to get built from source or is this like core deps and we have binaries for everything?

Collaborator

@NickGeneva NickGeneva Mar 12, 2026

Context: I'm just curious whether this covers all the deps needed to install everything (and to build deps from source when needed).

Collaborator Author

This does trigger some deps to build from source, sometimes. torch_geometric and family, natten both come to mind. transformer_engine will be in the pile eventually too.

It does not do so every time, though. Because uv caches built wheels locally, the wheel built last night is available tonight, and the next night, and so on, until the cache is invalidated or the lock file requires a new build. So the build doesn't trigger everything all the time.

The first build took forever though.
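
As a rough illustration of the invalidation rule described above (a cache stays valid until the lock file changes), here is a minimal sketch that derives a cache-key suffix from a lock file's hash. The function name `cache_key_suffix`, the 16-character truncation, and the demo file contents are assumptions made for illustration, not taken from this PR; inside a workflow, the built-in `hashFiles` expression would typically serve the same purpose.

```shell
#!/usr/bin/env bash
# Sketch: derive a deterministic cache-key suffix from a lock file.
# The function name and the 16-character truncation are illustrative assumptions.
cache_key_suffix() {
  local lock_file="$1"
  # sha256sum prints "<hash>  <path>"; keep only the first 16 hash characters.
  sha256sum "$lock_file" | cut -c1-16
}

# Demo with a throwaway file standing in for a lock file (placeholder contents).
demo_lock="$(mktemp)"
printf 'example-package==1.0\n' > "$demo_lock"
first="$(cache_key_suffix "$demo_lock")"
second="$(cache_key_suffix "$demo_lock")"

# Changing the lock file changes the suffix, which forces a new cache entry.
printf 'another-package==2.0\n' >> "$demo_lock"
third="$(cache_key_suffix "$demo_lock")"

echo "unchanged lock file: $first / $second"
echo "changed lock file:   $third"
rm -f "$demo_lock"
```

The same suffix is produced night after night while the lock file is untouched, so source-built wheels are reused; any edit to the lock file yields a new key.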

Collaborator Author

And no, we're not catching everything yet. I need to get TE for sure, and I think a few others are still missing. I've got a lot of the deps, though. I was hoping to roll out incrementally from here; part of the reporting stage was to help ID which tests are skipped due to missing software deps and fix that.

Collaborator

@NickGeneva NickGeneva Mar 13, 2026

Yeah, it's the builds of packages from source that raised my earlier questions about cache invalidation / refreshes.

This is where leaning hard on caching, and using it aggressively rather than refreshing it all the time, is very useful. Initial cache builds for e2s with e2grid, natten, flash-attn, torch-harmonics, etc. can take hours, which is practical to do at most once per week, imo.

Managing the cache is uv's problem, right? It does not need a fresh cache all the time. Creating a new cache really just checks that PyPI is still alive (I hope so) and that source builds still work as intended. It could be nightly, but when things go wrong, you don't want all new PRs to get stuck building new caches as well because the lock file changed and now you're having an issue with TE or another source build.
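
One way to picture the "at most once per week" refresh suggested above is a cache-key component that only changes with the ISO week. This is a minimal sketch under that assumption; the function name `weekly_cache_bucket` and the `uv-cache-` prefix are hypothetical and do not appear in this PR.

```shell
#!/usr/bin/env bash
# Sketch: a cache-key component that rolls over once per ISO week (UTC).
# The function name and the "uv-cache-" prefix are illustrative assumptions.
weekly_cache_bucket() {
  # %G is the ISO 8601 week-numbering year, %V the ISO week number (01-53).
  date -u +%G-W%V
}

bucket="$(weekly_cache_bucket)"
echo "uv-cache-${bucket}"
```

Every job started in the same week computes the same bucket, so a full rebuild of the source-built packages would happen at most once per week rather than on every nightly run.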

description: Install OS dependencies and uv in CUDA cuDNN container jobs
inputs:
  python-version:
    description: Python major.minor expected in the container
    required: false
    default: "3.12"
runs:
  using: composite
  steps:
    - name: Install system dependencies
      shell: bash
      run: |
        set -euo pipefail
        export DEBIAN_FRONTEND=noninteractive
        apt-get update
        apt-get install -y --no-install-recommends \
          ca-certificates \
          curl \
          git \
          gh \
          build-essential \
          cmake \
          pkg-config \
          python3 \
          python3-dev \
          python3-venv \
          python3-pip
        ln -sf /usr/bin/python3 /usr/bin/python
        rm -rf /var/lib/apt/lists/*

    - name: Install uv
      shell: bash
      run: |
        set -euo pipefail
        curl -LsSf https://astral.sh/uv/install.sh | sh
        # Recent uv installers default to ~/.local/bin; older ones used ~/.cargo/bin.
        echo "$HOME/.local/bin" >> "$GITHUB_PATH"
        echo "$HOME/.cargo/bin" >> "$GITHUB_PATH"

    - name: Print toolchain versions
      shell: bash
      run: |
        set -euo pipefail
        python3 --version
        uv --version
        gcc --version | head -n 1
        cmake --version | head -n 1
        if command -v nvcc >/dev/null 2>&1; then
          nvcc --version
        else
          echo "nvcc not found on PATH"
        fi
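
The `Install uv` step depends on knowing where the installer placed the binary, and that location can differ between installer versions. The sketch below shows one way to probe candidate directories instead of assuming a single path. The helper name `find_uv_bin_dir` and the temporary directory layout are assumptions for illustration; the demo uses a fake `uv` script rather than a real installation.

```shell
#!/usr/bin/env bash
# Sketch: return the first candidate directory that contains an executable "uv".
# The helper name is hypothetical; candidate directories come from the caller.
find_uv_bin_dir() {
  local dir
  for dir in "$@"; do
    if [ -x "$dir/uv" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Demo against a temporary tree with a fake "uv" executable in .local/bin only.
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/.local/bin" "$demo_home/.cargo/bin"
printf '#!/bin/sh\necho fake-uv\n' > "$demo_home/.local/bin/uv"
chmod +x "$demo_home/.local/bin/uv"

found="$(find_uv_bin_dir "$demo_home/.cargo/bin" "$demo_home/.local/bin")"
echo "uv found in: $found"
rm -rf "$demo_home"
```

In an action step, the result could be appended to `$GITHUB_PATH`, and a non-zero return would surface a missing binary at install time rather than in a later step.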
59 changes: 59 additions & 0 deletions .github/actions/build-ci-container/action.yml
@@ -0,0 +1,59 @@
name: Build CI container image
description: Build the repository CI container image from Dockerfile
inputs:
  image-tag:
    description: Fully qualified local image tag to build
    required: false
    default: physicsnemo-ci:nightly
  dockerfile:
    description: Dockerfile path
    required: false
    default: Dockerfile
  context:
    description: Docker build context path
    required: false
    default: .
  target:
    description: Docker build target stage
    required: false
    default: ci
  platform:
    description: Docker target platform
    required: false
    default: linux/amd64
outputs:
  image-tag:
    description: The built image tag
    value: ${{ inputs.image-tag }}
runs:
  using: composite
  steps:
    - name: Verify Docker CLI availability
      shell: bash
      run: |
        set -euo pipefail
        docker version

    - name: Build CI container image
      shell: bash
      run: |
        set -euo pipefail
        echo "::group::docker build"
        echo "dockerfile=${{ inputs.dockerfile }}"
        echo "context=${{ inputs.context }}"
        echo "target=${{ inputs.target }}"
        echo "platform=${{ inputs.platform }}"
        echo "tag=${{ inputs.image-tag }}"
        docker build \
          --platform "${{ inputs.platform }}" \
          --file "${{ inputs.dockerfile }}" \
          --target "${{ inputs.target }}" \
          --tag "${{ inputs.image-tag }}" \
          "${{ inputs.context }}"
        echo "::endgroup::"

    - name: Show built image details
      shell: bash
      run: |
        set -euo pipefail
        docker image inspect "${{ inputs.image-tag }}" --format='id={{.Id}} size={{.Size}}'
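
The build step above substitutes action inputs directly into the script text. A commonly recommended alternative is to pass such inputs through the step's `env` mapping and read them as ordinary shell variables, so their contents are never interpreted as script. The sketch below assembles the same `docker build` arguments that way. The variable names (`IMAGE_TAG`, `DOCKERFILE`, `CONTEXT`, `TARGET`, `PLATFORM`) and the function name are assumptions, and the demo only prints the arguments rather than invoking Docker.

```shell
#!/usr/bin/env bash
# Sketch only: IMAGE_TAG, DOCKERFILE, CONTEXT, TARGET and PLATFORM are assumed
# names, expected to be supplied through the step's env mapping.
# Defaults mirror the input defaults declared in the action above.
build_docker_args() {
  # Emit one argument per line so a caller can read them into an array.
  # (Values containing newlines would need a different encoding.)
  printf '%s\n' \
    build \
    --platform "${PLATFORM:-linux/amd64}" \
    --file "${DOCKERFILE:-Dockerfile}" \
    --target "${TARGET:-ci}" \
    --tag "${IMAGE_TAG:-physicsnemo-ci:nightly}" \
    "${CONTEXT:-.}"
}

# Dry run: print the argument list instead of invoking docker.
(
  IMAGE_TAG="example-image:dry-run"
  build_docker_args
)
```

In bash, the printed lines could be collected with `mapfile -t args < <(build_docker_args)` and passed on as `docker "${args[@]}"`, which keeps each value a single argument regardless of spaces.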
69 changes: 69 additions & 0 deletions .github/actions/setup-uv-env/action.yml
@@ -0,0 +1,69 @@
name: Setup uv environment
description: Restore uv and venv caches, and run uv sync on cache miss
inputs:
  uv-cache-key-prefix:
    description: Prefix for uv package cache key
    required: true
  venv-cache-key-prefix:
    description: Prefix for virtual environment cache key
    required: true
  cache-key-suffix:
    description: Deterministic suffix appended to cache key prefixes
    required: true
outputs:
  uv_cache_hit:
    description: Whether uv package cache had an exact key hit
    value: ${{ steps.restore-uv-cache.outputs.cache-hit }}
  venv_cache_hit:
    description: Whether venv cache had an exact key hit
    value: ${{ steps.restore-venv-cache.outputs.cache-hit }}
runs:
  using: composite
  steps:
    - name: Restore uv package cache
      id: restore-uv-cache
      uses: actions/cache/restore@v4
      with:
        path: ~/.cache/uv
        key: ${{ inputs.uv-cache-key-prefix }}-${{ inputs.cache-key-suffix }}
        fail-on-cache-miss: false
        restore-keys: |
          ${{ inputs.uv-cache-key-prefix }}-

    - name: Restore venv cache
      id: restore-venv-cache
      uses: actions/cache/restore@v4
      with:
        path: .venv
        key: ${{ inputs.venv-cache-key-prefix }}-${{ inputs.cache-key-suffix }}
        fail-on-cache-miss: false
        restore-keys: |
          ${{ inputs.venv-cache-key-prefix }}-

    - name: Debug cache and environment context
      shell: bash
      run: |
        set -euo pipefail
        echo "::group::setup-uv-env debug context"
        echo "uv cache key: ${{ inputs.uv-cache-key-prefix }}-${{ inputs.cache-key-suffix }}"
        echo "venv cache key: ${{ inputs.venv-cache-key-prefix }}-${{ inputs.cache-key-suffix }}"
        echo "uv cache exact hit: ${{ steps.restore-uv-cache.outputs.cache-hit }}"
        echo "venv cache exact hit: ${{ steps.restore-venv-cache.outputs.cache-hit }}"
        echo "workspace: $GITHUB_WORKSPACE"
        df -h
        echo "::endgroup::"

    - name: Install dependencies with uv (dev + cu12)
      if: steps.restore-venv-cache.outputs.cache-hit != 'true'
      shell: bash
      run: |
        set -euo pipefail
        export UV_LINK_MODE=copy
        echo "::group::uv sync (dev + cu12)"
        uv sync \
          --frozen \
          --group dev \
          --extra cu12
        echo "::endgroup::"
        uv run python -c "import torch; print(f'torch={torch.__version__} cuda={torch.version.cuda}')"
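
The `if:` condition on the install step treats anything other than an exact venv cache hit as a reason to run `uv sync`, which includes the case where only a `restore-keys` prefix matched. The sketch below restates that decision as a shell predicate to make the three cases explicit. The function name `needs_uv_sync` is hypothetical, and the values passed in stand for the `cache-hit` output of the restore step.

```shell
#!/usr/bin/env bash
# Sketch: mirror the action's condition "cache-hit != 'true'" as a predicate.
# The function name is an illustrative assumption.
needs_uv_sync() {
  # Only the exact string "true" (an exact key match) skips the sync;
  # a prefix-only restore or a miss still requires it.
  [ "${1:-}" != "true" ]
}

# Walk through the three cases: exact hit, prefix-only restore, and miss.
for value in "true" "false" ""; do
  if needs_uv_sync "$value"; then
    echo "cache-hit='$value' -> run uv sync"
  else
    echo "cache-hit='$value' -> skip uv sync"
  fi
done
```

Running the sync after a prefix-only restore lets uv reconcile a stale environment against the current lock file instead of rebuilding it from scratch.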
