|[docs.yml](docs.yml)| Push to `main` (docs paths) | Publishes `main` docs as the `latest` GitHub Pages version |
````diff
@@ -20,9 +20,9 @@ All workflows that use `.github/actions/setup-python-env` now default to the ver

 ## Pull Request Testing (copy-pr-bot)

-GPU tests on PRs are currently disabled due to internal constraints. We hope to reenable them asap. The rest of this information is kept for posterity, but it is also relevant to the external tests ran for unit and cpu smoke tests.
+GPU tests on PRs are currently disabled due to internal constraints. The `pull-request/*` push trigger is commented out in `gpu-tests.yml`, so copy-pr-bot syncs do not start GPU workflow runs until that trigger is reenabled.

-GPU tests (`gpu-tests.yml`) run on NVIDIA self-hosted runners, which block `pull_request`-triggered jobs. They use the [copy-pr-bot](https://docs.gha-runners.nvidia.com/platform/apps/copy-pr-bot/) pattern instead:
+When PR GPU tests are reenabled, `gpu-tests.yml` should use the [copy-pr-bot](https://docs.gha-runners.nvidia.com/platform/apps/copy-pr-bot/) pattern because NVIDIA self-hosted runners block `pull_request`-triggered jobs:

 1. When a PR is opened by a trusted user with trusted changes, `copy-pr-bot` automatically copies the code to a `pull-request/<number>` branch
 2. The push to `pull-request/<number>` triggers the GPU workflow
````
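The two copy-pr-bot steps above can be modeled as a pair of shell functions. This is a minimal sketch, not copy-pr-bot's implementation: `copy_pr_bot_push`, `gpu_tests_fires_on_push`, and the `PR_GPU_TRIGGER_ENABLED` flag are hypothetical names, and the flag defaults to `no` to mirror the commented-out `pull-request/*` trigger.

```bash
# Hypothetical model of the copy-pr-bot flow; not copy-pr-bot's own code.

# Step 1: print the branch copy-pr-bot would push a PR to, or nothing if the
# PR is not from a trusted user with trusted changes.
copy_pr_bot_push() {
  local pr_number="$1" trusted="$2"
  if [ "$trusted" = yes ]; then
    printf 'pull-request/%s\n' "$pr_number"
  fi
}

# Step 2: succeed if a push to this branch would start gpu-tests.yml.
# PR_GPU_TRIGGER_ENABLED is a hypothetical stand-in for whether the
# pull-request/* push trigger is uncommented; it defaults to "no".
gpu_tests_fires_on_push() {
  local branch="$1" enabled="${PR_GPU_TRIGGER_ENABLED:-no}"
  case "$branch" in
    pull-request/*) [ "$enabled" = yes ] ;;
    *) return 1 ;;
  esac
}
```

For example, `gpu_tests_fires_on_push "$(copy_pr_bot_push 123 yes)"` fails under the current configuration and succeeds only once the trigger is modeled as enabled.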
````diff
@@ -35,7 +35,7 @@ CPU checks (`ci-checks.yml`) run on GitHub-hosted `ubuntu-latest` runners and us

 ### On-demand GPU test runs

-To trigger a GPU test run on an open PR without waiting for the auto-sync, comment `/sync` on the PR. copy-pr-bot will push the current HEAD to `pull-request/<number>`, which fires `gpu-tests.yml` and posts the `GPU CI Status` check result back to the PR -- the same check as the automatic trigger.
+The `/sync` command pushes the current PR HEAD to `pull-request/<number>`. While the `pull-request/*` push trigger is disabled, that push does not fire `gpu-tests.yml` or post a `GPU CI Status` check.

 Use `/sync` when:

````
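From a terminal, posting the comment and then checking the synced branch for GPU runs might look like the sketch below. `gh pr comment` and `gh run list` are real GitHub CLI commands; `pr_sync_branch` and `request_gpu_sync` are hypothetical helpers, and the sketch assumes an authenticated `gh` inside a clone of the repository and that copy-pr-bot treats a CLI-posted `/sync` like one typed in the web UI.

```bash
# Hypothetical helpers; gh pr comment and gh run list are real gh commands.

# Branch that copy-pr-bot syncs a PR to.
pr_sync_branch() {
  printf 'pull-request/%s\n' "$1"
}

# Post /sync on a PR, then list recent gpu-tests.yml runs on the synced branch.
request_gpu_sync() {
  local pr_number="$1"
  gh pr comment "$pr_number" --body "/sync"
  # While the pull-request/* push trigger is disabled, no new run should appear.
  gh run list --workflow gpu-tests.yml --branch "$(pr_sync_branch "$pr_number")" --limit 5
}
```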
````diff
@@ -49,7 +49,7 @@ Use `/sync` when:
 flowchart LR
 subgraph triggers [Triggers]
 push[Push to main]
-cpb[copy-pr-bot push to pull-request/*]
+schedule[Nightly schedule]
 pr[Pull Request event]
 manual[Manual Dispatch]
 end
````
````diff
@@ -92,8 +92,9 @@ flowchart LR
 publishArtifactory[Publish to Artifactory/PyPI]
 end

-push --> ci & gpu
-cpb --> gpu
+push --> ci
+schedule --> gpu
+manual --> gpu
 pr --> ci & conventional & secrets
 tag[Tag push v[0-9]*] --> release

````
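Read together, the trigger nodes and edges in the updated flowchart amount to a small event-to-workflow table. The function below is a hedged sketch of only the edges visible in these hunks, keyed by GitHub Actions event names (`push`, `schedule`, `workflow_dispatch`, `pull_request`). `workflows_for_event` is a hypothetical name, the node IDs (`ci`, `gpu`, `conventional`, `secrets`, `release`) come from the flowchart, and the shell glob for tags only approximates GitHub's `v[0-9]*` filter.

```bash
# Hypothetical summary of the flowchart edges shown above; edges elsewhere in
# the full diagram are not modeled.
#   $1: GitHub Actions event name
#   $2: full ref for push events, e.g. refs/heads/main or refs/tags/v1.2.0
workflows_for_event() {
  local event="$1" ref="${2:-}"
  case "$event" in
    push)
      case "$ref" in
        refs/heads/main) echo "ci" ;;          # push --> ci (no GPU on push)
        refs/tags/v[0-9]*) echo "release" ;;   # tag push v[0-9]* --> release
        *) ;;                                  # e.g. pull-request/* while disabled
      esac ;;
    schedule) echo "gpu" ;;                    # schedule --> gpu
    workflow_dispatch) echo "gpu" ;;           # manual --> gpu
    pull_request) echo "ci conventional secrets" ;;
    *) ;;
  esac
}
```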
````diff
@@ -111,7 +112,7 @@ The `ci-checks.yml` workflow runs on every push to `main` and on pull requests.
 | Job | `make` target | What it checks |
 | --- | --- | --- |
 | Format | `format-check` | `ruff format --check` + `ruff check` + SPDX copyright headers |
-| Format (lock) | `lock-check` | `uv.lock` matches `pyproject.toml` |
+| Format (lock) | `lock-check` | `uv.lock` matches `pyproject.toml`; generated CUDA dependency sections match `cuda_deps.toml` |
 | Typecheck | `typecheck` | `ty check` (excludes per `pyproject.toml [tool.ty.src]`) |
 | Unit Tests | `test-ci` | pytest with coverage (excludes slow, e2e, gpu, smoke) |
 | Smoke Tests | `test-smoke` | CPU smoke tests (training/generation hot paths, tiny models) |
````
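The widened `lock-check` row describes a check that regenerates derived content and compares it with what is committed. Below is a minimal sketch of that "regenerate and compare" pattern; it is not the project's `lock-check` implementation. `uv lock --locked` is a real uv option that exits with an error if `uv.lock` would change, while `check_generated`, the generator script, and the output file named in the usage comment are hypothetical.

```bash
# Hypothetical "regenerate and compare" helper in the spirit of lock-check.
# Usage: check_generated <committed-file> <generator command...>
# Runs the generator, captures its stdout, and prints a unified diff against
# the committed file. Returns 0 on a match, 1 on a mismatch, 2 if the
# generator itself fails.
check_generated() {
  local committed="$1" tmp status
  shift
  tmp="$(mktemp)"
  if ! "$@" > "$tmp"; then
    echo "generator failed: $*" >&2
    rm -f "$tmp"
    return 2
  fi
  diff -u "$committed" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example usage (file and script names are hypothetical):
#   uv lock --locked
#   check_generated cuda_sections.txt ./scripts/render_cuda_sections.sh cuda_deps.toml
```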
````diff
@@ -126,7 +127,7 @@ To replicate CI locally:

 ```bash
 make check # format-check + typecheck
-make lock-check # verify uv.lock
+make lock-check # verify uv.lock and generated CUDA dependency sections
 make test # unit tests
 make test-smoke # CPU smoke tests
 ```
````
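CI runs these checks as separate jobs, so one failure does not hide the others. A small wrapper can give the same behavior locally; this is a sketch, not a project script. `run_local_ci` is a hypothetical name, and reading `MAKE` as a single command name is an assumption that lets a stub stand in for `make` when testing the wrapper.

```bash
# Hypothetical wrapper: run each local CI target, keep going after failures,
# and summarize. MAKE is read as a single command name (an assumption).
run_local_ci() {
  local make_cmd="${MAKE:-make}" target
  local -a failed=()
  for target in check lock-check test test-smoke; do
    if "$make_cmd" "$target"; then
      echo "PASS $target"
    else
      echo "FAIL $target"
      failed+=("$target")
    fi
  done
  # Overall status: success only if every target passed.
  [ "${#failed[@]}" -eq 0 ]
}
```

Running `run_local_ci` from the repository root prints one `PASS`/`FAIL` line per target after that target's own output.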
````diff
@@ -135,13 +136,14 @@ All jobs run on `ubuntu-latest` (GitHub-hosted).

 ## GPU Tests Workflow

-The `gpu-tests.yml` workflow runs nightly at 02:00 UTC, and can also be triggered manually via `workflow_dispatch`. Manual dispatch includes a `suite` dropdown with `all`, `smoke`, and `e2e` options. There are several key jobs:
+The `gpu-tests.yml` workflow runs nightly at 02:00 UTC, and can also be triggered manually via `workflow_dispatch`. Manual dispatch includes a `suite` dropdown with `all`, `smoke`, and `e2e` options. GPU jobs run on Python 3.11 and matrix over CUDA runtime extras. There are several key jobs:

 - GPU Smoke Tests: Quick smoke tests on a gpu runner with a 30-minute job timeout and 20-minute step timeout. Required for merge.
 - GPU E2E Tests: End-to-end tests on a gpu runner with a 60-minute job timeout and 45-minute step timeout. Informational -- failures produce a warning but don't block merge.
+- GPU E2E Tests: End-to-end tests on a gpu runner with a 60-minute job timeout and 45-minute step timeout. Informational -- failures produce a warning but don't block merge.
 - GPU CI Status: Aggregation job -- single required check for branch protection. Fails if smoke tests fail; warns if E2E tests fail.

-The `changes` (Detect Changes) job is skipped on `workflow_dispatch`. GPU jobs use `always()` in their job conditions so manual runs can bypass the skipped dependency and run the selected suite. On scheduled runs, `changes` gates GPU jobs to source and test changes.
+The `changes` (Detect Changes) job is skipped on `workflow_dispatch`. GPU jobs use `always()` in their job conditions so manual runs can bypass the skipped dependency and run the selected suite. On scheduled runs, `changes` gates GPU jobs to source, test, dependency (`pyproject.toml` or `uv.lock`), and CI workflow/action changes.

 GPU jobs use `.github/actions/setup-gpu-test-env` for shared GPU setup: installing `make`, setting up Python from `.python-version`, bootstrapping CUDA dependencies, and checking GPU availability.

````
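The job rules in this hunk (the `suite` dropdown, `changes` gating on scheduled runs, and the `GPU CI Status` aggregation) can be sketched as shell logic. This illustrates the described behavior under assumptions and is not the workflow's actual `if:` expressions: the function names are hypothetical, `src/` and `tests/` stand in for the unnamed source and test directories, scheduled runs are assumed to gate both suites the same way, and treating a `cancelled` job like a `failure` is a guess. Job results use the GitHub Actions values `success`, `failure`, `cancelled`, and `skipped`; `::error::` and `::warning::` are GitHub Actions workflow commands.

```bash
# Hypothetical model of the GPU job rules; not the workflow's real `if:` logic.

# Succeed if a changed path falls in a category that gates scheduled GPU runs.
# src/ and tests/ are assumed directory names.
path_gates_gpu() {
  case "$1" in
    src/*|tests/*) return 0 ;;                         # source and test changes
    pyproject.toml|uv.lock) return 0 ;;                # dependency changes
    .github/workflows/*|.github/actions/*) return 0 ;; # CI workflow/action changes
    *) return 1 ;;
  esac
}

# Succeed if a GPU job should run.
#   $1 event: schedule | workflow_dispatch
#   $2 job:   smoke | e2e
#   $3 suite: all | smoke | e2e (only read for workflow_dispatch)
#   $4...:    changed paths (only read for schedule)
should_run_gpu_job() {
  local event="$1" job="$2" suite="$3" path
  shift 3
  case "$event" in
    workflow_dispatch)
      # `changes` is skipped here; always() lets the selected suite run anyway.
      [ "$suite" = all ] || [ "$suite" = "$job" ] ;;
    schedule)
      for path in "$@"; do
        path_gates_gpu "$path" && return 0
      done
      return 1 ;;
    *) return 1 ;;
  esac
}

# GPU CI Status: fail if smoke failed, warn but pass if E2E failed.
#   $1, $2: job results (success | failure | cancelled | skipped)
gpu_ci_status() {
  local smoke="$1" e2e="$2"
  case "$smoke" in
    failure|cancelled)
      echo "::error::GPU smoke tests: $smoke"
      return 1 ;;
  esac
  case "$e2e" in
    failure|cancelled) echo "::warning::GPU E2E tests: $e2e" ;;
  esac
  return 0
}
```

Treating `skipped` as a pass means a manual `suite=e2e` run, where the smoke job never starts, can still report a passing status.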
````diff
@@ -151,14 +153,19 @@ To trigger manually from the CLI (produces a run but not a PR status check):
 gh workflow run gpu-tests.yml --ref <branch-name> -f suite=all
 gh workflow run gpu-tests.yml --ref <branch-name> -f suite=smoke
 gh workflow run gpu-tests.yml --ref <branch-name> -f suite=e2e
+gh workflow run gpu-tests.yml --ref <branch-name> -f suite=all
+gh workflow run gpu-tests.yml --ref <branch-name> -f suite=smoke
+gh workflow run gpu-tests.yml --ref <branch-name> -f suite=e2e
 ```

-To trigger from the PR UI and get a status check result, use `/sync` -- see [On-demand GPU test runs](#on-demand-gpu-test-runs) above.
+PR status checks from `/sync` require the `pull-request/*` push trigger to be reenabled -- see [On-demand GPU test runs](#on-demand-gpu-test-runs) above.

 ### Runners

 Internal runners and projects are defined in an internal repo, `nv-gha-runners/enterprise-runner-configuration`.

+Internal runners and projects are defined in an internal repo, `nv-gha-runners/enterprise-runner-configuration`.
+
 | Workflow | Job | Runner Label | Type |
 | --- | --- | --- | --- |
 | CI Checks | All jobs | `ubuntu-latest` | GitHub-hosted |
````
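To follow a manual dispatch to completion from the terminal, the `gh workflow run` commands in this hunk can be wrapped as in the sketch below. `gh workflow run`, `gh run list` (with `--json`/`--jq`), and `gh run watch --exit-status` are real GitHub CLI commands; `valid_suite` and `dispatch_gpu_suite` are hypothetical helpers, and picking the newest `workflow_dispatch` run on the branch assumes nobody else dispatched one at the same moment. Like the plain commands, this produces a run, not a PR status check.

```bash
# Hypothetical helpers around real gh commands.
valid_suite() {
  case "$1" in
    all|smoke|e2e) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: dispatch_gpu_suite <branch> [suite]
dispatch_gpu_suite() {
  local branch="$1" suite="${2:-all}" run_id
  if ! valid_suite "$suite"; then
    echo "unknown suite: $suite (expected all, smoke, or e2e)" >&2
    return 2
  fi
  gh workflow run gpu-tests.yml --ref "$branch" -f suite="$suite" || return
  # A freshly dispatched run can take a few seconds to appear in the list.
  sleep 5
  # Assumes the newest workflow_dispatch run on this branch is the one just started.
  run_id="$(gh run list --workflow gpu-tests.yml --branch "$branch" \
    --event workflow_dispatch --limit 1 --json databaseId --jq '.[0].databaseId')"
  if [ -z "$run_id" ]; then
    echo "no gpu-tests.yml run found on $branch" >&2
    return 1
  fi
  # --exit-status makes watch return non-zero if the run fails.
  gh run watch "$run_id" --exit-status
}
```

For example, `dispatch_gpu_suite my-feature-branch smoke` starts the smoke suite and blocks until the run finishes.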