chore(ci): try another gpu label #358

Closed
mckornfield wants to merge 1 commit into main from gpu-runner-setup-again/mck

@mckornfield

Collaborator

No description provided.

Copilot AI review requested due to automatic review settings April 3, 2026 21:21
@mckornfield mckornfield requested a review from a team as a code owner April 3, 2026 21:21

Copilot AI left a comment

Contributor

Pull request overview

Updates GPU CI runner labels and tidies a minor formatting issue in the troubleshooting guide.

Changes:

  • Switch GPU test jobs to run on the nemo-ci-aws-gpu-x2-container runner label.
  • Remove trailing whitespace in the troubleshooting documentation.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/user-guide/troubleshooting.md | Removes trailing whitespace in a paragraph for cleaner markdown formatting. |
| .github/workflows/gpu-tests.yml | Updates runs-on labels for GPU smoke/E2E jobs to target a different GPU runner pool. |

Comment thread .github/workflows/gpu-tests.yml Outdated
if: ${{ needs.changes.outputs.src == 'true' || needs.changes.outputs.test == 'true' || github.event_name == 'workflow_dispatch' }}
timeout-minutes: 30
- runs-on: nemo-ci-aws-gpu-x2
+ runs-on: nemo-ci-aws-gpu-x2-container

Copilot AI Apr 3, 2026

runs-on was changed to nemo-ci-aws-gpu-x2-container, but the internal workflow documentation still lists the GPU Tests runner as nemo-ci-aws-gpu-x2 (see .github/workflows/README.md:157-158). Please update that table (or other relevant docs) so the documented runner labels match what the workflow actually uses.

Comment thread .github/workflows/gpu-tests.yml Outdated
if: ${{ needs.changes.outputs.src == 'true' || needs.changes.outputs.test == 'true' || github.event_name == 'workflow_dispatch' }}
timeout-minutes: 55
- runs-on: nemo-ci-aws-gpu-x2
+ runs-on: nemo-ci-aws-gpu-x2-container

Copilot AI Apr 3, 2026

This job’s runs-on label was updated, but .github/workflows/README.md still documents the GPU runner label as nemo-ci-aws-gpu-x2 (e.g. .github/workflows/README.md:157-158). Please update the workflow docs to reflect nemo-ci-aws-gpu-x2-container so future CI changes/debugging aren’t based on stale runner info.
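
Drift like this between the workflow and its README could be caught by a small check. The following is only a minimal sketch, not something that exists in the repository: the function names `runner_labels` and `undocumented_labels` are hypothetical, and it assumes each job uses a plain single-label `runs-on:` line (list, matrix, and expression forms are not handled).

```python
import re

# Assumption: one bare label per `runs-on:` line, with no trailing comment.
RUNS_ON = re.compile(r"^[ \t]*runs-on:[ \t]*([A-Za-z0-9_.-]+)[ \t]*$", re.MULTILINE)

# Characters treated as part of a label when tokenizing the README text.
TOKEN = re.compile(r"[A-Za-z0-9_.-]+")


def runner_labels(workflow_text: str) -> set[str]:
    """Return every single-label `runs-on:` value found in the workflow text."""
    return set(RUNS_ON.findall(workflow_text))


def undocumented_labels(workflow_text: str, readme_text: str) -> set[str]:
    """Return runner labels used by the workflow but never mentioned in the README.

    Labels are compared as whole tokens, so a README that mentions only
    `nemo-ci-aws-gpu-x2` does not count as documenting
    `nemo-ci-aws-gpu-x2-container`.
    """
    # Strip sentence-ending periods so "… nemo-ci-aws-gpu-x2." still matches.
    documented = {token.strip(".") for token in TOKEN.findall(readme_text)}
    return runner_labels(workflow_text) - documented
```

To use it, read `.github/workflows/gpu-tests.yml` and `.github/workflows/README.md` (for example with `pathlib.Path.read_text`) and fail the check when `undocumented_labels` returns a non-empty set.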

Comment thread docs/user-guide/troubleshooting.md
Signed-off-by: Matt Kornfield <mkornfield@nvidia.com>
@mckornfield

Collaborator Author

blegh, this doesn't work either
