ci(feat): use AWS ephemeral runners for external contributors #1892

Open
ko3n1g wants to merge 5 commits into main from ko3n1g/ci/aws-ephemeral-runners

@ko3n1g
Contributor

@ko3n1g ko3n1g commented Apr 17, 2026

Summary

Routes external contributors to isolated, ephemeral runners while NVIDIA maintainers keep the persistent ones, leveraging the SSO membership check already built into the pre-flight workflow.

Changes:

  • Passes nemo-ci-aws-gpu-x2 / nemo-ci-aws-gpu-x2-ephemeral as default_runner_prefix / non_nvidia_runner_prefix inputs to the pre-flight workflow
  • Uses needs.pre-flight.outputs.runner_prefix in all GPU jobs (cicd-container-build, unit tests, e2e tests). The pre-flight workflow already performs the SSO membership check and emits the runner prefix that matches the contributor's status
  • Removes the redundant is-not-external-contributor job and the copied check-nvidia-sso-membership composite action
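
For readers unfamiliar with the pattern, a reusable workflow can map two prefix inputs onto a single output roughly as below. This is a hypothetical sketch, not the contents of `_cicd_preflight.yml`: the job name `select`, the step id `pick`, and the `IS_MEMBER` placeholder (standing in for the real SSO membership check) are all assumptions made for illustration.

```yaml
# Hypothetical sketch only; the real pre-flight workflow may differ.
on:
  workflow_call:
    inputs:
      default_runner_prefix:
        type: string
        required: true
      non_nvidia_runner_prefix:
        type: string
        required: true
    outputs:
      runner_prefix:
        description: Runner prefix chosen for this contributor
        value: ${{ jobs.select.outputs.runner_prefix }}

jobs:
  select:
    runs-on: ubuntu-latest
    outputs:
      runner_prefix: ${{ steps.pick.outputs.runner_prefix }}
    steps:
      - id: pick
        env:
          # Placeholder: the real workflow derives this from the SSO check.
          IS_MEMBER: "true"
          DEFAULT_PREFIX: ${{ inputs.default_runner_prefix }}
          EXTERNAL_PREFIX: ${{ inputs.non_nvidia_runner_prefix }}
        run: |
          # Members get the persistent runners, everyone else the ephemeral ones.
          if [ "$IS_MEMBER" = "true" ]; then
            echo "runner_prefix=$DEFAULT_PREFIX" >> "$GITHUB_OUTPUT"
          else
            echo "runner_prefix=$EXTERNAL_PREFIX" >> "$GITHUB_OUTPUT"
          fi
```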

Example

```yaml
pre-flight:
  uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.80.1
  with:
    default_runner_prefix: nemo-ci-aws-gpu-x2
    non_nvidia_runner_prefix: nemo-ci-aws-gpu-x2-ephemeral
```
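
A GPU job can then consume the output in `runs-on`. The snippet below is a sketch under two assumptions: that `runner_prefix` is used as the complete runner label (the actual workflow may append a suffix), and that the single checkout step stands in for the real build steps.

```yaml
cicd-container-build:
  needs: [pre-flight]
  # Assumption: runner_prefix is the complete runner label.
  runs-on: ${{ needs.pre-flight.outputs.runner_prefix }}
  steps:
    # Illustrative step only; the real job builds the CI container.
    - uses: actions/checkout@v4
```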

Test plan

  • Trigger CI from an NVIDIA-member PR — verify nemo-ci-aws-gpu-x2 runner is selected
  • Trigger CI from an external contributor PR — verify nemo-ci-aws-gpu-x2-ephemeral runner is selected
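
One way to check both cases from the run logs is a small debug job that prints the selected prefix. This is a hypothetical helper, not part of this PR: the job name `show-runner-prefix` and the `RUNNER_PREFIX` variable are made up for illustration.

```yaml
show-runner-prefix:
  needs: [pre-flight]
  runs-on: ubuntu-latest
  steps:
    - name: Print selected runner prefix
      env:
        # Passed through env rather than interpolated into the script.
        RUNNER_PREFIX: ${{ needs.pre-flight.outputs.runner_prefix }}
      run: echo "runner_prefix=${RUNNER_PREFIX}"
```

An NVIDIA-member PR would be expected to print `nemo-ci-aws-gpu-x2`, and an external contributor PR `nemo-ci-aws-gpu-x2-ephemeral`.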

Signed-off-by: oliver könig <okoenig@nvidia.com>
Replace nemoci.azurecr.io with 766267172432.dkr.ecr.us-east-1.amazonaws.com.
Remove all Azure CLI install, login, and ACR login steps from build-container
and test-template actions. Drop environment: nemo-ci (Azure-backed) from all
jobs. Route the CPU unit-test job from linux-amd64-cpu16 to the AWS runner
selected by is-not-external-contributor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g ko3n1g changed the title [ci] feat: use AWS ephemeral runners for external contributors ci(feat): use AWS ephemeral runners for external contributors Apr 17, 2026
@ko3n1g ko3n1g enabled auto-merge (squash) April 17, 2026 23:35
Line numbers shifted due to new is-not-external-contributor job;
regenerated baseline with detect-secrets scan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
@chtruong814
Contributor

Please take a look at the feedback for the ephemeral runners on MBridge. It should all apply here as well.
NVIDIA-NeMo/Megatron-Bridge#3370 (review)

Remove the `is-not-external-contributor` job and the copied
`check-nvidia-sso-membership` action — the pre-flight workflow already
handles SSO membership checks and exposes `runner_prefix` as an output
for exactly this purpose.

Pass `nemo-ci-aws-gpu-x2` / `nemo-ci-aws-gpu-x2-ephemeral` as the
`default_runner_prefix` / `non_nvidia_runner_prefix` inputs to
pre-flight and use `needs.pre-flight.outputs.runner_prefix` in all GPU
jobs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>