[CI] Re-enable PyTorch nightly build and test in nightly CI/CD #310

Open
atalman wants to merge 5 commits into main from atalman/nightly-wip

@atalman (Collaborator) commented Mar 17, 2026

Summary

Changes

buildkite_step.py

  • Import get_torch_nightly_image from docker utilities
  • Inject $IMAGE_TAG_TORCH_NIGHTLY variable into pipeline steps
  • Collect steps marked with torch_nightly: true into a dedicated group
  • New _create_torch_nightly_group(): creates a "vLLM Against PyTorch Nightly" group with an optional manual block step, a
    Docker image build step (using .buildkite/image_build/image_build_torch_nightly.sh), and individual test steps using the
    nightly image
  • New _get_nightly_step_plugin(): determines the correct plugin configuration based on device type
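
For readers following along, the group assembly described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Step` dataclass, the `create_torch_nightly_group` signature, the step keys, and the `docker` plugin entry are all assumptions made for the example. Only the group name and the build script path come from the description.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# From the PR description.
GROUP_NAME = "vLLM Against PyTorch Nightly"
BUILD_SCRIPT = ".buildkite/image_build/image_build_torch_nightly.sh"

# Hypothetical step keys, chosen for this sketch only.
BLOCK_KEY = "block-torch-nightly"
BUILD_KEY = "image-build-torch-nightly"


@dataclass
class Step:
    """Hypothetical stand-in for the pipeline generator's Step model."""
    label: str
    commands: List[str]
    torch_nightly: bool = False


def create_torch_nightly_group(
    steps: List[Step], image: str, needs_block: bool
) -> Optional[Dict[str, Any]]:
    """Build a Buildkite group: optional block step, image build, then tests."""
    nightly_steps = [s for s in steps if s.torch_nightly]
    if not nightly_steps:
        return None  # nothing opted in, so no group is emitted

    group_steps: List[Dict[str, Any]] = []
    if needs_block:
        # Manual gate; the image build waits on it.
        group_steps.append({"block": "Run PyTorch nightly tests", "key": BLOCK_KEY})

    build: Dict[str, Any] = {
        "label": "Build PyTorch nightly image",
        "key": BUILD_KEY,
        "commands": [f"bash {BUILD_SCRIPT}"],
    }
    if needs_block:
        build["depends_on"] = BLOCK_KEY
    group_steps.append(build)

    for s in nightly_steps:
        group_steps.append({
            "label": f"{s.label} (torch nightly)",
            "depends_on": BUILD_KEY,
            "commands": list(s.commands),
            # Placeholder plugin entry; the PR picks the real configuration
            # per device type in _get_nightly_step_plugin().
            "plugins": [{"docker": {"image": image}}],
        })
    return {"group": GROUP_NAME, "steps": group_steps}
```

Returning `None` when no step opts in keeps an empty group out of the generated pipeline; whether the PR does the same is not stated.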

step.py

  • Add torch_nightly: Optional[bool] = False field to the Step model, allowing test area YAML files to opt steps into the
    nightly pipeline

docker_utils.py

  • New get_torch_nightly_image(): returns the appropriate nightly Docker image tag based on branch (main vs. premerge)
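
A rough sketch of the branch-based tag selection, for illustration only. The repository names, the `commit` parameter, and the `-torch-nightly` suffix are placeholders assumed for this example; the actual registry paths and tag format are not given in this description.

```python
# Placeholder repositories, assumed for this sketch; not the real registry paths.
POSTMERGE_REPO = "registry.example.com/ci-postmerge"
PREMERGE_REPO = "registry.example.com/ci-premerge"


def get_torch_nightly_image(branch: str, commit: str) -> str:
    """Return the nightly image tag: postmerge repo on main, premerge otherwise."""
    repo = POSTMERGE_REPO if branch == "main" else PREMERGE_REPO
    # The tag layout (commit plus suffix) is an assumption of this sketch.
    return f"{repo}:{commit}-torch-nightly"
```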

test-template-ci.j2

  • Auto-run nightly builds on main branch (block step only on non-main, non-nightly branches)
  • Update CUDA architecture lists to include 12.0 and 12.0a

Test plan

@atalman changed the title from "Atalman/nightly wip" to "Re-Enable torch nightly build and test in nightly CI/CD" Mar 17, 2026
@atalman changed the title from "Re-Enable torch nightly build and test in nightly CI/CD" to "[CI] Re-enable PyTorch nightly build and test in nightly CI/CD" Mar 19, 2026
@khluu (Collaborator) commented Mar 27, 2026

If this is essentially mirroring steps and running the same commands, just on a different image, can we leverage the mirror field, the same way AMD does?

@atalman force-pushed the atalman/nightly-wip branch from 7b96dbb to c140e70 on March 27, 2026 17:47
@atalman (Collaborator, Author) commented Mar 27, 2026

Hi @khluu

Summary of changes:

buildkite/pipeline_generator/step.py

  • Kept torch_nightly: Optional[bool] = False for backward compatibility
  • Added a normalize_torch_nightly_to_mirror pydantic validator that promotes torch_nightly:
    true into mirror: {torch_nightly: {}}, so downstream code only checks one place
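
The normalization can be illustrated with a dependency-free sketch. Note the swap: the PR uses a pydantic validator, while this example uses a stdlib dataclass with `__post_init__` to show the same promotion logic. The `label` field and the `amd` mirror key in the usage are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Step:
    """Simplified stand-in for the Step model (dataclass, not pydantic)."""
    label: str
    torch_nightly: Optional[bool] = False
    mirror: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Promote the legacy flag into the mirror mapping so that
        # downstream code only has to inspect `mirror`.
        if self.torch_nightly:
            mirror = dict(self.mirror or {})
            mirror.setdefault("torch_nightly", {})  # keep any explicit config
            self.mirror = mirror


legacy = Step("Basic tests", torch_nightly=True)
mixed = Step("Kernel tests", torch_nightly=True, mirror={"amd": {}})  # "amd" key is hypothetical
plain = Step("Docs build")
```

Here `legacy.mirror` ends up as `{"torch_nightly": {}}`, `mixed.mirror` keeps its existing entry alongside the new one, and `plain.mirror` stays `None`.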

buildkite/pipeline_generator/buildkite_step.py

  • Changed nightly collection check from step.torch_nightly to step.mirror and
    step.mirror.get("torch_nightly") is not None (works because the validator normalizes the
    legacy field)
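
One detail worth spelling out: the check compares against `None` rather than relying on truthiness, because the normalized value `{}` is falsy in Python. A small sketch of the same expression, applied to a plain dict standing in for `step.mirror` (the function name is hypothetical):

```python
from typing import Any, Dict, Optional


def wants_torch_nightly(mirror: Optional[Dict[str, Any]]) -> bool:
    """Same expression as the collection check, wrapped in bool()."""
    return bool(mirror and mirror.get("torch_nightly") is not None)


normalized = {"torch_nightly": {}}

# A plain truthiness test would miss the empty config mapping.
naive = bool(normalized.get("torch_nightly"))
correct = wants_torch_nightly(normalized)
```

With the normalized mapping, `naive` is `False` while `correct` is `True`, which is why the `is not None` comparison matters.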

buildkite/test-template-ci.j2

  • Changed Jinja2 check to step.torch_nightly or (step.mirror and step.mirror.torch_nightly
    is defined) — needs both since Jinja2 doesn't go through the pydantic validator

Result

  • Existing vllm YAML files with torch_nightly: true keep working
  • New steps can use the mirror pattern per Kevin's suggestion: mirror: { torch_nightly: {} }
  • Consistent with how AMD mirroring works via the mirror field

atalman added 5 commits March 27, 2026 11:09
Signed-off-by: atalman <atalman@fb.com>
Signed-off-by: atalman <atalman@fb.com>
Signed-off-by: atalman <atalman@fb.com>
Signed-off-by: atalman <atalman@fb.com>
Signed-off-by: atalman <atalman@fb.com>
@atalman force-pushed the atalman/nightly-wip branch from c140e70 to 5ef28a0 on March 27, 2026 18:10

2 participants