[release][train] Adding py3.13 ray-ml image with torchft-nightly by elliot-barn · Pull Request #63587 · ray-project/ray

elliot-barn · 2026-05-21T23:45:51Z

Creating a ray-ml py3.13 release test image with torchft-nightly.

Creating a Python 3.13 variation of training_ingest_benchmark-task=image_classification for full_training.jpeg and full_training.s3_url.

Release test run: https://buildkite.com/ray-project/release/builds/93976

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

gemini-code-assist

Code Review

This pull request introduces support for Python 3.13 across the build and release infrastructure, including updates to Buildkite configurations, dependency lock files, and BYOD requirements. It also adds a new suite of nightly training ingest benchmarks for Python 3.13. Feedback was provided regarding a potential typo in a configuration flag, redundant dependency declarations in the new requirements file, and inconsistent argument formatting in the release test definitions.

I am having trouble creating individual review comments.

release/release_tests.yaml (1967)

anyscale_sdk_2026: true appears to be a typo. This flag is typically anyscale_sdk_v2: true in Ray release tests. Please verify if this is the intended key.

    anyscale_sdk_v2: true

release/ray_release/byod/requirements_ml_byod_3.13.in (43-44)

Both torchft==0.1.1 and torchft-nightly are listed. Since the pull request aims to include the nightly version, the stable version is redundant and may cause installation conflicts. It should be removed.

torchft-nightly
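The overlap flagged in this comment can be caught mechanically. The following is a minimal sketch, assuming a plain requirements-style file with one requirement per line; `find_nightly_overlaps` is a hypothetical helper and is not part of the Ray release tooling.

```python
import re


def find_nightly_overlaps(requirement_lines):
    """Return base names listed both as '<name>' and as '<name>-nightly'."""
    names = set()
    for raw in requirement_lines:
        # Drop comments and skip blank lines or option lines such as '-r other.in'.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The distribution name is everything before a version specifier or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return sorted(name for name in names if f"{name}-nightly" in names)


# The two entries quoted in the review comment above.
sample = ["torchft==0.1.1", "torchft-nightly"]
print(find_nightly_overlaps(sample))  # ['torchft']
```

A check like this only reports name overlap; whether the two distributions actually conflict at install time depends on what files each one ships.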

release/release_tests.yaml (2038)

The arguments --skip_train_step True and --skip_validation_at_epoch_end True use a space-separated format, which is inconsistent with the --arg=value format used in all other variations of this test (e.g., lines 1989, 2058). Using the consistent format improves maintainability and avoids potential parsing issues.

        script: RAY_TRAIN_V2_ENABLED=1 python train_benchmark.py --task=image_classification --dataloader_type=ray_data --num_workers=16 --skip_train_step=True --skip_validation_at_epoch_end=True --image_classification_data_format=s3_url
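To see what is at stake in the two argument styles, here is a small sketch, assuming the script parses its flags with Python's standard `argparse`. The parser that train_benchmark.py actually uses is not shown in this thread, and `str_to_bool` is a hypothetical converter.

```python
import argparse


def str_to_bool(value):
    """Parse common true/false spellings (type=bool would treat 'False' as True)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--skip_train_step", type=str_to_bool, default=False)
parser.add_argument("--skip_validation_at_epoch_end", type=str_to_bool, default=False)

# Space-separated form, as flagged in the review.
spaced = parser.parse_args(
    ["--skip_train_step", "True", "--skip_validation_at_epoch_end", "True"]
)
# --arg=value form, as used by the other variations.
equals = parser.parse_args(
    ["--skip_train_step=True", "--skip_validation_at_epoch_end=True"]
)
print(spaced == equals)  # True
```

Under that assumption both forms parse identically, so the suggestion is mainly about consistency. Other argument parsers may treat the space-separated form differently.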

github-actions · 2026-06-05T13:22:41Z

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs. Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

Add a self-contained raydepsets depset (release_ml_torchft_tests.depsets.yaml) that compiles the Ray ML release-test dependencies with torchft-nightly layered on top, producing release/ray_release/byod/ml_torchft_py3.13.lock for py3.13 / cu128. It is installed onto the core Ray CUDA image via byod_ml_torchft.sh, so torchft release tests no longer depend on the published py3.13 ray-ml image (which fails to build due to dask/nixl py3.13 gaps).

Decouple from the in-progress published py3.13 ray-ml image work by reverting the buildkite image/release steps, ray-images.json, the gpu BYOD py3.13 allowance, and the ml-base-extra-testdeps py3.13 depset + locks, and by removing torchft from the shared requirements_ml_byod_*.in files. torchft now lives only in the dedicated requirements_ml_torchft.in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>


…ft.txt

Align the py3.13 torchft release-image depset with master after the torch 2.9.0 upgrade (#63361):

- Bump requirements_ml_byod_3.13.in to torch==2.9.0 and drop the stale triton==3.3.0 pin (torch 2.9.0 pulls triton==3.5.0 transitively), matching the py3.13 constraint and ML requirement files.
- Source torchft from the canonical python/requirements/ml/py313/torchft.txt (torchft-nightly==2026.5.15, torch-2.9.0-compatible) instead of a separate requirements_ml_torchft.in, so there is a single torchft pin.
- Regenerate ml_torchft_py3.13.lock -> torch==2.9.0+cu128 / torchaudio 2.11.0+cu128 / triton 3.5.0; verified idempotent so raydepsets --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
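A quick way to confirm that a regenerated lock carries the pins named in this commit message is to parse its `name==version` lines and compare them against the expected versions. This is a rough sketch: `parse_pins` is a hypothetical helper, and `sample_lock` is an illustrative stand-in built from the versions quoted above, not the real contents of ml_torchft_py3.13.lock.

```python
def parse_pins(lock_text):
    """Map package name -> pinned version for 'name==version' lines."""
    pins = {}
    for raw in lock_text.splitlines():
        # Ignore comments, blank lines, and option lines such as '--hash=...'.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Keep only the requirement token, dropping markers and line continuations.
        token = line.split(" ", 1)[0].rstrip("\\")
        if "==" in token:
            name, version = token.split("==", 1)
            pins[name.lower()] = version
    return pins


# Illustrative stand-in for the lock file (assumption, not the real file).
sample_lock = """\
torch==2.9.0+cu128
triton==3.5.0
torchft-nightly==2026.5.15
"""

expected = {
    "torch": "2.9.0+cu128",
    "triton": "3.5.0",
    "torchft-nightly": "2026.5.15",
}
pins = parse_pins(sample_lock)
mismatches = {
    name: (pins.get(name), want)
    for name, want in expected.items()
    if pins.get(name) != want
}
print(mismatches)  # {}
```

This complements rather than replaces `raydepsets --check`, which the commit message cites as the idempotency check.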

Add a minimal reference release test showing how to run a release test on the torchft Ray ML image variant. It uses the core Ray CUDA image (py3.13) with the torchft dependency lock installed on top:

    cluster:
      anyscale_sdk_2026: true
      byod:
        type: cu123
        post_build_script: byod_ml_torchft.sh
        python_depset: ml_torchft_py3.13.lock

The workload imports torch (2.9.0) + torchft and runs a short Ray Train v2 + torchft linear training loop to prove the image works end to end. Validated against the release schema (//release:test_config).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

Setting byod.python_depset is sufficient: the BYOD image build automatically copies the lock in and runs `uv pip install --system --no-deps -r python_depset.lock` (release/ray_release/byod/build_context.py). The custom byod_ml_torchft.sh ran the identical command, so it installed the deps a second time for no reason.

Remove byod_ml_torchft.sh and the post_build_script reference from the torchft_hello_world reference test; rely on python_depset alone. Validated with //release:test_config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
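For reference, the install step quoted in this commit message can be expressed as an argument list. This is only a sketch of the command's shape; `depset_install_command` is a hypothetical helper and is not the code in release/ray_release/byod/build_context.py.

```python
import shlex


def depset_install_command(lock_path):
    """Build the uv invocation quoted in the commit message for a lock file."""
    # --system installs into the system interpreter instead of a virtualenv.
    # --no-deps installs only what the lock lists, with no extra resolution.
    return ["uv", "pip", "install", "--system", "--no-deps", "-r", lock_path]


cmd = depset_install_command("python_depset.lock")
print(shlex.join(cmd))
# uv pip install --system --no-deps -r python_depset.lock
```

Because the lock is fully pinned and installed with `--no-deps`, running the same command a second time from a post-build script would add nothing, which is the redundancy this commit removes.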

elliot-barn added 2 commits May 21, 2026 19:01

creating py3.13 ml image with torchft installed

4316e78

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

allowing for 3.13 ray-ml images

d91fab2

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

elliot-barn requested a review from a team as a code owner May 21, 2026 23:45

elliot-barn requested a review from TimothySeah May 21, 2026 23:45

creating variation

07ebc7e

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

gemini-code-assist Bot reviewed May 21, 2026


ray-gardener Bot added the train (Ray Train Related Issue) and release-test (release test) labels May 22, 2026

github-actions Bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Jun 5, 2026

elliot-barn and others added 6 commits June 9, 2026 15:06

Merge branch 'master' into elliot-barn-add-torchft-to-ml-release-image

9d322e5

Merge branch 'elliot-barn-add-torchft-to-ml-release-image' of https:/…

1b6ffd0

…/github.com/ray-project/ray into elliot-barn-add-torchft-to-ml-release-image

github-actions Bot added the unstale label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Jun 10, 2026

elliot-barn wants to merge 9 commits into master from elliot-barn-add-torchft-to-ml-release-image
