GRPO refactoring #2530

mydatascience · 2025-10-21T18:53:11Z

Description

This PR refactors how GRPO is run with Tunix, MaxText, vLLM, and Pathways (TMVP). The new files are added under src/MaxText/rl, and the config file rl.yml is added under src/MaxText/configs. The main entry point is src/MaxText/rl/train_rl.py.
Work was started by @mydatascience and completed by @A9isha.

Tests

I have tested the code with Llama3.1-8b and Llama3.1-70b on v7x-128 and v5p-64.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

.github/workflows/RunTests.yml

src/MaxText/examples/GRPO_README.md

src/MaxText/examples/README.md

src/MaxText/experimental/rl/grpo_tunix_trainer.py

github-actions · 2025-10-27T21:43:17Z

🤖 Hi @A9isha, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

src/MaxText/examples/grpo_runner.py

src/MaxText/rl/evaluate_rl.py

src/MaxText/rl/train_rl.py

src/MaxText/train_rl.py

src/MaxText/rl/train_rl.py

xuefgu

Thanks for the PR @A9isha !

In addition to the comments - can you please clarify what tests you have performed, i.e. the hardware, model, and more importantly with what configs (since that's the main change here)?

src/MaxText/rl/evaluate_rl.py

maxtext_grpo_dependencies.Dockerfile

src/MaxText/rl/train_rl.py

src/MaxText/train_rl.py

src/MaxText/rl/train_rl.py

src/MaxText/train_rl.py

docker_build_dependency_image.sh

src/MaxText/configs/base.yml

src/MaxText/configs/rl.yml

src/MaxText/rl/train_rl.py

src/MaxText/rl/evaluate_rl.py

src/MaxText/examples/README_how_to_run_examples.md

src/MaxText/examples/local_installation.sh

src/MaxText/rl/train_rl.py

src/MaxText/examples/local_installation.sh

maxtext_grpo_dependencies.Dockerfile

src/MaxText/examples/local_installation.sh

maxtext_grpo_dependencies.Dockerfile

docker_upload_runner.sh

src/MaxText/examples/README_how_to_run_examples.md

A9isha

Dummy approval since an owner approval is needed :D

PiperOrigin-RevId: 828688487 clean up

mydatascience requested review from A9isha, NuojCheng, RissyRan, SurbhiJainUSC, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, jacoguzo, jiangjy1982, khatwanimohit, parambole, richjames0, shralex, suexu1025 and vipannalla as code owners October 21, 2025 18:53

github-advanced-security bot found potential problems Oct 21, 2025

.github/workflows/RunTests.yml (fixed)

kyle-meggs reviewed Oct 23, 2025

src/MaxText/examples/GRPO_README.md (outdated)

kyle-meggs reviewed Oct 23, 2025

src/MaxText/examples/GRPO_README.md (outdated)

mydatascience force-pushed the universal_grpo branch from 149a3dc to e72906d Compare October 24, 2025 17:15

mydatascience requested a review from xuefgu as a code owner October 24, 2025 17:15

A9isha reviewed Oct 24, 2025

src/MaxText/examples/README.md (outdated)

A9isha reviewed Oct 24, 2025

src/MaxText/examples/README.md (outdated)

A9isha reviewed Oct 24, 2025

src/MaxText/experimental/rl/grpo_tunix_trainer.py (outdated)

A9isha added the gemini-review label Oct 27, 2025

SurbhiJainUSC reviewed Oct 28, 2025

src/MaxText/examples/grpo_runner.py (outdated)

A9isha force-pushed the universal_grpo branch from 8453278 to e356292 Compare October 28, 2025 17:18