R1-Style distributed GRPO #2326
Merged
+2,425
−2
117 commits
1c29d67 RL starter code (RedTachyon)
ed544da Add gsm8k (RedTachyon)
6ca2c38 Notebooks (RedTachyon)
7e39b69 Distributed dev progress (RedTachyon)
55b1c65 Decent progress on R1 RL (RedTachyon)
aa34954 PoC training loop, but not working - code checkpoint (RedTachyon)
0cc7795 8B recipe, sorta running? (RedTachyon)
b11e742 Merge branch 'main' into grpo (RedTachyon)
ce7e8ce Merge pull request #1 from RedTachyon/grpo (RedTachyon)
736b14f Some updates, some progress (RedTachyon)
eb3d5b9 Multi-node training, new reward shaping, success metric (RedTachyon)
9df4b38 Synchronize metrics, fix parsing, more memory management (RedTachyon)
3d39ffc Sync rewards and successes between processes (RedTachyon)
30fa65c Add filter kwargs and partition for easier dataset filtering (RedTachyon)
e8f19f2 Merge pull request #2 from RedTachyon/grpo (RedTachyon)
499c013 Reorganize methods (RedTachyon)
3ca4f49 Batched logit to logprob conversion (RedTachyon)
300c117 Remove old notes (RedTachyon)
3cf169c Revert config changes (RedTachyon)
7d9d37a More config cleanup (RedTachyon)
654eb56 Recipe cleanup (RedTachyon)
c75cb7a Remove unnecessary PPO change (RedTachyon)
1f6be85 Cleanup (RedTachyon)
cf770d3 Merge branch 'pytorch:main' into main (RedTachyon)
1fd86c8 Remove redundant code (RedTachyon)
e198825 More redundant code (RedTachyon)
474c8dc Reorganize recipes (RedTachyon)
60d7cd9 Some cleanup of RL dataset and GSM8k (RedTachyon)
8d018ba Pre-commit cleanup (RedTachyon)
c9a01cb Remove MATH dataset for now (RedTachyon)
c17bfc0 Properly remove MATH dataset (RedTachyon)
b312af8 Docstrings (RedTachyon)
f7c0929 Reorganize some recipes, add sbatch for SFT (RedTachyon)
f6c7e53 Remove unused 8B configs, add another sbatch (RedTachyon)
72cb4bd Recipes leanup (RedTachyon)
a9112b8 GRPO recipe cleanup (RedTachyon)
b089ff2 Final MVP bugfixes (RedTachyon)
4870213 Remove old unused test (RedTachyon)
2dd6f60 Pre-commit (RedTachyon)
7b58712 Stop token handling for both single and multi device (RedTachyon)
b010ef6 Update recipes/configs/dev/grpo/3B_full_rl.yaml (RedTachyon)
603d16a Remove redundant comment (RedTachyon)
d7269e4 Delete recipes/configs/llama3_2/3B_full_rl_single_device_mps.yaml (RedTachyon)
9978a44 Fix function arguments in optimizer setup (RedTachyon)
d710200 Update recipes/configs/dev/grpo/3B_full_rl.yaml (RedTachyon)
352c4fb Update torchtune/rlhf/loss/grpo.py (RedTachyon)
8f60178 PPO -> GRPO (RedTachyon)
bb8b97d RL -> GRPO (recipe (RedTachyon)
44aeb85 Make sure we're not training with float16 (RedTachyon)
86de099 Remove chunked loss logic (RedTachyon)
120b1d2 Save rank and world_size in the recipe (RedTachyon)
f3e9b77 Rename R1Trajectory to GRPOTrajectory, other cleanup (RedTachyon)
b856695 | None -> Optional (RedTachyon)
e61894b Fix `return_logits==False` edge case (RedTachyon)
96dc9c9 Undo an accidental change in generation (RedTachyon)
b1cfc3b Update generate_trajectory docstring (RedTachyon)
8a62d1e Reenable reference network (RedTachyon)
7394cc7 Remove optimizer_in_bwd (RedTachyon)
24cd238 Remove activation offloading (RedTachyon)
67458f2 (docstring) Reward model -> reward function (RedTachyon)
d9c37b8 Move reward function to a new file (RedTachyon)
18baeae Remove dead code (RedTachyon)
646a2ce Remove redundant logging (RedTachyon)
f641161 Remove question from the reward function (RedTachyon)
9f4fb88 Update recipes/dev/grpo_full_finetune_distributed.py (RedTachyon)
123883a _grpo_step -> grpo_step (RedTachyon)
d9cf499 Remove breakpoints and comments (RedTachyon)
98df9a7 Remove breakpoints and comments (RedTachyon)
b068132 Docstring for the reward function (RedTachyon)
738dc2d Handle reference checkpoint separately (RedTachyon)
759ff8b Remove mentions of activation offloading (RedTachyon)
239d382 Fix messed up loss (RedTachyon)
957671c Fix messed up loss, barriers to keep things in sync (RedTachyon)
357aa43 Delete max_steps_per_epoch and gradient accumulation, simplify inner … (RedTachyon)
3c54d01 Pre-commit (RedTachyon)
0d77799 Reorganize recipes (RedTachyon)
0f8df07 Remove dead settings from the GRPO config (RedTachyon)
184aeef Use DiskLogger in GRPO (RedTachyon)
3018eeb Use DiskLogger in SFT for GRPO (RedTachyon)
d29c455 Recipe name in logging (RedTachyon)
b9a56b9 Remove redundant logging (RedTachyon)
b143b00 Cleaned up official configs (RedTachyon)
05cf10d Merge branch 'main' into main (RedTachyon)
be4d96d Docstrings for GRPO types (RedTachyon)
1303695 Merge remote-tracking branch 'arielpublic/main' (RedTachyon)
9d5bfa5 Fix checkpointing (RedTachyon)
6cbb10a Update recipes/dev/grpo_full_finetune_distributed.py (RedTachyon)
97686e2 Update recipes/dev/grpo_full_finetune_distributed.py (RedTachyon)
aeb69cf Update recipes/dev/grpo_full_finetune_distributed.py (RedTachyon)
feeb042 Remove mention of ac (RedTachyon)
e7cb937 Pre-commit (RedTachyon)
2a75224 Revert generation changes, pre-commit (RedTachyon)
5010f1d Fix RL collate function (RedTachyon)
711ee6b Remove resharding from config (RedTachyon)
aad81fb | -> Optional in rl dataset (RedTachyon)
53cd129 | -> Optional in GSM8k dataset (RedTachyon)
4dbe231 Merge branch 'pytorch:main' into main (RedTachyon)
3a8c6de Experimental stuff (RedTachyon)
9677df1 Additional experimental stuff (RedTachyon)
9340f89 Move experimental code to /dev (RedTachyon)
53ee98f Properly move experimental code to /dev (RedTachyon)
682ebef Remove recursive_reshard from the public API (RedTachyon)
8dd7546 Separate optimized generation function, small fixes (RedTachyon)
0939c94 Undo generation changes in the main function (RedTachyon)
81a7765 Fix custom generate (RedTachyon)
e41a520 Pre-commit (RedTachyon)
45084e0 Fix SFT dataset path (RedTachyon)
8283230 Merge branch 'pytorch:main' into main (RedTachyon)
987e971 Merge branch 'experiments' into main (RedTachyon)
7a03139 Revert "Merge branch 'experiments' into main" (RedTachyon)
3fc43d7 Update recipes/configs/dev/3B_full_grpo.yaml (RedTachyon)
dde5fd8 Update recipes/configs/dev/3B_sft_for_grpo.yaml (RedTachyon)
2fddf9c Update recipes/configs/dev/3B_full_grpo.yaml (RedTachyon)
aead54e Remove redundant async checkpointing code (RedTachyon)
16ad525 Remove some redundant clones (RedTachyon)
70886b2 Add a generation comment (RedTachyon)
2ba4a97 Pre-commit (RedTachyon)
Remove resharding from config
commit 711ee6b8ecc5e36b0839c9a81a745194f1e40d48
Is DDP not supported?