Fix total_batch_size logging for sequence parallelism #1542
Merged (+3 −1)
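The diff for this fix is not captured here. As a hedged illustration only: under Ulysses-style sequence parallelism, the ranks in one SP group work on the same sequences, so only `world_size // sequence_parallel_size` ranks contribute distinct samples per optimizer step. A logged batch size that multiplies by the full world size therefore overcounts by a factor of the SP size. The sketch below assumes exactly that model; the function names and parameters are hypothetical and are not the PR's code. The divisibility check mirrors the intent of the "Validate world_size divisible by sequence_parallel_size" commit.

```python
def data_parallel_size(world_size: int, sequence_parallel_size: int) -> int:
    """Number of independent data-parallel replicas, assuming each group of
    `sequence_parallel_size` ranks splits the same sequences between them."""
    if sequence_parallel_size < 1:
        raise ValueError("sequence_parallel_size must be >= 1")
    if world_size % sequence_parallel_size != 0:
        raise ValueError(
            f"world_size ({world_size}) must be divisible by "
            f"sequence_parallel_size ({sequence_parallel_size})"
        )
    return world_size // sequence_parallel_size


def total_batch_size(
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    world_size: int,
    sequence_parallel_size: int = 1,
) -> int:
    """Distinct sequences consumed per optimizer step (hypothetical helper).

    Ranks in the same SP group see the same sequences, so each group is
    counted once rather than once per rank."""
    dp = data_parallel_size(world_size, sequence_parallel_size)
    return per_device_batch_size * gradient_accumulation_steps * dp


# 16 GPUs, SP size 4, 2 sequences per device, 8 accumulation steps.
print(total_batch_size(2, 8, world_size=16, sequence_parallel_size=4))  # 64
# Counting every rank (no SP correction) would report 256 instead.
print(total_batch_size(2, 8, world_size=16))  # 256
```

With SP size 1 the helper reduces to the usual `per_device * grad_accum * world_size`, so the same call works for runs without sequence parallelism.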
Commits
40 commits
ed98e2f Fix wandb tracker AttributeError on non-main processes in SFT (hamishivi)
eeac0af Fix UnboundLocalError for beaker_config in tracking setup (hamishivi)
0491264 Apply ruff formatting (hamishivi)
81eb363 Update changelog with SFT multi-node fixes (hamishivi)
15df712 Sync SP changes: direct imports, updated help text, SP test script (hamishivi)
96f1a7b Inline attn_impl variable (hamishivi)
7d3df06 Add PR link to changelog entries (hamishivi)
9728cd7 Fix import sorting: merge accelerate.utils imports (hamishivi)
4049841 Require flash attention for sequence parallelism (hamishivi)
c0f7d76 Fix another unbound beaker_config reference at end of training (hamishivi)
84802f8 Skip SP setup during dataset caching (hamishivi)
caf67a1 Set dp_shard_size for ParallelismConfig to match world size (hamishivi)
c2fd078 Handle UlyssesSPDataLoaderAdapter missing set_epoch (hamishivi)
dacdc46 Unwrap SP dataloader adapter for set_epoch (hamishivi)
4ff4c0a Add comment explaining SP dataloader unwrap (hamishivi)
dcc9592 Set multinode SFT test to urgent priority (hamishivi)
c1fe3ed Fix SP dataloader unwrap: attribute is .dl not .dataloader (hamishivi)
33c04db Filter non-2D tensors from batch for SP dataloader compatibility (hamishivi)
c669383 Log which batch keys are dropped by SP collator filter (hamishivi)
fe1cc31 Root cause: 1D index column from dataset cache breaks SP adapter (hamishivi)
a51cf2a Pad batch seq length to be divisible by SP size (hamishivi)
0f67778 Handle shift_labels key from UlyssesSPDataLoaderAdapter (hamishivi)
f05e7fa Move SP batch tensors to device (adapter returns CPU tensors) (hamishivi)
1bc7044 Always move batch to device (no-op when already there) (hamishivi)
c0f7afd Remove redundant comments (hamishivi)
1ab980c Rename shift_labels back to labels before model forward pass (hamishivi)
497243a Recreate LR scheduler after prepare when using SP (hamishivi)
955ae24 Use getattr for set_epoch to avoid SP branch (hamishivi)
dcfe350 Just pop index column instead of filtering all non-2D tensors (hamishivi)
1c41439 Always create LR scheduler after prepare for correct step count (hamishivi)
5859150 Rename shift_labels early, remove duplicate rename before forward (hamishivi)
42b9844 Restore original LR scheduler for non-SP, fix SP scheduler wrapping (hamishivi)
575b724 Fix SP scheduler: use post-prepare max_train_steps without num_proces… (hamishivi)
9e79b1f Fix SP scheduler: account for micro-batch stepping with grad_accum mu… (hamishivi)
c02f0c3 Merge main, resolve changelog conflict (hamishivi)
6b08182 Validate world_size divisible by sequence_parallel_size (hamishivi)
91655d1 Extract _create_scheduler helper to deduplicate scheduler creation (hamishivi)
ec32cbc Fix total_batch_size logging to account for sequence parallelism (hamishivi)
aca5e1f Merge remote-tracking branch 'origin/main' into hamishivi/fix-sft-sp-… (hamishivi)
2afde3a Add changelog entry for batch size logging fix (hamishivi)
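Two of the commits above deal with batch shape before the SP adapter splits sequences: padding the sequence length to be divisible by the SP size, and popping the 1D index column that broke the adapter. A minimal pure-Python sketch of those two steps follows. It assumes batches are dicts of already-collated, equal-length rows. The function names, the `index` key, a pad token id of 0, and -100 as the label ignore value are illustrative assumptions, not the PR's code (which operates on tensors).

```python
from typing import Any, Dict

IGNORE_INDEX = -100  # assumed label value that the loss function ignores


def pad_to_multiple(length: int, multiple: int) -> int:
    """Smallest value >= length that is divisible by `multiple`."""
    return -(-length // multiple) * multiple


def prepare_batch_for_sp(
    batch: Dict[str, Any], sp_size: int, pad_token_id: int = 0
) -> Dict[str, Any]:
    """Hypothetical helper: drop the 1D index column and right-pad every
    2D sequence field so its length splits evenly across `sp_size` ranks."""
    batch = dict(batch)  # shallow copy so the caller's dict is not mutated
    # A per-example (1D) column cannot be split along the sequence
    # dimension, so remove it before the SP adapter sees the batch.
    batch.pop("index", None)
    seq_len = len(batch["input_ids"][0])
    extra = pad_to_multiple(seq_len, sp_size) - seq_len
    pad_values = {
        "input_ids": pad_token_id,
        "attention_mask": 0,
        "labels": IGNORE_INDEX,
    }
    for key, value in pad_values.items():
        if key in batch:
            batch[key] = [row + [value] * extra for row in batch[key]]
    return batch


batch = {
    "input_ids": [[5, 6, 7, 8, 9]],
    "attention_mask": [[1, 1, 1, 1, 1]],
    "labels": [[5, 6, 7, 8, 9]],
    "index": [3],
}
out = prepare_batch_for_sp(batch, sp_size=4)
print(out["input_ids"])       # [[5, 6, 7, 8, 9, 0, 0, 0]]
print(out["attention_mask"])  # [[1, 1, 1, 1, 1, 0, 0, 0]]
```

Padding labels with the ignore value keeps the extra positions out of the loss (under the -100 assumption), and zeros in the attention mask keep them out of attention, so the padding changes only the shape, not the training signal.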
Review comment: Up to you, of course, but I think we should refactor this into a shared place so it can be reused across DPO, GRPO, and SFT.