GEA training improvements: DDP batch size, sampler epoch, smooth head #575
Open
Conversation
- data_module: treat batch_size as global and divide by world_size in multi-GPU training so the config means the same thing regardless of GPU count
- lightning_module: also call set_epoch on the sampler (not just batch_sampler) so shuffling varies each epoch in distributed training
- segmentation: add smooth_sigma option to SegmentationHead for differentiable Gaussian blur on logits before loss computation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
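The data_module change amounts to dividing the configured batch size by the DDP world size before building dataloaders. A minimal sketch of that division (the function name and the divisibility check are illustrative, not the actual data_module code):

```python
def local_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch size evenly across DDP ranks.

    Assumes the config value is the total batch size across all GPUs;
    raises if it cannot be divided evenly, so no rank silently gets a
    different effective batch size.
    """
    if global_batch_size % world_size != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by world size {world_size}"
        )
    return global_batch_size // world_size
```

With this convention, batch_size=32 yields per-GPU batches of 8 on 4 GPUs and 32 on a single GPU, keeping the effective (optimizer-step) batch size constant.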
favyen2
reviewed
Mar 30, 2026
  path: the dataset path
  path_options: additional options for path to pass to fsspec.
- batch_size: the batch size
+ batch_size: the total batch size across all GPUs. In multi-GPU
Collaborator
Currently we often set batch_size based on available GPU memory. I don't think the existing option's behavior should be changed; if desired, you could deprecate the existing one and add per_gpu_batch_size and global_batch_size options to replace it, and then raise an error if neither is set or if both are set.
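The reviewer's suggestion is a mutually exclusive option pair with a deprecation path. A sketch of that resolution logic, with hypothetical option names taken from the comment (not actual config keys in the repo):

```python
import warnings


def resolve_batch_size(batch_size=None, per_gpu_batch_size=None,
                       global_batch_size=None, world_size=1):
    """Resolve the per-GPU batch size from mutually exclusive options.

    - batch_size: deprecated, kept as the per-GPU batch size so existing
      configs behave exactly as before.
    - per_gpu_batch_size / global_batch_size: exactly one must be set.
    """
    if batch_size is not None:
        if per_gpu_batch_size is not None or global_batch_size is not None:
            raise ValueError(
                "batch_size cannot be combined with per_gpu_batch_size "
                "or global_batch_size")
        warnings.warn(
            "batch_size is deprecated; use per_gpu_batch_size",
            DeprecationWarning)
        return batch_size
    # Exactly one of the two new options must be provided.
    if (per_gpu_batch_size is None) == (global_batch_size is None):
        raise ValueError(
            "set exactly one of per_gpu_batch_size or global_batch_size")
    if per_gpu_batch_size is not None:
        return per_gpu_batch_size
    if global_batch_size % world_size != 0:
        raise ValueError("global_batch_size must be divisible by world_size")
    return global_batch_size // world_size
```

This keeps old configs working (and warning) while making the new semantics explicit at construction time rather than failing silently mid-training.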
favyen2
previously approved these changes
Mar 30, 2026
Collaborator
favyen2
left a comment
The segmentation and train_dataloader.sampler changes look good to me, but I think the batch_size behavior should either remain the same or be deprecated in favor of local_batch_size and global_batch_size options (the deprecated batch_size option should then set the local batch size).
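The sampler fix follows the standard PyTorch DDP convention: DistributedSampler derives its shuffle seed from the epoch passed to set_epoch, so if set_epoch is never called, every epoch iterates the data in the same order. A tiny stand-in sampler (hypothetical, not the actual Lightning code) makes the failure mode visible:

```python
import random


class TinyDistributedSampler:
    """Stand-in for torch.utils.data.DistributedSampler: the shuffle seed
    is derived from (base seed + epoch), so identical epochs repeat the
    same order unless set_epoch is called before each epoch."""

    def __init__(self, n: int, seed: int = 0):
        self.n, self.seed, self.epoch = n, seed, 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.n))
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)


sampler = TinyDistributedSampler(16)
first = list(sampler)    # epoch 0 order
repeat = list(sampler)   # identical: set_epoch was never called
sampler.set_epoch(1)     # the fix: advance the epoch before iterating
second = list(sampler)   # a different shuffle for epoch 1
```

The bug the PR fixes is that set_epoch was only reaching the batch_sampler, so a plain sampler kept epoch 0's order forever.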
Summary
Three small fixes that came up during GEA ecosystem segmentation finetuning:
- batch_size config now means global batch size: automatically divided by world_size in multi-GPU training so the config means the same thing regardless of GPU count
- set_epoch is now called on the sampler (not just batch_sampler) so data shuffling varies each epoch in DDP
- new smooth_sigma option on SegmentationHead: applies differentiable Gaussian blur to logits before loss/softmax (used in GEA smooth head experiments)

Test plan
🤖 Generated with Claude Code
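The smooth_sigma idea can be sketched independently of SegmentationHead (this is a standalone NumPy illustration of a separable Gaussian blur, not the actual implementation; in the real head the blur would be a fixed-weight convolution on the logits tensor so gradients flow through it):

```python
import numpy as np


def gaussian_kernel1d(sigma: float, radius=None) -> np.ndarray:
    """Normalized 1-D Gaussian kernel; radius defaults to ceil(3 * sigma)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur_logits(logits: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the (H, W) axes of a (C, H, W) array.

    Blurs along W, then along H, using edge padding so output shape
    matches input shape.
    """
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    conv = lambda v: np.convolve(np.pad(v, r, mode="edge"), k, mode="valid")
    out = np.apply_along_axis(conv, -1, logits.astype(np.float64))  # along W
    out = np.apply_along_axis(conv, -2, out)                        # along H
    return out
```

Blurring the logits before the loss softens per-pixel predictions toward their neighbors, which is the "smooth head" effect the GEA experiments target.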