GEA training improvements: DDP batch size, sampler epoch, smooth head #575

Open

pjreddie wants to merge 1 commit into master from gea-training-improvements

Conversation

@pjreddie (Contributor)

Summary

Three small fixes that came up during GEA ecosystem segmentation finetuning:

  • data_module: batch_size config now means global batch size — automatically divided by world_size in multi-GPU training so the config means the same thing regardless of GPU count
  • lightning_module: Fix distributed sampler epoch shuffling — also call set_epoch on the sampler (not just batch_sampler) so data shuffling varies each epoch in DDP
  • segmentation: Add smooth_sigma option to SegmentationHead — applies differentiable Gaussian blur to logits before loss/softmax (used in GEA smooth head experiments)

Test plan

  • Used in GEA hyperparameter search experiments (p6_smooth_unfreeze, p8_smooth_train, p9_smooth_unfreeze)
  • Multi-GPU training verified with correct per-GPU batch sizes
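
The per-GPU batch sizes checked in the test plan follow from integer division of the configured global batch size by the world size. A minimal sketch (the helper name and the divisibility error are assumptions, not the PR's code):

```python
def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global (config-level) batch size evenly across processes."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if global_batch_size % world_size != 0:
        # An uneven split would silently change the effective batch size.
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by world size {world_size}"
        )
    return global_batch_size // world_size
```

Under this convention, `batch_size: 64` in the config yields 16 per GPU on 4 GPUs and 64 on a single GPU.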

🤖 Generated with Claude Code

- data_module: treat batch_size as global and divide by world_size in
  multi-GPU training so config means the same thing regardless of GPU count
- lightning_module: also call set_epoch on the sampler (not just
  batch_sampler) so shuffling varies each epoch in distributed training
- segmentation: add smooth_sigma option to SegmentationHead for
  differentiable Gaussian blur on logits before loss computation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
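
The sampler fix above can be illustrated without torch: a `DistributedSampler`-style sampler derives its shuffle order from `(seed, epoch)`, so a sampler that never receives `set_epoch` repeats the epoch-0 order forever. A minimal stand-in (not the PR's code):

```python
import random

class EpochSeededSampler:
    """Minimal stand-in for a distributed sampler: the shuffle order is a
    deterministic function of (seed, epoch), as in DistributedSampler."""

    def __init__(self, num_samples: int, seed: int = 0):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Without this call, every epoch reuses the epoch-0 order.
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.num_samples))
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)
```

The fix amounts to forwarding `set_epoch` to `train_dataloader.sampler` as well as `train_dataloader.batch_sampler`, since whichever one the dataloader actually uses must see the new epoch.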
Docstring diff under review (`data_module`):

```diff
     path: the dataset path
     path_options: additional options for path to pass to fsspec.
-    batch_size: the batch size
+    batch_size: the total batch size across all GPUs. In multi-GPU
```

Collaborator:

Currently we often set batch_size based on available GPU memory. I don't think the existing option's behavior should change; if desired, you could deprecate it and add per_gpu_batch_size and global_batch_size options to replace it, which should raise an error if neither or both are set.

favyen2 previously approved these changes Mar 30, 2026

@favyen2 (Collaborator) left a comment:

The segmentation and train_dataloader.sampler changes look good to me, but I think the batch_size behavior should either remain the same, or batch_size should be deprecated in favor of local_batch_size and global_batch_size options (with the deprecated batch_size option setting the local batch size).

@favyen2 dismissed their stale review March 30, 2026 17:41

meant to comment not approve
