batch_sampler.batch_size is None with deepspeed and DataLoader(batch_size=None) #19460

Open
@olegsinavski

Bug description

Hello,

After upgrading to Lightning 2.2.0, we get a crash during strategy setup when using DeepSpeed with DataLoader(batch_size=None, ...):

  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in _run
    self.strategy.setup(self)
  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 335, in setup
    self._init_config_if_needed()
  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 804, in _init_config_if_needed
    self._format_config()
  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 813, in _format_config
    self._format_batch_size_and_grad_accum_config()
  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 906, in _format_batch_size_and_grad_accum_config
    batch_size = self._auto_select_batch_size()
  File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 920, in _auto_select_batch_size
    batch_size = train_dataloader.batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_size'

This worked on the 2.1 versions.
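
For context: PyTorch documents that `batch_size=None` disables automatic batching, and in that mode the loader's `batch_sampler` attribute is `None`, which is what line 920 in the traceback dereferences. Below is a minimal sketch of a None-safe lookup. It uses stand-in objects rather than real `DataLoader` instances, and `infer_batch_size` with its fallback of 1 is a hypothetical helper, not Lightning's actual code or fix:

```python
from types import SimpleNamespace
from typing import Any


def infer_batch_size(dataloader: Any, default: int = 1) -> int:
    """Return dataloader.batch_sampler.batch_size, or `default` if unavailable.

    Hypothetical helper for illustration only; not part of Lightning.
    """
    batch_sampler = getattr(dataloader, "batch_sampler", None)
    # With batch_size=None the attribute exists but is None, so checking
    # only that the attribute is present is not enough.
    batch_size = getattr(batch_sampler, "batch_size", None)
    return default if batch_size is None else batch_size


# Stand-ins that mimic the two relevant DataLoader shapes.
batched = SimpleNamespace(batch_sampler=SimpleNamespace(batch_size=32))
unbatched = SimpleNamespace(batch_sampler=None)  # like DataLoader(batch_size=None)

print(infer_batch_size(batched))
print(infer_batch_size(unbatched))
```

Whether falling back to a default is the right behaviour for the DeepSpeed config is an open question; the sketch only shows where the `None` check would have to go.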

What version are you seeing the problem on?

v2.2

How to reproduce the bug

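from pytorch_lightning import LightningDataModule, Trainer
from torch.utils.data import DataLoader


class DataModule(LightningDataModule):
    def train_dataloader(self) -> DataLoader:
        # Don't set the batch size here; batching is done in the datapipe (self.train_dp)
        return DataLoader(
            self.train_dp, batch_size=None, ...
        )


# `model` is the LightningModule being trained; other arguments are elided
trainer = Trainer(strategy='deepspeed', ...)
trainer.fit(model, DataModule())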

Environment

Unfortunately, I can't collect the environment because we use a custom build system :(

lightning==2.2.0
torch==2.1.2

cc @awaelchli
