Bug description
Hello,

After upgrading to Lightning 2.2.0, we see a crash when using the deepspeed strategy together with DataLoader(batch_size=None, ...):
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in _run
self.strategy.setup(self)
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 335, in setup
self._init_config_if_needed()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 804, in _init_config_if_needed
self._format_config()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 813, in _format_config
self._format_batch_size_and_grad_accum_config()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 906, in _format_batch_size_and_grad_accum_config
batch_size = self._auto_select_batch_size()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 920, in _auto_select_batch_size
batch_size = train_dataloader.batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_size'
This was working on the 2.1 releases.
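For context, a minimal illustration of why the lookup fails (assumption: a plain TensorDataset stands in for our datapipe). With batch_size=None, PyTorch disables automatic batching and the DataLoader is created without any batch sampler, so the batch_sampler.batch_size access in _auto_select_batch_size hits None:

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10))

# batch_size=None disables automatic batching, so no batch sampler is created.
dl = DataLoader(ds, batch_size=None)
print(dl.batch_size)     # None
print(dl.batch_sampler)  # None -> reading .batch_size on this raises the AttributeError above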
What version are you seeing the problem on?
v2.2
How to reproduce the bug
from pytorch_lightning import LightningDataModule, Trainer
from torch.utils.data import DataLoader

class DataModule(LightningDataModule):
    def train_dataloader(self) -> DataLoader:
        # Don't set the batch size here, it's done in the datapipe
        return DataLoader(
            self.train_dp, batch_size=None, ...
        )

trainer = Trainer(strategy="deepspeed", ...)
trainer.fit(model, DataModule())
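A possible workaround sketch (untested; assumes the strategy only auto-selects the micro-batch size when train_micro_batch_size_per_gpu is missing from the DeepSpeed config, which is what the traceback suggests): pass the value explicitly so the dataloader is never inspected.

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy

# Assumption: 1 is a placeholder, since batching happens inside the datapipe;
# the value only feeds DeepSpeed's own bookkeeping.
strategy = DeepSpeedStrategy(config={"train_micro_batch_size_per_gpu": 1})
trainer = Trainer(strategy=strategy)
# trainer.fit(model, DataModule())  # same model / DataModule as above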
Error messages and logs
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in _run
self.strategy.setup(self)
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 335, in setup
self._init_config_if_needed()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 804, in _init_config_if_needed
self._format_config()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 813, in _format_config
self._format_batch_size_and_grad_accum_config()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 906, in _format_batch_size_and_grad_accum_config
batch_size = self._auto_select_batch_size()
File ".../pip-ai-experimental_pytorch_lightning/site-packages/pytorch_lightning/strategies/deepspeed.py", line 920, in _auto_select_batch_size
batch_size = train_dataloader.batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_size'
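A guard along these lines would avoid the crash (a sketch only, with a hypothetical helper name; not the actual Lightning patch):

import torch
from torch.utils.data import DataLoader, TensorDataset

def _safe_batch_size(train_dataloader) -> int:
    """Hypothetical helper: read the batch size without assuming a batch sampler exists."""
    batch_sampler = getattr(train_dataloader, "batch_sampler", None)
    if batch_sampler is not None and getattr(batch_sampler, "batch_size", None) is not None:
        return batch_sampler.batch_size
    # batch_size=None dataloaders have no batch sampler; fall back to a default of 1.
    return 1

ds = TensorDataset(torch.arange(10))
print(_safe_batch_size(DataLoader(ds, batch_size=None)))  # 1
print(_safe_batch_size(DataLoader(ds, batch_size=4)))     # 4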
Environment
Unfortunately, I can't collect the environment because we use a custom build system :(
lightning==2.2.0
torch==2.1.2
More info
No response
cc @awaelchli