Bug description
Hi,
I use Fabric with the DeepSpeed ZeRO Stage 3 strategy to shard a model across 2 GPUs, and I get `Model params = 0.0 M` for the model size when the model is created inside the `fabric.sharded_model()` context.
```python
import lightning as L

fabric = L.Fabric(accelerator="cuda", strategy="deepspeed_stage_3", precision="bf16-mixed")
fabric.launch()

with fabric.sharded_model():
    net = mymodel()  # any torch.nn.Module

num_params = sum(param.nelement() for param in net.parameters())
fabric.print('Model params = %2.1f M' % (num_params / 1000**2))
```
Without the `fabric.sharded_model()` context, I get the correct model size: `Model params = 13.6 M`.

How can I solve this issue? Thanks.
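A possible workaround might be to fall back on DeepSpeed's `ds_numel` attribute, which I assume holds the full element count of each ZeRO-3-partitioned parameter (I have not confirmed this is the intended API). A minimal sketch under that assumption:

```python
# Sketch only, assuming DeepSpeed's ZeRO-3 partitioning attaches a `ds_numel`
# attribute with the full element count; locally each partitioned parameter's
# storage is an empty placeholder, so param.nelement() returns 0.
num_params = sum(
    getattr(param, "ds_numel", param.nelement()) for param in net.parameters()
)
fabric.print('Model params = %2.1f M' % (num_params / 1000**2))
```

If there is an officially supported way to get the full count while the model stays sharded, I would prefer that.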
What version are you seeing the problem on?
v2.0
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
More info
No response