Bug description
I initialized my trainer as follows:
import lightning as L
from lightning.pytorch.callbacks import DeviceStatsMonitor, StochasticWeightAveraging

trainer = L.Trainer(
    max_epochs=5,
    devices=2,
    strategy='ddp_notebook',
    num_sanity_val_steps=0,
    profiler='simple',
    default_root_dir="/kaggle/working",
    callbacks=[
        DeviceStatsMonitor(),
        StochasticWeightAveraging(swa_lrs=1e-2),
        # EarlyStopping(monitor='train_Loss', min_delta=0.001, patience=100, verbose=False, mode='min'),
    ],
    enable_progress_bar=True,
    enable_model_summary=True,
)
Distributed training is initialized for both GPUs, but only one of them is actually used. During the validation loop the GPUs are not utilized either. How can I resolve this so that both GPUs are used and training is faster?
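Below is a minimal, self-contained sketch (a toy model and callback for illustration, not my actual code) that I would use to check whether both DDP ranks are actually launched under ddp_notebook and which device each one trains on:

import torch
import lightning as L
from torch.utils.data import DataLoader, TensorDataset
from lightning.pytorch.callbacks import Callback

class RankReport(Callback):
    # Print the global rank, world size, and device once training starts on each process.
    def on_train_start(self, trainer, pl_module):
        print(f"rank={trainer.global_rank} world_size={trainer.world_size} device={pl_module.device}")

class ToyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)

# Random data kept on the CPU so CUDA is not touched before the DDP processes start.
dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

check_trainer = L.Trainer(
    max_epochs=1,
    accelerator='gpu',
    devices=2,
    strategy='ddp_notebook',
    callbacks=[RankReport()],
)
check_trainer.fit(ToyModel(), loader)

If both GPUs were picked up, I would expect one printed line per rank (rank=0 and rank=1) with different devices; in my run only one GPU shows activity.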
What version are you seeing the problem on?
v2.4
How to reproduce the bug
No response
Error messages and logs
# Error messages and logs here please
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
More info
No response