
Not able to use more than 1 GPU #20520

Open

@debashis-tech

Description

Bug description

I have an ml.p4d.24xlarge instance on AWS SageMaker and am trying to train a Temporal Fusion Transformer model, but I am not able to use more than one GPU at a time: any setting other than devices=1 fails.
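As a sanity check before involving Lightning at all, a minimal sketch like the following (not part of the original report) confirms how many GPUs PyTorch itself can see; on a p4d.24xlarge it should report 8:

import torch

# A p4d.24xlarge exposes 8 NVIDIA A100 GPUs. If this prints fewer, the
# problem is CUDA visibility (e.g. CUDA_VISIBLE_DEVICES), not Lightning.
print(torch.cuda.device_count())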

What version are you seeing the problem on?

v2.4

How to reproduce the bug

import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor
from lightning.pytorch.loggers import TensorBoardLogger
from pytorch_forecasting import TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# configure network and trainer
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # log results to TensorBoard

trainer = pl.Trainer(
    max_epochs=5,
    accelerator="gpu",
    devices=2,  # anything other than devices=1 fails
    enable_model_summary=True,
    gradient_clip_val=0.1,
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)

quantile_loss = QuantileLoss(quantiles=[0.5])
# `training` is the TimeSeriesDataSet built earlier (omitted here)
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.0004,
    hidden_size=128,
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=32,
    loss=quantile_loss,
    optimizer="AdamW",
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size() / 1e3:.1f}k")


Error messages and logs


RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call `torch.cuda.*` functions, have moved the model to the device, or allocated memory on the GPU any other way? Please remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
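This error comes from Lightning's fork-based launcher: in an interactive environment such as a Jupyter/SageMaker notebook, devices > 1 makes Lightning fall back to the ddp_fork/ddp_notebook strategy, and a forked worker cannot re-initialize CUDA once the parent process has touched it. The usual workaround, sketched below under the assumption that the training code can be moved into a standalone script (the file name train.py is illustrative), is to launch it as a normal Python process with the subprocess-based "ddp" strategy:

# train.py -- illustrative sketch, not the original script.
# Run with `python train.py`; the subprocess-based "ddp" strategy re-launches
# the script once per GPU, so prior CUDA initialization in the parent
# process does not matter.
import lightning.pytorch as pl

def main():
    trainer = pl.Trainer(
        max_epochs=5,
        accelerator="gpu",
        devices=2,
        strategy="ddp",  # explicit subprocess launch instead of fork
    )
    # trainer.fit(tft, train_dataloaders=..., val_dataloaders=...)

if __name__ == "__main__":
    main()

Staying inside the notebook instead requires restarting the kernel and making sure nothing initializes CUDA (no torch.cuda.* calls, no .to("cuda")) before trainer.fit() runs with strategy="ddp_notebook".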

Environment

Platform                           AWS SageMaker
Instance type                      ml.p4d.24xlarge (8× NVIDIA A100 GPUs)
Python              3.11.10
pytorch-forecasting                1.2.0
lightning                          2.4.0
lightning-utilities                0.11.9
pytorch-lightning                  2.4.0
pytorch_optimizer                  3.3.0
pytorch-ranger                     0.1.1
s3torchconnector                   1.2.6
s3torchconnectorclient             1.2.7
sagemaker_pytorch_training         2.8.1
tft-torch                          0.0.6
torch                              2.5.1+cu124
torchaudio                         2.5.1+cu124
torchmetrics                       1.6.0
torchtext                          0.18.0+cu124
torchtnt                           0.2.4
torchvision                        0.20.1+cu124

