CUDAAccelerator.setup_device(device) initializes a different device (GPU 0) #21725

### Bug description

# Problem

Despite passing device indices other than 0 as `devices` to `Trainer`, memory is still allocated on device 0.

Passing a list of specific devices to `pytorch-lightning` should mean that no other devices are used in any way.

# Investigation

`CUDAAccelerator.setup_device(device)` calls `_check_cuda_matmul_precision` before `torch.cuda.set_device(device)`.

`_check_cuda_matmul_precision` reaches `torch.cuda._lazy_init` through the following call chain:
- `_check_cuda_matmul_precision`
- `_is_ampere_or_later`
- `torch.cuda.get_device_capability`
- `torch.cuda.get_device_properties`
- `torch.cuda._lazy_init`

# Is this a bug

It was not clear to me what the contract is regarding devices that are not explicitly passed but are also not excluded via `CUDA_VISIBLE_DEVICES`.

A similar issue in `pytorch` convinced me this is a bug: https://github.com/pytorch/pytorch/issues/149119

I also considered whether this was a pytorch-lightning bug or an upstream pytorch bug. I read the `torch.cuda` code, and it seems to me that it is up to the caller to call `torch.cuda.set_device` before any other function that requires a CUDA context.
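For contrast, exclusion via `CUDA_VISIBLE_DEVICES` is unambiguous: GPUs hidden this way cannot be touched by the process at all. Below is a minimal sketch of that approach. The helper name `restrict_visible_gpus` is hypothetical, and the variable has to be set before CUDA is initialized in the process for it to take effect.

```python
import os


def restrict_visible_gpus(physical_ids):
    """Hide every GPU except `physical_ids` from this process.

    Returns the indices to use inside the process: the visible GPUs
    are renumbered starting from 0, in the order given.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    return list(range(len(physical_ids)))


# Call this before anything initializes CUDA.
devices = restrict_visible_gpus([3, 4, 5, 6])
print(devices)  # [0, 1, 2, 3]
```

With this approach the renumbered indices, not the physical ones, would be passed as `devices` to `Trainer`.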

# How to fix it

It seems to me that it's just a matter of reordering two lines so that `torch.cuda.set_device(device)` runs before `_check_cuda_matmul_precision`. The reordering seems harmless: `_check_cuda_matmul_precision` only logs information for some devices.
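To make the ordering argument concrete, here is a toy model. It is not Lightning's or PyTorch's actual code: `FakeCudaRuntime`, `check_precision`, and both `setup_device_*` functions are hypothetical stand-ins. The model assumes, as the investigation above suggests, that any call needing a CUDA context creates one on whichever device is current, and that the current device defaults to 0.

```python
class FakeCudaRuntime:
    """Toy stand-in for a CUDA runtime (not the real torch.cuda API)."""

    def __init__(self):
        self.current_device = 0  # assumed default current device
        self.contexts = set()    # devices on which a context was created

    def set_device(self, index):
        self.current_device = index

    def ensure_context(self):
        # Assumption: a context is created on the current device.
        self.contexts.add(self.current_device)


def check_precision(runtime):
    # Stands in for _check_cuda_matmul_precision, which ends up
    # requiring an initialized CUDA context.
    runtime.ensure_context()


def setup_device_current_order(runtime, index):
    check_precision(runtime)   # runs while device 0 is still current
    runtime.set_device(index)
    runtime.ensure_context()   # later work on the requested device
    return runtime.contexts


def setup_device_reordered(runtime, index):
    runtime.set_device(index)  # select the requested device first
    check_precision(runtime)
    runtime.ensure_context()
    return runtime.contexts


before = setup_device_current_order(FakeCudaRuntime(), 3)
after = setup_device_reordered(FakeCudaRuntime(), 3)
print(sorted(before))  # [0, 3]
print(sorted(after))   # [3]
```

In this model only the call order differs between the two functions, yet the first one leaves a context on device 0 and the second does not.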

I will submit a PR for this shortly.

### What version are you seeing the problem on?

v2.6, master

### Reproduced in studio

_No response_

### How to reproduce the bug

```python
# Train with a trainer like this one, then check memory allocation
# on each GPU via `nvidia-smi`.
import pytorch_lightning

trainer = pytorch_lightning.Trainer(
    devices=[3, 4, 5, 6],
    accelerator="gpu",
    strategy="ddp_find_unused_parameters_true",
)
```
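The `nvidia-smi` check can also be scripted. The sketch below is one possible way to do it, with hypothetical helper names; it assumes `nvidia-smi` is on the `PATH` and that every GPU reports a numeric `memory.used` value. With `devices=[3, 4, 5, 6]`, a non-trivial value for index 0 would indicate the problem described above.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse 'index, memory.used' CSV lines into {gpu_index: used_MiB}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage


def query_memory_used():
    """Ask nvidia-smi for the memory used on each GPU, in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)
```

Calling `query_memory_used()` while training is running returns a dictionary keyed by physical GPU index.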

### Error messages and logs

```
# Error messages and logs here please
```


### Environment

<details>
  <summary>Current environment</summary>

```
# - PyTorch Lightning Version 2.6.1
# - PyTorch Version 2.6.0
# - Python 3.10
# - Ubuntu
# - CUDA 12.4 /cuDNN 9
# 8 GPUs Nvidia A40
#  uv installation
```

</details>


### More info

_No response_

cc @ethanwharris
