[bug]: vram does not seem to be freed when switching models in v5.6.0rc2 #7556

Open
@The-Istar

Description

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Linux

GPU vendor

Nvidia (CUDA)

GPU model

RTX 3090

GPU VRAM

24GB

Version number

v5.6.0rc2

Browser

The browser bundled with the Launcher; the system also has Firefox 134.0.

Python dependencies

{
"accelerate": "1.0.1",
"compel": "2.0.2",
"cuda": "12.1",
"diffusers": "0.31.0",
"numpy": "1.26.4",
"opencv": "4.9.0.80",
"onnx": "1.16.1",
"pillow": "11.1.0",
"python": "3.11.11",
"torch": "2.4.1+cu121",
"torchvision": "0.19.1",
"transformers": "4.46.3",
"xformers": null
}

What happened

After generating an image with one model and then switching to another, the VRAM used by the previous model does not appear to be freed. After several model switches the VRAM fills up and an OOM error is raised.

What you expected to happen

VRAM from the previous model to be freed so the new model can be loaded.

How to reproduce the problem

Generate an image with one model, switch to a different model and generate again, and repeat; VRAM usage keeps climbing until the OOM error appears.
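
Not part of the original report: a minimal PyTorch sketch of the kind of standalone check that can confirm the symptom outside the InvokeAI UI. It allocates a large dummy tensor as a stand-in for model weights, drops it the way a model switch should, and reports whether the CUDA memory actually comes back. The device index and tensor size are assumptions taken from this report (device: cuda:1, 24GB card); nothing here uses InvokeAI's own APIs.

# Hypothetical standalone check (not InvokeAI code): verify that CUDA memory
# held for one "model" is released before the next one is loaded.
import gc
import torch

DEVICE = torch.device("cuda:1")  # matches device: cuda:1 from the invokeai.yaml below; use cuda:0 on a single-GPU box

def report(label: str) -> None:
    # memory_allocated = live tensors; memory_reserved = held by PyTorch's caching allocator
    alloc = torch.cuda.memory_allocated(DEVICE) / 2**20
    reserved = torch.cuda.memory_reserved(DEVICE) / 2**20
    print(f"{label}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

report("baseline")

# Stand-in for "model A" weights (~4 GiB of fp16 parameters).
model_a = torch.empty(2 * 1024**3, dtype=torch.float16, device=DEVICE)
report("model A loaded")

# Simulate a model switch: drop every reference, then ask PyTorch to release the memory.
del model_a
gc.collect()
torch.cuda.empty_cache()
report("after switching away from model A")  # should be back near the baseline

# If the last line stays near "model A loaded", something is still holding a
# reference to the old weights, which is the behaviour this issue describes.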

Additional context

In case it is relevant, the invokeai.yaml file:

# Internal metadata - do not edit:
schema_version: 4.0.2

# Put user settings here - see https://invoke-ai.github.io/InvokeAI/configuration/:
enable_partial_loading: true
device: cuda:1

Discord username

No response
