
[Performance] Why does inference occupy so much memory? #23867

Open
@Serenagirl

Description


Describe the issue

On an ARM SVE256 machine, I ran inference on an SRGAN model with ONNX Runtime and found that the inference process consumes a lot of memory.
Specifically, a 1.4 MB ONNX model running fp16 inference consumes 45.2 GB of virtual memory and 23.4 GB of resident memory, while a 2.8 MB ONNX model running fp16 inference consumes 14 GB of virtual memory and 3335 MB of resident memory. Also, if I comment out "import torch", the fp16 run consumes 36 GB of virtual memory and 23.4 GB of resident memory.
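
For anyone trying to reproduce the virt/res figures, they can also be sampled from inside the process on Linux by reading /proc/self/status (a minimal illustrative helper, not the exact measurement method; the `report_memory` name is just an example):

```python
# Minimal sketch, assuming Linux: print the process's virtual (VmSize)
# and resident (VmRSS) memory, e.g. before/after creating the session
# and before/after session.run().
def report_memory(tag: str) -> None:
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("VmSize", "VmRSS")):
                print(tag, line.strip())
```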

To reproduce

Here's my code:

```python
import numpy as np
import onnxruntime as ort

# model_name points to the SRGAN .onnx file (set earlier in my script)
providers = ['CPUExecutionProvider']
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session = ort.InferenceSession(model_name, providers=providers, sess_options=session_options)
input_name = session.get_inputs()[0].name
input_tensor = np.random.randn(1, 3, 540, 960).astype(np.float16)
outputs = session.run(None, {input_name: input_tensor})
```
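
If it helps with triage, repeating the run with the CPU memory arena and memory-pattern planning disabled would show how much of the footprint comes from ONNX Runtime's preallocation (a sketch of the variant I would try; the "model.onnx" path is a placeholder):

```python
import numpy as np
import onnxruntime as ort

# Variant with the arena allocator and memory-pattern preallocation turned off,
# to see whether they account for the large virtual/resident footprint.
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session_options.enable_cpu_mem_arena = False  # do not reserve/grow a CPU arena
session_options.enable_mem_pattern = False    # no memory-pattern preallocation
session = ort.InferenceSession("model.onnx",
                               providers=['CPUExecutionProvider'],
                               sess_options=session_options)
input_name = session.get_inputs()[0].name
input_tensor = np.random.randn(1, 3, 540, 960).astype(np.float16)
outputs = session.run(None, {input_name: input_tensor})
```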

Urgency

No response

Platform

Linux

OS Version

openEuler (aarch64)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes


Labels

performance (issues related to performance regressions)
