Describe the issue
On ARM SVE256, I ran inference on an SRGAN model with ONNX Runtime and found that the inference process consumes a lot of memory.
Specifically, a 1.4 MB ONNX model running fp16 inference consumes 45.2 GB of virtual memory and 23.4 GB of resident memory, while a 2.8 MB ONNX model running fp16 inference consumes 14 GB of virtual memory and 3335 MB of resident memory. Also, if I comment out "import torch", the fp16 run consumes 36 GB of virtual memory and 23.4 GB of resident memory.
To reproduce
Here's my code:
```python
import numpy as np
import onnxruntime as ort

# model_name is the path to the fp16 SRGAN ONNX model
providers = ['CPUExecutionProvider']
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session = ort.InferenceSession(model_name, providers=providers, sess_options=session_options)
input_name = session.get_inputs()[0].name
input_tensor = np.random.randn(1, 3, 540, 960).astype(np.float16)
outputs = session.run(None, {input_name: input_tensor})
```
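For reference, the virtual/resident figures above were read from the process's memory stats; a minimal sketch of how they can be captured around the run is below (this uses psutil and a placeholder model path, which are not part of the original script; reading /proc/\<pid\>/status directly works as well):
```python
import os
import numpy as np
import onnxruntime as ort
import psutil  # assumed available; exposes the same VmSize/VmRSS values as /proc

def log_memory(tag):
    # vms = virtual memory size, rss = resident set size of the current process
    mem = psutil.Process(os.getpid()).memory_info()
    print(f"{tag}: virt={mem.vms / 1024**3:.1f} GB, res={mem.rss / 1024**3:.1f} GB")

model_name = "srgan_fp16.onnx"  # placeholder; substitute the actual SRGAN fp16 model path

log_memory("before session creation")
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session = ort.InferenceSession(model_name,
                               providers=['CPUExecutionProvider'],
                               sess_options=session_options)
log_memory("after session creation")

input_name = session.get_inputs()[0].name
input_tensor = np.random.randn(1, 3, 540, 960).astype(np.float16)
outputs = session.run(None, {input_name: input_tensor})
log_memory("after session.run")
```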
Urgency
No response
Platform
Linux
OS Version
aarch64 openEuler
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes