
[Performance] Why does inference occupy so much memory? #23867

Open
@Serenagirl

Description


Describe the issue

On an ARM SVE256 machine, I ran inference on an SRGAN model with ONNX Runtime and found that the inference process consumes a lot of memory.
Specifically, a 1.4 MB ONNX model running fp16 inference consumes 45.2 GB of virtual memory and 23.4 GB of resident memory, while a 2.8 MB ONNX model running fp16 inference consumes 14 GB of virtual memory and 3335 MB of resident memory. Also, if I comment out "import torch", the fp16 run consumes 36 GB of virtual memory and 23.4 GB of resident memory.
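
For anyone trying to reproduce the virt/res figures, they can also be sampled from inside the process on Linux by reading /proc/self/status (a minimal illustrative helper, not the exact measurement method; the `report_memory` name is just an example):

```python
# Minimal sketch, assuming Linux: print the process's virtual (VmSize)
# and resident (VmRSS) memory, e.g. before/after creating the session
# and before/after session.run().
def report_memory(tag: str) -> None:
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("VmSize", "VmRSS")):
                print(tag, line.strip())
```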

To reproduce

Here's my code:

```python
import numpy as np
import onnxruntime as ort

# model_name points to the SRGAN .onnx file (set earlier in my script)
providers = ['CPUExecutionProvider']
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session = ort.InferenceSession(model_name, providers=providers, sess_options=session_options)
input_name = session.get_inputs()[0].name
input_tensor = np.random.randn(1, 3, 540, 960).astype(np.float16)
outputs = session.run(None, {input_name: input_tensor})
```
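
If it helps with triage, repeating the run with the CPU memory arena and memory-pattern planning disabled would show how much of the footprint comes from ONNX Runtime's preallocation (a sketch of the variant I would try; the "model.onnx" path is a placeholder):

```python
import numpy as np
import onnxruntime as ort

# Variant with the arena allocator and memory-pattern preallocation turned off,
# to see whether they account for the large virtual/resident footprint.
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session_options.enable_cpu_mem_arena = False  # do not reserve/grow a CPU arena
session_options.enable_mem_pattern = False    # no memory-pattern preallocation
session = ort.InferenceSession("model.onnx",
                               providers=['CPUExecutionProvider'],
                               sess_options=session_options)
input_name = session.get_inputs()[0].name
input_tensor = np.random.randn(1, 3, 540, 960).astype(np.float16)
outputs = session.run(None, {input_name: input_tensor})
```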

Urgency

No response

Platform

Linux

OS Version

openEuler (aarch64)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes


Labels

performance (issues related to performance regressions)
