
How to release GPU memory when using onnxruntime with FastAPI #22899

Open
@SZ-ing

Description

This is probably a recurring question, but I still haven't found a solution.

I use FastAPI to build API endpoints and onnxruntime to load the models. Because there are many models and GPU memory is limited, I would like to release the GPU memory a model occupies once each request completes. However, it seems I can never fully release it; some memory remains occupied by the models.

I want to know whether there is any way to solve this from Python.
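One workaround that is sometimes suggested for this situation is to run each inference in a short-lived subprocess: when the child process exits, the driver reclaims all GPU memory its CUDA context held, including whatever the session did not release. Below is a minimal sketch of that pattern. The `run_inference` worker uses a placeholder computation instead of a real `onnxruntime.InferenceSession` so the sketch runs anywhere (the commented-out lines show where the session calls would go), and `model.onnx` is a hypothetical path.

```python
import multiprocessing as mp


def run_inference(model_path, input_data, result_queue):
    # In a real service this worker would create the session and run it, e.g.:
    #
    #   import onnxruntime as ort
    #   session = ort.InferenceSession(
    #       model_path, providers=["CUDAExecutionProvider"])
    #   outputs = session.run(None, {"input": input_data})
    #
    # A placeholder computation stands in here so the sketch runs without
    # a GPU or onnxruntime installed.
    outputs = [x * 2 for x in input_data]
    result_queue.put(outputs)
    # When this process exits, its CUDA context is destroyed and the driver
    # reclaims all GPU memory the session held.


def infer_in_subprocess(model_path, input_data):
    # "spawn" starts a fresh interpreter, so no CUDA state is inherited
    # from the parent FastAPI process.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=run_inference,
                       args=(model_path, input_data, queue))
    proc.start()
    result = queue.get()
    proc.join()
    return result


if __name__ == "__main__":
    print(infer_in_subprocess("model.onnx", [1.0, 2.0, 3.0]))
```

The trade-off is that every call pays process startup and model-load cost, so this pattern fits infrequent or batch-style requests rather than low-latency serving.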

Metadata


Labels

api: issues related to all other APIs: C, C++, Python, etc.
