
How to release GPU memory when using onnxruntime with FastAPI #22899

Open
@SZ-ing

Description

This is probably a recurring question, but I still haven't found a solution.

I use FastAPI to build API endpoints and onnxruntime to load the models. Because there are many models and GPU memory is limited, I would like to release the GPU memory a model occupies once each request completes. However, it seems I can never fully release it; some memory remains occupied by the models.

I want to know whether there is any way to solve this from Python.
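One workaround that is sometimes suggested for this situation is to run each inference in a short-lived subprocess: when the child process exits, the driver reclaims all GPU memory its CUDA context held, including whatever the session did not release. Below is a minimal sketch of that pattern. The `run_inference` worker uses a placeholder computation instead of a real `onnxruntime.InferenceSession` so the sketch runs anywhere (the commented-out lines show where the session calls would go), and `model.onnx` is a hypothetical path.

```python
import multiprocessing as mp


def run_inference(model_path, input_data, result_queue):
    # In a real service this worker would create the session and run it, e.g.:
    #
    #   import onnxruntime as ort
    #   session = ort.InferenceSession(
    #       model_path, providers=["CUDAExecutionProvider"])
    #   outputs = session.run(None, {"input": input_data})
    #
    # A placeholder computation stands in here so the sketch runs without
    # a GPU or onnxruntime installed.
    outputs = [x * 2 for x in input_data]
    result_queue.put(outputs)
    # When this process exits, its CUDA context is destroyed and the driver
    # reclaims all GPU memory the session held.


def infer_in_subprocess(model_path, input_data):
    # "spawn" starts a fresh interpreter, so no CUDA state is inherited
    # from the parent FastAPI process.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=run_inference,
                       args=(model_path, input_data, queue))
    proc.start()
    result = queue.get()
    proc.join()
    return result


if __name__ == "__main__":
    print(infer_in_subprocess("model.onnx", [1.0, 2.0, 3.0]))
```

The trade-off is that every call pays process startup and model-load cost, so this pattern fits infrequent or batch-style requests rather than low-latency serving.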

Metadata


Labels

api: issues related to all other APIs: C, C++, Python, etc.
