
Memory pressure safety valve #1103

Merged

Conversation

bigbitbus
Contributor

@bigbitbus bigbitbus commented Mar 23, 2025

Description

We use MAX_ACTIVE_MODELS to manage the LRU cache when new models are loaded.
This is a static number, and it is not reliable when dealing with many models of heterogeneous (different) sizes.

This PR cleans the LRU cache until at least a threshold fraction of memory is free before attempting to load a new model.

Currently this only works on CUDA-enabled devices (GPUs).
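
For illustration, a minimal sketch of the kind of check this PR describes, assuming torch.cuda.mem_get_info is available on the device; the function name matches the one added in the diff below, but it is written here as a standalone function and the body is illustrative rather than the merged implementation:

import os

# In the PR this value comes from inference.core.env; 0 (the default) disables the check.
MEMORY_FREE_THRESHOLD = float(os.getenv("MEMORY_FREE_THRESHOLD", "0"))

def memory_pressure_detected() -> bool:
    # Sketch only: report pressure when less than MEMORY_FREE_THRESHOLD of CUDA memory is free.
    if MEMORY_FREE_THRESHOLD == 0:
        return False  # feature disabled by default
    try:
        import torch  # imported locally because torch is not always installed
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) bytes for the current device
    return free_bytes / total_bytes < MEMORY_FREE_THRESHOLD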

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or an example of how you tested the change.

Tested locally so far.

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

@@ -7,6 +7,8 @@
 from inference.core.managers.base import Model, ModelManager
 from inference.core.managers.decorators.base import ModelManagerDecorator
 from inference.core.managers.entities import ModelDescription
+from inference.core.env import MEMORY_FREE_THRESHOLD
+import torch
Collaborator

torch is not always installed, so it shouldn't be imported directly.

Contributor Author

Fixed, moved it inside the function.

@@ -141,3 +143,15 @@ def _resolve_queue_id(
         self, model_id: str, model_id_alias: Optional[str] = None
     ) -> str:
         return model_id if model_id_alias is None else model_id_alias

+    def memory_pressure_detected(self) -> bool:
Collaborator

I would feature-flag this and import torch locally here.

Collaborator
Not sure how this would work in a multi-GPU environment, but we can probably wait until someone actually uses it like that (it seems we are probing the default device here).

Contributor Author
Fixed this.

Contributor Author

(The value of MEMORY_FREE_THRESHOLD is 0 by default, so memory pressure is not checked unless it is set.)

Collaborator

ok

@bigbitbus bigbitbus marked this pull request as ready for review March 25, 2025 15:28
@bigbitbus
Contributor Author

Thanks for the review @PawelPeczek-Roboflow

I now import torch within the memory pressure checking function.

The memory pressure check only occurs when the environment variable MEMORY_FREE_THRESHOLD is non-zero (0 is the default value, so the check does not trigger by default).

I tested the functionality by building the GPU image locally and verifying that the CUDA memory is reported and that the memory pressure function works as expected (it returns True under high memory pressure, triggering model LRU cache eviction).
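
For illustration, a hypothetical call site for this gate before a new model is loaded; the cache and eviction helper names below are made up, not the PR's actual code:

# Hypothetical: keep evicting least-recently-used models while memory pressure persists.
# With MEMORY_FREE_THRESHOLD=0 (the default) memory_pressure_detected() is never True,
# so this loop is a no-op.
while memory_pressure_detected() and model_cache_is_not_empty():  # assumed helpers
    evict_least_recently_used_model()  # assumed helper, not from the PR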

I ran the integration tests here: https://github.com/roboflow/inference/actions/runs/14063849882

@grzegorz-roboflow grzegorz-roboflow merged commit 1a14322 into main Mar 27, 2025
30 checks passed
@grzegorz-roboflow grzegorz-roboflow deleted the feat/evict-models-when-cuda-memory-free-below-threshold branch March 27, 2025 13:37