Open
Description
Feature Request
Describe the problem the feature is intended to solve
I have thousands of models to be served, but quite a big part of these models are not frequently requested, actually only a few of them are. Loading all of these models into memory at the same time is quite resource-consuming & time-consuming.
Describe the solution
I wonder if there's an option to lazy load models and only caching those most frequently requested models in memory?
Describe alternatives you've considered
None yet.
Additional context
Actually I'm not sure if this is a feature request or we are already having this feature.