Auto-unload models after a set time of inactivity

It would be very useful if lemonade-server could auto-unload models that haven't been used for a while, similar to what Ollama does. In fact, this feature is the only reason I still use Ollama at this point. It's super convenient to run the inference engine as a systemd service on boot and have it available whenever I need it. But having to manually unload models or stop the service whenever I want to use other VRAM-heavy applications like Blender or ComfyUI, or run a game, gets annoying quickly. If I understand correctly, Lemonade already tracks model inactivity, but it only uses that information to unload inactive models when VRAM is needed to load another model in Lemonade.
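To make the request concrete, here is a rough Python sketch of the idle-timeout behavior I have in mind. It is not based on Lemonade's actual code: `IdleUnloader`, `touch`, `sweep`, and the `unload` callback are all hypothetical names, and the server would need to call `sweep` periodically from its own timer or background task.

```python
import time
from typing import Callable, Dict, List


class IdleUnloader:
    """Tracks last-use times and unloads models idle longer than a timeout.

    Hypothetical helper for illustration; not part of lemonade-server.
    """

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_used: Dict[str, float] = {}

    def touch(self, model: str) -> None:
        """Record that a model was just loaded or served a request."""
        self._last_used[model] = self._clock()

    def expired(self) -> List[str]:
        """Return the models whose idle time has reached the timeout."""
        now = self._clock()
        return [m for m, t in self._last_used.items()
                if now - t >= self.idle_timeout_s]

    def sweep(self, unload: Callable[[str], None]) -> List[str]:
        """Unload every expired model and stop tracking it."""
        victims = self.expired()
        for model in victims:
            unload(model)  # assumed to free the model's VRAM
            del self._last_used[model]
        return victims
```

With a timeout of 300 seconds, a model that last served a request five minutes ago is handed to `unload` on the next `sweep`, while a model used two minutes ago stays loaded.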

It would be even nicer if the server only unloaded inactive models once some other application starts filling up VRAM. That would be the best of both worlds: models stay loaded as long as nothing else needs the memory. Unused RAM is wasted RAM, after all.
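The pressure-based variant could look roughly like the sketch below. Everything here is an assumption for illustration: `models_to_unload` and its parameters are hypothetical, and how the server measures free VRAM (which is vendor-specific) is deliberately left out and passed in as a plain number.

```python
from typing import Dict, List


def models_to_unload(idle_seconds: Dict[str, float],
                     model_vram_mb: Dict[str, int],
                     free_vram_mb: int,
                     min_free_mb: int,
                     min_idle_s: float = 60.0) -> List[str]:
    """Pick idle models to unload, longest idle first, until enough VRAM would be free.

    Hypothetical policy function; not part of lemonade-server.
    """
    victims: List[str] = []
    # Only models idle for at least min_idle_s are eligible, longest idle first.
    candidates = sorted(
        (m for m, idle in idle_seconds.items() if idle >= min_idle_s),
        key=lambda m: idle_seconds[m],
        reverse=True,
    )
    for model in candidates:
        if free_vram_mb >= min_free_mb:
            break  # enough headroom already, keep the remaining models loaded
        victims.append(model)
        free_vram_mb += model_vram_mb[model]  # VRAM expected back after unloading
    return victims
```

As long as free VRAM stays above `min_free_mb`, nothing is unloaded no matter how long a model has been idle. Once another application pushes free VRAM below that threshold, the longest-idle models go first.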

Auto-unload models after a set time of inactivity #1365
