[Feature request] Never automatically unload models

The basic function of llama-swap is to swap models, but it's more than that now. It provides a really nice web UI for loading and unloading models without copy-pasting commands. It also ships a very under-appreciated pre-made Docker image that I find better than Ollama. I have not found anything with comparable functionality.

The automatic unloading every single time I accidentally make a call to the wrong model is a major pain point. Loading a model on Strix Halo can take 1-2 minutes.

Getting a helper model to simply load at the same time as another model is always a fight. The groups are borderline impossible to understand. (I took one look at the Matrix letter jumbles and my brain rejected the entire scheme. I know from experience this is never going to happen.)

The two-configs-in-one-file paradigm (a second list of models in a group config) is very difficult to manage. I change the models I use every day. It's hard enough getting each individual entry right without also managing a second list, scrolling up and down, and manually syncing model names.

Maybe there should just be a simple way to stop the constant unloading of models and allow simple manual operation?



[Feature request] Never automatically unload models #749
