The basic function of llama-swap is to swap models, but it's more than that now. It provides a really nice web UI to load and unload models without copy-pasting commands. It also ships a very under-appreciated pre-made Docker image, which I find better than Ollama. I have not found anything with comparable functionality.
The automatic unloading every single time I accidentally make a call to the wrong model is a major pain point. Loading a model on Strix Halo can take 1-2 minutes.
Getting a helper model to simply load at the same time as another model is always a fight. The groups are borderline impossible to understand. (I took one look at the Matrix letter jumbles and my brain rejected the entire scheme. I know from experience that I am never going to get comfortable with it.)
The two-configs-in-one-file paradigm (a second list of models in a group config) is very difficult to manage. I change which models I use every day. It's hard enough getting each individual entry right without managing a second list, scrolling up and down, and manually syncing model names.
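To illustrate the syncing problem, here is a minimal sketch of the check I end up doing by eye. It assumes the config has already been parsed into a dict with a top-level `models` mapping and a `groups` mapping whose entries carry a `members` list; the function name, the model names, and the example data are all hypothetical, and this is not part of llama-swap itself.

```python
def find_unsynced_members(config):
    """Return (group, member) pairs whose member is not defined under 'models'."""
    # Assumption: model names are the keys of the top-level "models" mapping.
    defined = set(config.get("models", {}))
    missing = []
    for group_name, group in config.get("groups", {}).items():
        # Assumption: each group lists its models by name under "members".
        for member in group.get("members", []):
            if member not in defined:
                missing.append((group_name, member))
    return missing


# Hypothetical config: the group entry contains a typo ("helper-modle").
example_config = {
    "models": {
        "main-model": {"cmd": "..."},
        "helper-model": {"cmd": "..."},
    },
    "groups": {
        "pair": {"members": ["main-model", "helper-modle"]},
    },
}

unsynced = find_unsynced_members(example_config)
print(unsynced)
```

Any pair it reports is a name in the second list that no longer matches an entry in the first, which is exactly the kind of mistake that is easy to make when renaming models daily.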
Maybe there should just be a simple way to stop the constant unloading of models and allow simple manual operation?