Load the model on demand #17
Unanswered
rockstar2020 asked this question in Q&A
Hi,
I have very limited VRAM that I'd like to run a few AI apps on.
Is there a way to load the model only when a transcription task is required, and then unload it after it's done?
I'm using the uv-gpu Docker image.
Thanks

Replies: 1 comment
https://github.com/mostlygeek/llama-swap should do that for you.
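
For context, llama-swap is a proxy that starts the backing inference server on the first request for a model and can stop it again after an idle timeout, which frees the VRAM between jobs. Below is a minimal config sketch: the `models`/`cmd`/`ttl` fields and the `${PORT}` macro follow the project's README as I understand it, but the model name and server command are purely illustrative, and it assumes your transcription server exposes an HTTP API llama-swap can proxy. Verify the exact syntax against the llama-swap docs.

```yaml
# llama-swap config.yaml (sketch; check field names against the README).
models:
  "whisper":  # hypothetical model name
    # Command llama-swap launches on the first request for "whisper".
    # ${PORT} is replaced by llama-swap with the upstream port it proxies to.
    cmd: whisper-server --port ${PORT} -m /models/ggml-large-v3.bin
    # Stop the process (freeing VRAM) after 120 seconds of inactivity.
    ttl: 120
```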
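If you would rather not run a proxy, you can get the same effect in application code with a plain load-on-demand pattern: construct the model only when a transcription job arrives, and drop every reference once it finishes so the GPU memory is returned. A minimal sketch, assuming faster-whisper as the backend (swap in whatever library your container actually uses):

```python
# Load-on-demand sketch; faster-whisper is an assumption, not necessarily
# the library the uv-gpu image ships with.
import gc

from faster_whisper import WhisperModel


def transcribe_once(audio_path: str) -> str:
    """Load the model, run one transcription, then free the VRAM."""
    model = WhisperModel("large-v3", device="cuda")  # VRAM allocated here
    segments, _info = model.transcribe(audio_path)
    # transcribe() is lazy: consume the generator before dropping the model.
    text = " ".join(segment.text for segment in segments)
    del model      # drop the only reference to the model...
    gc.collect()   # ...so CTranslate2 releases the GPU memory
    return text
```

The trade-off is a cold start on every request; llama-swap's `ttl` gives the same VRAM behaviour while keeping the model warm between closely spaced jobs.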