-
@myselfffo try to reduce the number of layers offloaded to your VRAM even further. Additionally, you can change the model you are using, cf. https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF#provided-files (if you are using Mistral): pick a variant with a smaller memory footprint.
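A minimal sketch of what that looks like with llama-cpp-python directly; the model filename and the n_gpu_layers value are placeholder assumptions you would tune for a 4 GB card:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Sketch: offload only a few layers to a small GPU and pick a
# small-footprint quantization (e.g. Q4_K_M) from the linked repo.
llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=8,  # fewer layers on the GPU means less VRAM used
    n_ctx=2048,      # a smaller context window also reduces memory
)
```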
-
Hi,
I have my installation running on a Debian 12 PC with a GPU. It's very slow and I constantly run out of memory even though I have lowered the settings:
```python
@inject
def __init__(self) -> None:
    match settings.llm.mode:
        case "local":
            from llama_index.llms import LlamaCPP
```
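For reference, a sketch of how LlamaCPP accepts llama.cpp options through model_kwargs; the model path and the numbers here are illustrative assumptions, not the project's defaults:

```python
from llama_index.llms import LlamaCPP

# Illustrative values only; model_kwargs is forwarded to llama.cpp.
llm = LlamaCPP(
    model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # assumed path
    context_window=2048,
    max_new_tokens=256,
    model_kwargs={"n_gpu_layers": 8},  # lower this to reduce VRAM pressure
)
```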
My GPU is:
NVIDIA GeForce GTX 1050 Ti
CUDA Cores: 768
Total Memory: 4096 MB
I have never tried running it on the CPU. I have:
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
MemTotal: 32805176 KiB (about 32 GB)
Would it be faster to run on the CPU? If so, how can I reinstall with CPU-only support without losing the GPU option?
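In case it helps: with llama-cpp-python you normally don't need to reinstall to try CPU inference. Setting n_gpu_layers=0 keeps all layers on the CPU while a CUDA-enabled build remains usable for GPU runs later (this is standard llama.cpp behavior, not something specific to this project). A minimal sketch:

```python
from llama_cpp import Llama

# Assumption: n_gpu_layers=0 runs everything on the CPU, so the same
# CUDA-enabled install can serve both modes without reinstalling.
cpu_llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder
    n_gpu_layers=0,  # CPU-only inference
    n_threads=8,     # the i7-4790K has 4 cores / 8 threads
)
```

If you ever do need to rebuild with GPU support, reinstalling llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" was the documented route at the time of this model release.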