-
@myselfffo try to reduce the number of layers offloaded to your VRAM even further. Additionally, you can change the model you are using, cf. https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF#provided-files (if you are using Mistral): pick a variant with a smaller memory footprint.
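A minimal sketch of what that looks like with llama-cpp-python directly; the model filename and the n_gpu_layers value are placeholder assumptions you would tune for a 4 GB card:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Sketch: offload only a few layers to a small GPU and pick a
# small-footprint quantization (e.g. Q4_K_M) from the linked repo.
llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=8,  # fewer layers on the GPU means less VRAM used
    n_ctx=2048,      # a smaller context window also reduces memory
)
```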
-
Hi,
I have my installation running on a Debian 12 PC with a GPU. It's very slow and I constantly run out of memory even though I have lowered the settings:
```python
@inject
def __init__(self) -> None:
    match settings.llm.mode:
        case "local":
            from llama_index.llms import LlamaCPP
```
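For reference, a sketch of how LlamaCPP accepts llama.cpp options through model_kwargs; the model path and the numbers here are illustrative assumptions, not the project's defaults:

```python
from llama_index.llms import LlamaCPP

# Illustrative values only; model_kwargs is forwarded to llama.cpp.
llm = LlamaCPP(
    model_path="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # assumed path
    context_window=2048,
    max_new_tokens=256,
    model_kwargs={"n_gpu_layers": 8},  # lower this to reduce VRAM pressure
)
```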
My GPU is:
NVIDIA GeForce GTX 1050 Ti
CUDA Cores: 768
Total Memory: 4096 MB
I have never tried running it on the CPU. I have:
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
MemTotal: 32805176 KiB (about 32 GB)
Would it be faster to run on the CPU? If so, how can I reinstall with CPU-only support without losing the GPU option?
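In case it helps: with llama-cpp-python you normally don't need to reinstall to try CPU inference. Setting n_gpu_layers=0 keeps all layers on the CPU while a CUDA-enabled build remains usable for GPU runs later (this is standard llama.cpp behavior, not something specific to this project). A minimal sketch:

```python
from llama_cpp import Llama

# Assumption: n_gpu_layers=0 runs everything on the CPU, so the same
# CUDA-enabled install can serve both modes without reinstalling.
cpu_llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # placeholder
    n_gpu_layers=0,  # CPU-only inference
    n_threads=8,     # the i7-4790K has 4 cores / 8 threads
)
```

If you ever do need to rebuild with GPU support, reinstalling llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" was the documented route at the time of this model release.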