Currently, we don't compile llama.cpp with CUDA acceleration. If we want to support the GPU offload feature, we need to compile llama.cpp with the GPU build flag enabled.
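As a sketch of what that build could look like (the flag name varies across llama.cpp versions: older trees use `LLAMA_CUBLAS=1`, newer ones `GGML_CUDA=1`):

```sh
# Makefile build with CUDA offload support enabled
make clean
make GGML_CUDA=1    # on older llama.cpp trees: make LLAMA_CUBLAS=1

# CMake equivalent
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

Without one of these flags, `-ngl` has no effect and all layers run on the CPU.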
Labels: cuda, gpu
https://github.com/SkywardAI/llama.cpp/blob/a59f8fdc85e1119d470d8766e29617962549d993/examples/main/README.md?plain=1#L72
How many layers do you want to run on the GPU? (That is, what value should be passed to `-ngl` / `--n-gpu-layers`?)
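For example, a minimal sketch of running with offload after a CUDA-enabled build (the model path is illustrative; in newer llama.cpp trees the binary is `llama-cli` rather than `main`):

```sh
# Offload 33 layers to the GPU via -ngl / --n-gpu-layers
./main -m ./models/model.Q4_K_M.gguf -ngl 33 -p "Hello"
```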