[Feature]: Inference and embedding support CPU/GPU offload #41

Open
@Aisuko

Description

Contact Details(optional)

No response

What feature are you requesting?

We already support GPU inference and embedding in the Kirin project, so we should support GPU in this project as well. Furthermore, please keep in mind what I mentioned in the last meeting: we want CPU/GPU offload, not separate CPU-only and GPU-only modes.

https://medium.com/@aisuko/quantization-tech-of-llms-gguf-0342a08f082c
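To make the request concrete: in llama.cpp-style runtimes (the kind of GGUF backend discussed in the article above), offload is usually controlled by an `n_gpu_layers` knob, where the last N transformer layers run on the GPU and the rest stay on the CPU. A minimal sketch of that split, using a hypothetical helper (not this project's actual API):

```python
def plan_offload(total_layers: int, n_gpu_layers: int) -> list[str]:
    """Assign each transformer layer to a device.

    Mimics the llama.cpp-style convention: the last `n_gpu_layers`
    layers are offloaded to the GPU, everything else stays on the CPU.
    Illustration only; names are hypothetical.
    """
    n_gpu = max(0, min(n_gpu_layers, total_layers))
    return ["cpu"] * (total_layers - n_gpu) + ["gpu"] * n_gpu

# Example: a 32-layer model with 20 layers offloaded to the GPU
plan = plan_offload(32, 20)
print(plan.count("cpu"), plan.count("gpu"))  # 12 20
```

Exposing a single parameter like this would let users scale from pure-CPU (`n_gpu_layers=0`) to fully GPU-resident inference depending on available VRAM, rather than choosing between two fixed modes.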

Metadata

Labels

enhancement (New feature or request)
