[Feature]: Inference and embedding support CPU/GPU offload #41
Open
Description
Contact Details (optional)
No response
What feature are you requesting?
We already support GPU inference and embedding in the Kirin project, so we should support GPU in this project as well. Furthermore, please keep in mind what I mentioned in the last meeting: we want CPU/GPU offload (splitting the model between CPU and GPU), not separate CPU-only or GPU-only modes.
https://medium.com/@aisuko/quantization-tech-of-llms-gguf-0342a08f082c
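To illustrate the distinction: with a llama.cpp-style GGUF backend (as in the linked article), offload is typically controlled by a layer count rather than an all-or-nothing device switch. A minimal sketch, assuming the `llama-cli` binary and a local GGUF model file (both hypothetical paths here):

```shell
# Offload the first 20 transformer layers to the GPU;
# the remaining layers and the KV cache for them stay on the CPU.
# -ngl / --n-gpu-layers controls the split, so VRAM-constrained
# machines can still benefit from partial GPU acceleration.
./llama-cli -m ./models/model.gguf -ngl 20 -p "Hello"

# CPU-only and full-GPU become the two extremes of the same knob:
./llama-cli -m ./models/model.gguf -ngl 0 -p "Hello"   # CPU only
./llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"  # offload all layers
```

Exposing a single offload parameter like this, instead of separate CPU and GPU modes, is what the request above is asking for.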