[Feature] Inferencing using multiple backends #622

Open
@snehargho

Description

Is there a plan to support inference using multiple backends such as llama.cpp? For example, offloading a configurable number of layers to the GPU to control VRAM usage, GPU power draw, etc.
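
For reference, llama.cpp already exposes this kind of partial offload through its `n_gpu_layers` setting (the `-ngl` flag on its CLI tools). A minimal sketch using the llama-cpp-python bindings illustrates the behavior being requested; the model path is a placeholder:

```python
# Sketch of the partial GPU offload requested above, shown via the
# llama-cpp-python bindings to llama.cpp for illustration only.
# Raising or lowering n_gpu_layers trades VRAM usage (and GPU power
# draw) against inference speed; the remaining layers run on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=20,  # offload only 20 layers to the GPU
)

out = llm("Q: What is partial GPU offload? A:", max_tokens=64)
print(out["choices"][0]["text"])
```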
