This repository was archived by the owner on Oct 25, 2024. It is now read-only.

question about configuration #1643

Open
@menglin0320

Description

In the examples, you didn't mention how to specify parameters like batch size, max input length, etc.
My first question is how to change the max input length. I tried the llama2 example for a RAG use case. llama2 should be able to handle 4096 input tokens, but the input is limited to 1024 for some reason.
Similarly, though I don't think batching is a good idea on CPU, I still want to try batched inference with this package. Is there documentation on how to configure these things?

Metadata

Labels

documentation (Improvements or additions to documentation)