Can lserver be used with non-quantized versions of the original LLMs? For example, the original (full-precision) Llama-3-8B-Instruct-Gradient-1048k.
I swapped the model in lserver's test code for the original Llama-3-8B checkpoint and ran lserver, but the accuracy was very low, and I can't figure out what the problem is.
I only made the following two changes in `submit_longbench.sh`:

```shell
model_path=<path to Llama-3-8B-Instruct-Gradient-1048k>
precision="w16a16kv8"
```
Are there any other places in the code that need to be changed to run an unquantized model? Thank you!