Can lserver be used for non-quantized versions of initial LLMs? #65

@zZzZ9zZ9

Can lserver be used with non-quantized versions of the initial LLMs? For example, the original (unquantized) Llama-3-8B-Instruct-Gradient-1048k.
I replaced the model with the original version of Llama-3-8B in lserver's test code and ran lserver, but the accuracy was very low, and I don't know what the problem is.
I made only the following two changes in `submit_longbench.sh`:
model_path= path to Llama-3-8B-Instruct-Gradient-1048k
precision="w16a16kv8"
I don't know whether any other parts of the code need to be changed. Thank you!
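For context, a minimal sketch of what those two overrides might look like inside the script. The variable names come from the issue itself; the model path below is only a placeholder (the issue does not give a concrete path), and my reading of `"w16a16kv8"` as 16-bit weights, 16-bit activations, and an 8-bit KV cache is an assumption, not something confirmed by the lserver repository:

```shell
#!/bin/sh
# Hypothetical excerpt of submit_longbench.sh showing the two edits
# described in the issue. The path is a placeholder; replace it with
# the actual checkout location of the model.
model_path="/models/Llama-3-8B-Instruct-Gradient-1048k"  # placeholder path

# Assumed meaning (not confirmed): w16 = 16-bit weights,
# a16 = 16-bit activations, kv8 = 8-bit KV cache.
precision="w16a16kv8"

echo "model_path=${model_path}"
echo "precision=${precision}"
```

If other precision strings in the script follow the same pattern (e.g. a quantized `w4a8kv4`), it may be worth checking whether the serving code has a separate, non-quantized execution path that must also be selected.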
