Can lserver be used with non-quantized versions of the original LLMs? For example, the original (full-precision) Llama-3-8B-Instruct-Gradient-1048k.
I swapped the model in lserver's test code for the original Llama-3-8B checkpoint and ran lserver, but the accuracy was very low, and I can't figure out what the problem is.
I only made the following two changes in `submit_longbench.sh`:

```shell
model_path=<path to Llama-3-8B-Instruct-Gradient-1048k>
precision="w16a16kv8"
```
Are there any other places in the code that need to be changed to run an unquantized model? Thank you!