Has anyone measured the difference in inference speed between Qwen2 and Qwen2.5 at 1.5B? We have an inference stack built on vLLM, and latency went up after switching from 2 to 2.5; I'm not sure whether anything needs to be changed.

Replies: 1 comment
The model architectures of Qwen2-1.5B-Instruct and Qwen2.5-1.5B-Instruct are exactly the same, so speed and throughput should be identical under comparable conditions. Could you clarify what you mean by latency? If you mean the total time to return the full response, that may be down to model behavior: Qwen2.5-1.5B-Instruct may generate longer outputs.
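Not from the original thread, but one way to check whether the gap is per-token speed or just output length is to benchmark each model with the same prompts and report time per generated token alongside end-to-end time. A minimal sketch using vLLM's offline API (the prompts, `max_tokens`, and model IDs are only illustrative):

```python
import sys
import time

from vllm import LLM, SamplingParams

# Run once per model to avoid holding two engines in GPU memory, e.g.:
#   python bench.py Qwen/Qwen2-1.5B-Instruct
#   python bench.py Qwen/Qwen2.5-1.5B-Instruct
model_name = sys.argv[1]

prompts = [
    "Briefly explain what the KV cache does in LLM inference.",
    "Summarize the difference between latency and throughput.",
]

# Greedy decoding with a generous cap, so both models run under the same
# sampling settings but are still free to stop early (or ramble on).
params = SamplingParams(temperature=0.0, max_tokens=512)

llm = LLM(model=model_name)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt tokens.
total_new_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)

print(f"model:                 {model_name}")
print(f"end-to-end time:       {elapsed:.2f} s")
print(f"generated tokens:      {total_new_tokens}")
print(f"time per output token: {elapsed / total_new_tokens * 1000:.2f} ms")
```

If the time per output token comes out roughly the same for both models but the end-to-end time differs, the extra latency is coming from Qwen2.5 producing longer responses rather than from the engine or architecture. For a stricter apples-to-apples run you can also set `ignore_eos=True` with a fixed `max_tokens` so both models emit exactly the same number of tokens.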