v0.2.4: Why is the total speed with multiple concurrent requests faster than with a single request in the example showcase? #1145
Unanswered
JennieGao-njust
asked this question in
Q&A
Replies: 0 comments
Request 3: Decode Speed = 2.80 tokens/s,First packet time = 44.05s
Request 1: Decode Speed = 3.00 tokens/s,First packet time = 40.20s
Request 0: Decode Speed = 4.07 tokens/s,First packet time = 79.58s
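Request 2: Decode Speed = 4.02 tokens/s,First packet time = 79.30s
My single-concurrency test reaches 11 tokens/s, so with 4 concurrent requests that throughput is split almost evenly between them. Why, then, does the example showcase report 17 tokens/s for the original single-concurrency run but 4 × 10 tokens/s once concurrency is enabled?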
Hardware: 500 GB RAM, 500 GB disk. The model is UD-Q2_K_XL.
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
The GPU is an L20.
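To make the comparison concrete, here is a minimal Python sketch that sums the per-request decode speeds from the log above and compares the resulting speedup with the one implied by the showcase figures quoted in the question. It assumes that all four requests were decoding at the same time, so the sum is only a rough upper estimate of aggregate throughput (the first-packet times differ, so the overlap may be partial). The helper names `decode_speeds` and `speedup` are hypothetical and not part of any library.

```python
import re

# Log lines copied verbatim from the post (including the full-width comma).
LOG = """\
Request 3: Decode Speed = 2.80 tokens/s,First packet time = 44.05s
Request 1: Decode Speed = 3.00 tokens/s,First packet time = 40.20s
Request 0: Decode Speed = 4.07 tokens/s,First packet time = 79.58s
Request 2: Decode Speed = 4.02 tokens/s,First packet time = 79.30s
"""

PATTERN = re.compile(r"Request (\d+): Decode Speed = ([\d.]+) tokens/s")


def decode_speeds(log: str) -> dict[int, float]:
    """Map request id -> decode speed (tokens/s) parsed from the log text."""
    return {int(m.group(1)): float(m.group(2)) for m in PATTERN.finditer(log)}


def speedup(aggregate: float, single: float) -> float:
    """Ratio of aggregate concurrent throughput to single-request throughput."""
    return aggregate / single


speeds = decode_speeds(LOG)

# Assumption: the four requests decode concurrently, so their speeds add up.
aggregate = sum(speeds.values())

# 11 tokens/s is the single-concurrency figure measured in the post.
measured = speedup(aggregate, 11.0)

# 17 tokens/s and 4 * 10 tokens/s are the showcase figures quoted in the post.
showcase = speedup(4 * 10.0, 17.0)

print(f"aggregate decode speed: {aggregate:.2f} tokens/s")
print(f"measured speedup: {measured:.2f}x, showcase speedup: {showcase:.2f}x")
```

Under that assumption the four requests add up to about 13.89 tokens/s, roughly 1.26× the single-request speed, whereas the showcase figures imply roughly 2.35×. That gap is what this question is about.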