[Performance request]: TPOT slower than MindIE, optimization requested #4395

@liuzhenjluccst

Description

Anything you want to discuss about vllm on ascend.

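Environment: vLLM 0.11.0 on CANN 8.3.RC1, Ascend 910B4, Qwen3 models (both dense and MoE).
Problem: compared with MindIE, vLLM has a faster TTFT but a slower TPOT. We hope TPOT can be optimized.
Benchmark results: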

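| Framework | Model | Hardware | Input length | Concurrency | TTFT | 1000/TPOT |
|---|---|---|---|---|---|---|
| vLLM | qwen3-235b-int8 | 1 node, 8 cards | 2k | 16 | 1.3 s | 14 tok/s |
| MindIE | qwen3-235b-int8 | 1 node, 8 cards | 2k | 16 | 2.9 s | 20 tok/s |
| vLLM | qwen3-235b-int8 | 1 node, 8 cards | 8k | 16 | 3.5 s | 12 tok/s |
| MindIE | qwen3-235b-int8 | 1 node, 8 cards | 8k | 16 | 12 s | 19 tok/s |
| vLLM | qwen3-235b-int8 | 1 node, 8 cards | 64k | 4 | 22 s | 8 tok/s |
| MindIE | qwen3-235b-int8 | 1 node, 8 cards | 64k | 4 | 39 s | 9 tok/s |
| vLLM | qwen3-32b | 2 cards | 2k | 16 | 1.8 s | 15 tok/s |
| MindIE | qwen3-32b | 2 cards | 2k | 16 | 4.7 s | 21 tok/s |
| vLLM | qwen3-32b | 2 cards | 8k | 16 | 9 s | 9 tok/s |
| MindIE | qwen3-32b | 2 cards | 8k | 16 | 34 s | 15 tok/s |
| vLLM | qwen3-32b | 2 cards | 64k | 2 | 29 s | 7 tok/s |
| MindIE | qwen3-32b | 2 cards | 64k | 2 | 97 s | 25 tok/s |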
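The "1000/TPOT" column appears to be a throughput figure, i.e. 1000 divided by TPOT in milliseconds, giving tokens per second per request. That reading is an assumption; the issue does not define the column. Under it, the short Python sketch below converts each row back into per-token latency and prints how much longer vLLM spends per generated token than MindIE. The names `ROWS` and `tpot_ms` are hypothetical helpers, not part of either framework.

```python
# Sketch: convert the "1000/TPOT" column (assumed to be tok/s = 1000 / TPOT in ms)
# back into per-token latency and compare vLLM against MindIE row by row.

# (setup, vLLM tok/s, MindIE tok/s), copied from the benchmark table.
ROWS = [
    ("qwen3-235b-int8, 8 cards, 2k in, 16 conc", 14, 20),
    ("qwen3-235b-int8, 8 cards, 8k in, 16 conc", 12, 19),
    ("qwen3-235b-int8, 8 cards, 64k in, 4 conc", 8, 9),
    ("qwen3-32b, 2 cards, 2k in, 16 conc", 15, 21),
    ("qwen3-32b, 2 cards, 8k in, 16 conc", 9, 15),
    ("qwen3-32b, 2 cards, 64k in, 2 conc", 7, 25),
]


def tpot_ms(tokens_per_s: float) -> float:
    """Per-output-token latency in milliseconds, derived from a tok/s figure."""
    return 1000.0 / tokens_per_s


for setup, vllm_tps, mindie_tps in ROWS:
    v_ms, m_ms = tpot_ms(vllm_tps), tpot_ms(mindie_tps)
    # A ratio above 1 means vLLM spends longer per generated token than MindIE.
    print(f"{setup}: vLLM {v_ms:.1f} ms, MindIE {m_ms:.1f} ms, ratio {v_ms / m_ms:.2f}")
```

For the first row this gives about 71.4 ms per token for vLLM versus 50.0 ms for MindIE, a ratio of roughly 1.43.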

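Conclusion:
First of all, yes: vLLM's TTFT is indeed considerably better. However, TPOT drives the total generation time and how responsive streaming output feels to users, so we hope TPOT can be optimized further. Thanks!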
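To make the trade-off in the conclusion concrete, end-to-end request latency can be roughly approximated as TTFT + (N − 1) · TPOT for N output tokens. This is a simplifying assumption: it treats TPOT as constant over the whole response and ignores batching effects. The issue does not state output lengths, so N is a free parameter here. The function names below are hypothetical helpers for illustration only.

```python
# Sketch: rough end-to-end latency model, assuming a constant TPOT per request.


def e2e_latency_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """First token arrives after TTFT; each later token adds one TPOT (1 / tok/s)."""
    return ttft_s + (output_tokens - 1) / tokens_per_s


def break_even_tokens(ttft_a: float, tps_a: float, ttft_b: float, tps_b: float) -> float:
    """Output length at which A (faster TTFT, slower TPOT) and B finish together.

    Solves ttft_a + (n - 1) / tps_a == ttft_b + (n - 1) / tps_b for n.
    """
    per_token_gap = 1.0 / tps_a - 1.0 / tps_b  # extra seconds A spends per token
    return 1 + (ttft_b - ttft_a) / per_token_gap


# First table row: vLLM (1.3 s, 14 tok/s) vs MindIE (2.9 s, 20 tok/s).
n = break_even_tokens(1.3, 14, 2.9, 20)
print(f"break-even at ~{n:.0f} output tokens")
for tokens in (50, 500):
    print(tokens, e2e_latency_s(1.3, 14, tokens), e2e_latency_s(2.9, 20, tokens))
```

Under this model, the first row breaks even at about 76 output tokens. Beyond that length, MindIE would finish the request first despite its slower TTFT, which is why TPOT dominates perceived latency for long answers.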
