Inference Benchmark

| Model | Paddle Inference (s/it) | PyTorch (s/it) | vLLM (s/it) | TensorRT (s/it) | Note |
| --- | --- | --- | --- | --- | --- |
| LLaVA 1.6 7B | 1.31 | 2.17 | 1.74 | - | bf16, max_token=128 |
| LLaVA 1.6 13B | 1.65 | 2.62 | - | - | bf16, max_token=128 |
| Qwen2-VL 2B | 1.44 | 2.35 | 0.97 | - | bf16, max_token=128 |
| Qwen2-VL 7B | 1.73 | 4.50 | 1.82 | - | bf16, max_token=128 |
| Qwen2.5-VL 3B | 1.24 | 4.92 | 1.39 | - | bf16, max_token=128 |
| Qwen2.5-VL 7B | 1.76 | 3.89 | 1.92 | - | bf16, max_token=128 |
| Stable Diffusion 1.5 | 0.79 | - | - | 0.84 | 512 × 512, 50 steps |
| Stable Diffusion 3 | 1.20 | - | - | 1.16 | 512 × 512, 50 steps |
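
To compare backends at a glance, the per-iteration times can be turned into ratios relative to Paddle Inference. The sketch below is illustrative only: the `RESULTS` dictionary, its key names, and the helper functions are hypothetical names introduced here, and the numbers are copied from the benchmark results, with `None` standing in for a `-` cell.

```python
# Seconds per iteration copied from the benchmark results.
# None marks a "-" cell (no measurement reported).
# RESULTS and the key names are hypothetical, chosen for this sketch.
RESULTS = {
    "LLaVA 1.6 7B": {"paddle": 1.31, "pytorch": 2.17, "vllm": 1.74, "tensorrt": None},
    "LLaVA 1.6 13B": {"paddle": 1.65, "pytorch": 2.62, "vllm": None, "tensorrt": None},
    "Qwen2-VL 2B": {"paddle": 1.44, "pytorch": 2.35, "vllm": 0.97, "tensorrt": None},
    "Qwen2-VL 7B": {"paddle": 1.73, "pytorch": 4.50, "vllm": 1.82, "tensorrt": None},
    "Qwen2.5-VL 3B": {"paddle": 1.24, "pytorch": 4.92, "vllm": 1.39, "tensorrt": None},
    "Qwen2.5-VL 7B": {"paddle": 1.76, "pytorch": 3.89, "vllm": 1.92, "tensorrt": None},
    "Stable Diffusion 1.5": {"paddle": 0.79, "pytorch": None, "vllm": None, "tensorrt": 0.84},
    "Stable Diffusion 3": {"paddle": 1.20, "pytorch": None, "vllm": None, "tensorrt": 1.16},
}


def speedup(baseline, paddle):
    """Return baseline / paddle, or None if either timing is missing."""
    if baseline is None or paddle is None:
        return None
    return baseline / paddle


def speedup_table(results):
    """Map each model to {backend: time ratio relative to Paddle Inference}."""
    table = {}
    for model, timings in results.items():
        table[model] = {
            backend: speedup(seconds, timings["paddle"])
            for backend, seconds in timings.items()
            if backend != "paddle"
        }
    return table


if __name__ == "__main__":
    for model, ratios in speedup_table(RESULTS).items():
        cells = ", ".join(
            f"{backend}: {ratio:.2f}x" if ratio is not None else f"{backend}: n/a"
            for backend, ratio in ratios.items()
        )
        print(f"{model}: {cells}")
```

A ratio above 1 means Paddle Inference needed less time per iteration than the other backend. The ratio falls below 1 in two cases: vLLM on Qwen2-VL 2B (0.97 s/it against 1.44 s/it) and TensorRT on Stable Diffusion 3 (1.16 s/it against 1.20 s/it).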

Notes:

- All models were tested on an NVIDIA A800 (80 GB) GPU.
- The software versions used for testing are listed below.

| Software | Version |
| --- | --- |
| CUDA | 12.3 |
| PaddlePaddle | Nightly |
| PaddleNLP | Nightly |
| Python | 3.10 |
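
When trying to reproduce these numbers, it can help to record the local environment alongside the results. The following is a minimal sketch, not part of the original benchmark tooling: the function name is hypothetical, and the distribution names `paddlepaddle-gpu` and `paddlenlp` are assumptions about how the packages were installed. It does not check the CUDA version.

```python
import sys
from importlib import metadata

# Assumed PyPI distribution names; adjust them to match your installation.
PACKAGES = ("paddlepaddle-gpu", "paddlenlp")


def installed_versions(packages=PACKAGES):
    """Return the Python version and the installed version of each package.

    A package that is not installed maps to None.
    """
    versions = {"python": f"{sys.version_info[0]}.{sys.version_info[1]}"}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Because the benchmark used nightly builds, the version strings printed here will differ from run to run; saving them next to your measurements makes later comparisons easier to interpret.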