Inference Benchmark

[Figure: Fig]

[Figure: Figure_2]

| Model | Paddle Inference (s/it) | PyTorch (s/it) | vLLM (s/it) | TensorRT (s/it) | Note |
| --- | --- | --- | --- | --- | --- |
| LLaVA 1.6 7B | 1.31 | 2.17 | 1.74 | - | bf16, max_token=128 |
| LLaVA 1.6 13B | 1.65 | 2.62 | - | - | bf16, max_token=128 |
| Qwen2-VL 2B | 1.44 | 2.35 | 0.97 | - | bf16, max_token=128 |
| Qwen2-VL 7B | 1.73 | 4.50 | 1.82 | - | bf16, max_token=128 |
| Qwen2.5-VL 3B | 1.24 | 4.92 | 1.39 | - | bf16, max_token=128 |
| Qwen2.5-VL 7B | 1.76 | 3.89 | 1.92 | - | bf16, max_token=128 |
| Stable Diffusion 1.5 | 0.79 | - | - | 0.84 | 512 × 512, 50 steps |
| Stable Diffusion 3 | 1.20 | - | - | 1.16 | 512 × 512, 50 steps |
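
To make the table easier to read as relative speedups, the following sketch divides each baseline time by the Paddle Inference time for the vision-language rows. The numbers are copied from the table; the dictionary layout and the helper names `speedup` and `speedup_table` are illustrative assumptions and not part of the benchmark code.

```python
# Seconds per iteration copied from the table above (lower is better).
# None marks a "-" cell, i.e. no number is reported for that backend.
results = {
    "LLaVA 1.6 7B": {"paddle": 1.31, "pytorch": 2.17, "vllm": 1.74},
    "LLaVA 1.6 13B": {"paddle": 1.65, "pytorch": 2.62, "vllm": None},
    "Qwen2-VL 2B": {"paddle": 1.44, "pytorch": 2.35, "vllm": 0.97},
    "Qwen2-VL 7B": {"paddle": 1.73, "pytorch": 4.50, "vllm": 1.82},
    "Qwen2.5-VL 3B": {"paddle": 1.24, "pytorch": 4.92, "vllm": 1.39},
    "Qwen2.5-VL 7B": {"paddle": 1.76, "pytorch": 3.89, "vllm": 1.92},
}


def speedup(baseline, candidate):
    """Return baseline / candidate, or None if either time is missing.

    A value above 1.0 means the candidate is faster than the baseline.
    """
    if baseline is None or candidate is None:
        return None
    return baseline / candidate


def speedup_table(rows):
    """Compute Paddle Inference speedup over PyTorch and vLLM per model."""
    return {
        model: {
            "vs_pytorch": speedup(t["pytorch"], t["paddle"]),
            "vs_vllm": speedup(t["vllm"], t["paddle"]),
        }
        for model, t in rows.items()
    }


if __name__ == "__main__":
    for model, r in speedup_table(results).items():
        cells = [
            f"{name}={value:.2f}x" if value is not None else f"{name}=n/a"
            for name, value in r.items()
        ]
        print(f"{model}: " + ", ".join(cells))
```

On these numbers, Paddle Inference is faster than PyTorch in all six rows, while vLLM is faster than Paddle Inference only for Qwen2-VL 2B (0.97 s/it versus 1.44 s/it).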

Notes:

  • All models were tested on an NVIDIA A800 (80 GB) GPU.
  • The software versions used for testing are listed below.
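
The s/it figures are seconds per iteration. As a rough illustration of how such a number can be measured, here is a minimal timing sketch. It is not the harness used to produce the table: the function `seconds_per_iteration`, its `warmup` and `iters` parameters, and the stand-in `dummy_step` are all hypothetical.

```python
import time


def seconds_per_iteration(step, warmup=3, iters=10):
    """Return the average wall-clock seconds taken by one call of `step`.

    `step` is any zero-argument callable, for example one generation call
    with a fixed prompt and token budget.
    """
    # Warm-up calls are excluded from timing so that one-off costs
    # (lazy initialisation, caching, compilation) do not skew the average.
    for _ in range(warmup):
        step()

    # Note: with an asynchronous GPU framework, the device would need to be
    # synchronised before each clock read; this sketch assumes `step` blocks
    # until its work is finished.
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return elapsed / iters


def dummy_step():
    """Stand-in workload (assumption): sleeps instead of running a model."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"{seconds_per_iteration(dummy_step):.4f} s/it")
```

Averaging over several iterations after a warm-up phase is a common choice because the first few calls of an inference engine are often slower than steady-state calls.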
| Software | Version |
| --- | --- |
| CUDA | 12.3 |
| PaddlePaddle | Nightly |
| PaddleNLP | Nightly |
| Python | 3.10 |