| Model | Paddle Inference (s/it) | PyTorch (s/it) | vLLM (s/it) | TensorRT (s/it) | Note |
|---|---|---|---|---|---|
| LLaVA 1.6 7B | 1.31 | 2.17 | 1.74 | - | bf16, max_token=128 |
| LLaVA 1.6 13B | 1.65 | 2.62 | - | - | bf16, max_token=128 |
| Qwen2-VL 2B | 1.44 | 2.35 | 0.97 | - | bf16, max_token=128 |
| Qwen2-VL 7B | 1.73 | 4.50 | 1.82 | - | bf16, max_token=128 |
| Qwen2.5-VL 3B | 1.24 | 4.92 | 1.39 | - | bf16, max_token=128 |
| Qwen2.5-VL 7B | 1.76 | 3.89 | 1.92 | - | bf16, max_token=128 |
| Stable Diffusion 1.5 | 0.79 | - | - | 0.84 | 512 × 512, 50 steps |
| Stable Diffusion 3 | 1.20 | - | - | 1.16 | 512 × 512, 50 steps |
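
Notes:
- All models were tested on the NVIDIA A800 (80 GB) platform.
- Times are in seconds per iteration (s/it); lower is better. A "-" means no result is reported for that backend.
- The software configuration used for testing is listed below.
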
| Software | Version |
|---|---|
| CUDA | 12.3 |
| PaddlePaddle | Nightly |
| PaddleNLP | Nightly |
| Python | 3.10 |
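
Because the results are reported in s/it, a relative speedup is simply the baseline time divided by the Paddle Inference time. The following is a minimal sketch of that arithmetic using a few rows copied from the results table; the `RESULTS` dictionary and the `relative_speedup` helper are illustrative names, not part of any benchmark tooling.

```python
# Seconds per iteration (s/it) copied from a few rows of the results table.
# None marks a cell the table leaves empty ("-").
# RESULTS and relative_speedup are hypothetical names used for illustration.
RESULTS = {
    "LLaVA 1.6 7B": {"paddle": 1.31, "pytorch": 2.17, "vllm": 1.74},
    "LLaVA 1.6 13B": {"paddle": 1.65, "pytorch": 2.62, "vllm": None},
    "Qwen2-VL 2B": {"paddle": 1.44, "pytorch": 2.35, "vllm": 0.97},
    "Qwen2-VL 7B": {"paddle": 1.73, "pytorch": 4.50, "vllm": 1.82},
}


def relative_speedup(baseline_s_per_it, paddle_s_per_it):
    """Return baseline / paddle; values above 1.0 mean Paddle Inference is faster."""
    if baseline_s_per_it is None or paddle_s_per_it is None:
        return None  # No measurement available for this comparison.
    return baseline_s_per_it / paddle_s_per_it


for model, row in RESULTS.items():
    for backend in ("pytorch", "vllm"):
        ratio = relative_speedup(row[backend], row["paddle"])
        label = "n/a" if ratio is None else f"{ratio:.2f}x"
        print(f"{model}: Paddle Inference vs {backend}: {label}")
```

A ratio below 1.0, such as the vLLM comparison for Qwen2-VL 2B, means the baseline was faster than Paddle Inference in that row.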
