| Model | Stage | Paddle training speed (ips) | Speedup vs. PyTorch | PyTorch training speed (ips) | Paddle GPU memory usage (GB) |
|---|---|---|---|---|---|
| LLaVA1.6 7B | Pretrain | 82 | +26% | 65 | 19/22 |
| LLaVA1.6 7B | SFT | 52 | +6% | 49 | 33/49 |
| LLaVA1.6 7B | LoRA | 56 | +14% | 49 | 16/17 |
| LLaVA1.6 13B | Pretrain | 52 | +18% | 44 | 33/36 |
| LLaVA1.6 13B | SFT | 24 | +4% | 23 | 50/68 |
| LLaVA1.6 13B | LoRA | 36 | +5% | 34 | 29/30 |
| Qwen2VL 2B | SFT | 33 | +43% | 23 | - |
| Qwen2VL 7B | SFT | 13 | +18% | 11 | - |
| Stable Diffusion 1.5 | Pretrain | 560 | -12% | 638 | 28/34 |
| Stable Diffusion 1.5 | LoRA | 200 | +6% | 187 | 30/34 |
| Stable Diffusion 3 | SFT (DreamBooth) | 34 | 0 | 34 | - |
| Stable Diffusion 3 | LoRA | 66 | -1% | 67 | - |
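For reference, the speedup column above can be derived from the two throughput columns (ips = iterations per second). The helper below is our own illustration, not part of any framework; small discrepancies against the table (e.g. +5% vs. +6%) can arise when the published ips values are themselves rounded.

```python
def speedup_pct(paddle_ips: float, pytorch_ips: float) -> str:
    """Relative Paddle throughput vs. PyTorch, as a signed percentage."""
    pct = round((paddle_ips / pytorch_ips - 1) * 100)
    return f"{pct:+d}%"

# LLaVA1.6 7B Pretrain: 82 ips (Paddle) vs. 65 ips (PyTorch)
print(speedup_pct(82, 65))    # -> +26%
# Qwen2VL 2B SFT: 33 vs. 23
print(speedup_pct(33, 23))    # -> +43%
# Stable Diffusion 1.5 Pretrain: 560 vs. 638
print(speedup_pct(560, 638))  # -> -12%
```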
Notes:
- All models were tested on an H800 (8 × 80 GB) platform.
- For GPU memory usage, the table reports `max_memory_allocated`/`max_memory_reserved`.
- The testing configuration is listed below.
| Software | Version |
|---|---|
| CUDA | 12.3 |
| CUDNN | 9.0 |
| PaddlePaddle | 3.0beta2 |
| PaddleNLP | 3.0beta3 |
| PyTorch | 2.5 |
