| Model | Stage | Paddle training speed (ips) | Speedup vs. PyTorch | PyTorch training speed (ips) | Paddle GPU memory usage (GB) |
|---|---|---|---|---|---|
| LLaVA1.6 7B | Pretrain | 82 | +26% | 65 | 19/22 |
| LLaVA1.6 7B | SFT | 52 | +6% | 49 | 33/49 |
| LLaVA1.6 7B | LoRA | 56 | +14% | 49 | 16/17 |
| LLaVA1.6 13B | Pretrain | 52 | +18% | 44 | 33/36 |
| LLaVA1.6 13B | SFT | 24 | +4% | 23 | 50/68 |
| LLaVA1.6 13B | LoRA | 36 | +5% | 34 | 29/30 |
| Qwen2VL 2B | SFT | 33 | +43% | 23 | - |
| Qwen2VL 7B | SFT | 13 | +18% | 11 | - |
| Stable Diffusion 1.5 | Pretrain | 560 | -12% | 638 | 28/34 |
| Stable Diffusion 1.5 | LoRA | 200 | +6% | 187 | 30/34 |
| Stable Diffusion 3 | SFT (DreamBooth) | 34 | 0 | 34 | - |
| Stable Diffusion 3 | LoRA | 66 | -1% | 67 | - |
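For reference, the speedup column above can be derived from the two throughput columns (ips = iterations per second). The helper below is our own illustration, not part of any framework; small discrepancies against the table (e.g. +5% vs. +6%) can arise when the published ips values are themselves rounded.

```python
def speedup_pct(paddle_ips: float, pytorch_ips: float) -> str:
    """Relative Paddle throughput vs. PyTorch, as a signed percentage."""
    pct = round((paddle_ips / pytorch_ips - 1) * 100)
    return f"{pct:+d}%"

# LLaVA1.6 7B Pretrain: 82 ips (Paddle) vs. 65 ips (PyTorch)
print(speedup_pct(82, 65))    # -> +26%
# Qwen2VL 2B SFT: 33 vs. 23
print(speedup_pct(33, 23))    # -> +43%
# Stable Diffusion 1.5 Pretrain: 560 vs. 638
print(speedup_pct(560, 638))  # -> -12%
```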
Notes:
- All models were tested on an H800 (8 × 80 GB) platform.
- For GPU memory usage, the table reports `max_memory_allocated`/`max_memory_reserved`.
- The testing configuration is listed below.
| Software | Version |
|---|---|
| CUDA | 12.3 |
| CUDNN | 9.0 |
| PaddlePaddle | 3.0beta2 |
| PaddleNLP | 3.0beta3 |
| PyTorch | 2.5 |
