This discussion was converted from issue #7589 on April 03, 2025 08:22.
Reminder
System Info

llamafactory version: 0.9.2.dev0

Reproduction

```
[INFO|2025-04-03 15:09:41] llamafactory.model.model_utils.checkpointing:157 >> Gradient checkpointing enabled.
[INFO|2025-04-03 15:09:41] llamafactory.model.model_utils.attention:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-04-03 15:09:41] llamafactory.model.adapter:157 >> Upcasting trainable params to float32.
[INFO|2025-04-03 15:09:41] llamafactory.model.adapter:157 >> Fine-tuning method: LoRA
[INFO|2025-04-03 15:09:41] llamafactory.model.model_utils.misc:157 >> Found linear modules: v_proj,o_proj,q_proj,k_proj,down_proj,gate_proj,up_proj
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
[INFO|2025-04-03 15:09:42] llamafactory.model.loader:157 >> trainable params: 67,108,864 || all params: 32,830,985,216 || trainable%: 0.2044
[INFO|trainer.py:746] 2025-04-03 15:09:42,189 >> Using auto half precision backend
[WARNING|trainer.py:781] 2025-04-03 15:09:42,190 >> No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
[INFO|2025-04-03 15:09:42] llamafactory.train.trainer_utils:157 >> Using LoRA+ optimizer with loraplus lr ratio 16.00.
[INFO|trainer.py:2405] 2025-04-03 15:09:42,679 >> ***** Running training *****
[INFO|trainer.py:2406] 2025-04-03 15:09:42,679 >> Num examples = 12,619
[INFO|trainer.py:2407] 2025-04-03 15:09:42,679 >> Num Epochs = 30
[INFO|trainer.py:2408] 2025-04-03 15:09:42,679 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2411] 2025-04-03 15:09:42,679 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2412] 2025-04-03 15:09:42,679 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2413] 2025-04-03 15:09:42,679 >> Total optimization steps = 47,310
[INFO|trainer.py:2414] 2025-04-03 15:09:42,686 >> Number of trainable parameters = 67,108,864
0%| | 5/47310 [01:13<191:09:20, 14.55s/it][INFO|2025-04-03 15:10:57] llamafactory.train.callbacks:157 >> {'loss': 1.1686, 'learning_rate': 5.0000e-05, 'epoch': 0.00, 'throughput': 1020.13}
{'loss': 1.1686, 'grad_norm': 0.14585863053798676, 'learning_rate': 4.999999862201698e-05, 'epoch': 0.0, 'num_input_tokens_seen': 75232}
0%| | 10/47310 [02:34<206:30:10, 15.72s/it][INFO|2025-04-03 15:12:18] llamafactory.train.callbacks:157 >> {'loss': 0.9774, 'learning_rate': 5.0000e-05, 'epoch': 0.01, 'throughput': 1011.31}
{'loss': 0.9774, 'grad_norm': 0.11302416771650314, 'learning_rate': 4.999999448806806e-05, 'epoch': 0.01, 'num_input_tokens_seen': 156368}
0%| | 15/47310 [03:50<204:48:43, 15.59s/it][INFO|2025-04-03 15:13:34] llamafactory.train.callbacks:157 >> {'loss': 0.9439, 'learning_rate': 5.0000e-05, 'epoch': 0.01, 'throughput': 1001.57}
{'loss': 0.9439, 'grad_norm': 0.1326381415128708, 'learning_rate': 4.999998759815371e-05, 'epoch': 0.01, 'num_input_tokens_seen': 230736}
0%| | 20/47310 [05:05<196:26:14, 14.95s/it][INFO|2025-04-03 15:14:49] llamafactory.train.callbacks:157 >> {'loss': 0.9269, 'learning_rate': 5.0000e-05, 'epoch': 0.01, 'throughput': 997.06}
{'loss': 0.9269, 'grad_norm': 0.09542257338762283, 'learning_rate': 4.9999977952274676e-05, 'epoch': 0.01, 'num_input_tokens_seen': 304624}
0%| | 25/47310 [06:19<197:02:47, 15.00s/it][INFO|2025-04-03 15:16:03] llamafactory.train.callbacks:157 >> {'loss': 0.8354, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 1000.50}
{'loss': 0.8354, 'grad_norm': 0.11180797964334488, 'learning_rate': 4.999996555043203e-05, 'epoch': 0.02, 'num_input_tokens_seen': 380112}
0%| | 30/47310 [07:33<192:09:52, 14.63s/it][INFO|2025-04-03 15:17:17] llamafactory.train.callbacks:157 >> {'loss': 0.7925, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 1007.71}
{'loss': 0.7925, 'grad_norm': 0.10915672779083252, 'learning_rate': 4.9999950392627126e-05, 'epoch': 0.02, 'num_input_tokens_seen': 457424}
0%| | 35/47310 [08:51<205:51:53, 15.68s/it][INFO|2025-04-03 15:18:35] llamafactory.train.callbacks:157 >> {'loss': 0.7684, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 1009.29}
{'loss': 0.7684, 'grad_norm': 0.10244904458522797, 'learning_rate': 4.999993247886166e-05, 'epoch': 0.02, 'num_input_tokens_seen': 536736}
0%| | 40/47310 [10:08<207:55:55, 15.84s/it][INFO|2025-04-03 15:19:51] llamafactory.train.callbacks:157 >> {'loss': 0.8594, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 1008.51}
{'loss': 0.8594, 'grad_norm': 0.10987041145563126, 'learning_rate': 4.999991180913758e-05, 'epoch': 0.03, 'num_input_tokens_seen': 613408}
0%| | 45/47310 [11:27<211:56:37, 16.14s/it][INFO|2025-04-03 15:21:11] llamafactory.train.callbacks:157 >> {'loss': 0.7329, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 1008.89}
{'loss': 0.7329, 'grad_norm': 0.11014138162136078, 'learning_rate': 4.999988838345718e-05, 'epoch': 0.03, 'num_input_tokens_seen': 693888}
0%| | 50/47310 [12:39<188:59:24, 14.40s/it][INFO|2025-04-03 15:22:23] llamafactory.train.callbacks:157 >> {'loss': 0.8117, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 1006.75}
{'loss': 0.8117, 'grad_norm': 0.145054891705513, 'learning_rate': 4.999986220182303e-05, 'epoch': 0.03, 'num_input_tokens_seen': 764720}
0%| | 55/47310 [13:53<190:34:08, 14.52s/it][INFO|2025-04-03 15:23:37] llamafactory.train.callbacks:157 >> {'loss': 0.7202, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 1008.15}
{'loss': 0.7202, 'grad_norm': 0.128993958234787, 'learning_rate': 4.999983326423804e-05, 'epoch': 0.03, 'num_input_tokens_seen': 840288}
0%| | 60/47310 [15:16<210:34:10, 16.04s/it][INFO|2025-04-03 15:25:00] llamafactory.train.callbacks:157 >> {'loss': 0.7561, 'learning_rate': 5.0000e-05, 'epoch': 0.04, 'throughput': 1006.23}
{'loss': 0.7561, 'grad_norm': 0.1089465394616127, 'learning_rate': 4.999980157070538e-05, 'epoch': 0.04, 'num_input_tokens_seen': 922000}
0%| | 65/47310 [16:36<209:34:52, 15.97s/it][INFO|2025-04-03 15:26:19] llamafactory.train.callbacks:157 >> {'loss': 0.7396, 'learning_rate': 5.0000e-05, 'epoch': 0.04, 'throughput': 1006.27}
{'loss': 0.7396, 'grad_norm': 0.12486272305250168, 'learning_rate': 4.9999767121228546e-05, 'epoch': 0.04, 'num_input_tokens_seen': 1002352}
  0%|          | 69/47310 [17:35<200:29:28, 15.28s/it]
```
I'm training on 12,619 examples here, and it's estimated to take about 200 hours? Does it really need that long? Did I misconfigure something, and is there any way to optimize this?
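For reference, the ~200-hour ETA follows directly from the numbers in the log above: 30 epochs over 12,619 examples at an effective batch size of 8 gives 47,310 optimizer steps, and at roughly 15 s/step that is about 197 hours. Below is a minimal back-of-the-envelope sketch of that arithmetic; the 15 s/step figure is only read off the tqdm progress bar (14–16 s/it), not measured precisely.

```python
# Rough time estimate using the values printed in the training log above.
num_examples = 12_619          # "Num examples"
num_epochs = 30                # "Num Epochs"
total_batch_size = 8           # "Total train batch size (w. parallel, distributed & accumulation)"
seconds_per_step = 15.0        # approximate, taken from the progress bar (14-16 s/it)

steps_per_epoch = num_examples // total_batch_size    # 1577
total_steps = steps_per_epoch * num_epochs            # 47310, matches "Total optimization steps"
total_hours = total_steps * seconds_per_step / 3600   # ~197 hours

print(total_steps, round(total_hours, 1))
```

So if the 30-epoch setting is intentional, the ETA is expected; reducing the number of epochs or raising the effective batch size (per-device batch size or gradient accumulation) would shorten the wall-clock time roughly proportionally.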