Description
Hello, I ran into the following problem while trying to reproduce the QLoRA fine-tuning run. The error output is:
Traceback (most recent call last):
File "../finetune_llama3.py", line 452, in <module>
train()
File "../finetune_llama3.py", line 445, in train
trainer.train()
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
return inner_training_loop(
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/transformers/trainer.py", line 1928, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/accelerate/data_loader.py", line 452, in __iter__
current_batch = next(dataloader_iter)
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/nlpir/miniconda3/envs/cjy_llama/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "../finetune_llama3.py", line 255, in __getitem__
ret = preprocess([self.raw_data[i]["conversations"]], self.tokenizer, self.max_len)
File "../finetune_llama3.py", line 192, in preprocess
assert len(input_id) == len(target)
AssertionError
What actually happens is that input_id is always exactly 1 longer than target.
I am running on 6 × 1080 Ti GPUs; the shell script is as follows:
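For reference, here is a minimal self-contained sketch of how this kind of off-by-one typically arises (the name IGNORE_TOKEN_ID and the EOS handling are illustrative assumptions, not the repo's actual code): one extra token, e.g. an appended EOS, goes into input_id without a matching entry in target, so the length check fails:

```python
# Minimal sketch: input_id receives one extra token (an appended EOS here)
# that is never mirrored into target, producing len(input_id) == len(target) + 1.
IGNORE_TOKEN_ID = -100  # assumed sentinel for masked label positions

def preprocess_sketch(token_ids, eos_id=2):
    input_id = list(token_ids)
    # Labels start as a masked copy of the prompt tokens.
    target = [IGNORE_TOKEN_ID] * len(token_ids)
    # Bug pattern: EOS is appended to input_id only.
    input_id.append(eos_id)
    return input_id, target

inp, tgt = preprocess_sketch([10, 11, 12])
print(len(inp) - len(tgt))  # prints 1, matching the reported mismatch
```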
NCCL_P2P_DISABLE=1 \
NCCL_IB_DISABLE=1 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
torchrun \
    --nproc_per_node 6 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr localhost \
    --master_port 6601 \
    ../finetune_llama3.py \
    --model_name_or_path "../model_hub/LLM-Research/Meta-Llama-3-8B-Instruct/" \
    --data_path "../data/Belle_sampled_qwen.json" \
    --fp16 True \
    --output_dir "../output/llama3_8B_qlora" \
    --num_train_epochs 100 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 1 \
    --learning_rate 1e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --deepspeed "../config/ds_config_zero2.json" \
    --use_lora \
    --load_in_4bit \
    --q_lora
Compared with the original script, I only changed CUDA_VISIBLE_DEVICES and nproc_per_node, and switched bf16 to fp16.
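As a stopgap while the real cause is tracked down, one common workaround (an assumption on my part, not the repo's fix) is to pad target with the loss-ignore index, -100, up to len(input_id) just before the assertion, so the trailing token simply contributes nothing to the loss:

```python
IGNORE_TOKEN_ID = -100  # the label value transformers' loss functions skip

def align_lengths(input_id, target):
    """Pad target with the ignore index (or truncate it) to match input_id."""
    if len(target) < len(input_id):
        target = target + [IGNORE_TOKEN_ID] * (len(input_id) - len(target))
    else:
        target = target[: len(input_id)]
    return input_id, target
```

Whether masking the extra position is actually correct depends on which token is missing its label (e.g. a tokenizer appending EOS to input_id only), so it is worth confirming the tokenizer's behavior rather than relying on this padding alone.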