
[Bug] Qwen2-7B-Instruct Lora fine-tuning: runtime error #495

@MMMMegumi

Description


Model where the bug occurs

Qwen2-7B-Instruct Lora

Tutorial where the bug occurs

05-Qwen2-7B-Instruct Lora 微调.md

Tutorial maintainer

散步

Bug description


NotImplementedError Traceback (most recent call last)
Cell In[30], line 1
----> 1 trainer = Trainer(
2 model=model,
3 args=args,
4 train_dataset=tokenized_id,
5 data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
6 )
7 trainer.train()

File ~/miniconda3/lib/python3.10/site-packages/transformers/trainer.py:528, in Trainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
523 # Bnb Quantized models doesn't support .to operation.
524 if (
525 self.place_model_on_device
526 and not getattr(model, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES
527 ):
--> 528 self._move_model_to_device(model, args.device)
530 # Force n_gpu to 1 to avoid DataParallel as MP will manage the GPUs
531 if self.is_model_parallel:

File ~/miniconda3/lib/python3.10/site-packages/transformers/trainer.py:775, in Trainer._move_model_to_device(self, model, device)
774 def _move_model_to_device(self, model, device):
--> 775 model = model.to(device)
776 # Moving a model to an XLA device disconnects the tied weights, so we have to retie them.
777 if self.args.parallel_mode == ParallelMode.TPU and hasattr(model, "tie_weights"):

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1160, in Module.to(self, *args, **kwargs)
1156 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1157 non_blocking, memory_format=convert_to_format)
1158 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1160 return self._apply(convert)

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
808 if recurse:
809 for module in self.children():
--> 810 module._apply(fn)
812 def compute_should_use_set_data(tensor, tensor_applied):
813 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
814 # If the new tensor has compatible tensor type as the existing tensor,
815 # the current behavior is to change the tensor in-place using .data =,
(...)
820 # global flag to let the user control whether they want the future
821 # behavior of overwriting the existing tensor or not.

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
808 if recurse:
809 for module in self.children():
--> 810 module._apply(fn)
812 def compute_should_use_set_data(tensor, tensor_applied):
813 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
814 # If the new tensor has compatible tensor type as the existing tensor,
815 # the current behavior is to change the tensor in-place using .data =,
(...)
820 # global flag to let the user control whether they want the future
821 # behavior of overwriting the existing tensor or not.

[... skipping similar frames: Module._apply at line 810 (5 times)]

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
808 if recurse:
809 for module in self.children():
--> 810 module._apply(fn)
812 def compute_should_use_set_data(tensor, tensor_applied):
813 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
814 # If the new tensor has compatible tensor type as the existing tensor,
815 # the current behavior is to change the tensor in-place using .data =,
(...)
820 # global flag to let the user control whether they want the future
821 # behavior of overwriting the existing tensor or not.

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:833, in Module._apply(self, fn, recurse)
829 # Tensors stored in modules are graph leaves, and we don't want to
830 # track autograd history of param_applied, so we have to use
831 # with torch.no_grad():
832 with torch.no_grad():
--> 833 param_applied = fn(param)
834 should_use_set_data = compute_should_use_set_data(param, param_applied)
835 if should_use_set_data:

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1158, in Module.to.<locals>.convert(t)
1155 if convert_to_format is not None and t.dim() in (4, 5):
1156 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1157 non_blocking, memory_format=convert_to_format)
-> 1158 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

NotImplementedError: Cannot copy out of meta tensor; no data!

Steps to reproduce

Everything before this step works fine, until:

Training with Trainer

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()
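
For reference, this cell depends on objects (model, tokenizer, args, tokenized_id) created earlier in the notebook. Below is a minimal sketch of where they come from, assuming the standard transformers + peft APIs; the model path, LoRA settings, training arguments, and the dummy dataset row are illustrative placeholders, not the tutorial's exact values:

import torch
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_path = "Qwen/Qwen2-7B-Instruct"  # placeholder; the notebook loads from a local download

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.enable_input_require_grads()  # needed when gradient checkpointing is enabled

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="./output/Qwen2_instruct_lora",  # placeholder output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    learning_rate=1e-4,
    gradient_checkpointing=True,
)

# tokenized_id is the instruction dataset tokenized earlier in the notebook;
# a single dummy row stands in for it here so the sketch is self-contained.
row = tokenizer("hello")
tokenized_id = Dataset.from_list([{
    "input_ids": row["input_ids"],
    "attention_mask": row["attention_mask"],
    "labels": list(row["input_ids"]),
}])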

Expected behavior

No error is raised, so that execution can continue to the merge-and-load-model step, as in 05-Qwen2-7B-Instruct Lora.ipynb.
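
For context, the merge-and-load step referred to here can be sketched with the standard peft API as follows; the paths below are placeholders, not the tutorial's exact values:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "Qwen/Qwen2-7B-Instruct"                       # placeholder base model path
lora_path = "./output/Qwen2_instruct_lora/checkpoint-100"  # placeholder adapter checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto", device_map="auto")

# Attach the trained LoRA adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(model, model_id=lora_path)
model = model.merge_and_unload()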

Environment information

(environment details were provided as a screenshot)

Other information

Verification

  • This issue hasn't been reported before

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)
