
[Bug] Qwen2-7B-Instruct Lora fine-tuning: runtime error #495

@MMMMegumi

Description


Model where the bug occurs

Qwen2-7B-Instruct Lora

Tutorial where the bug occurs

05-Qwen2-7B-Instruct Lora 微调.md

Tutorial maintainer

散步

Bug description


NotImplementedError Traceback (most recent call last)
Cell In[30], line 1
----> 1 trainer = Trainer(
2 model=model,
3 args=args,
4 train_dataset=tokenized_id,
5 data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
6 )
7 trainer.train()

File ~/miniconda3/lib/python3.10/site-packages/transformers/trainer.py:528, in Trainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
523 # Bnb Quantized models doesn't support .to operation.
524 if (
525 self.place_model_on_device
526 and not getattr(model, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES
527 ):
--> 528 self._move_model_to_device(model, args.device)
530 # Force n_gpu to 1 to avoid DataParallel as MP will manage the GPUs
531 if self.is_model_parallel:

File ~/miniconda3/lib/python3.10/site-packages/transformers/trainer.py:775, in Trainer._move_model_to_device(self, model, device)
774 def _move_model_to_device(self, model, device):
--> 775 model = model.to(device)
776 # Moving a model to an XLA device disconnects the tied weights, so we have to retie them.
777 if self.args.parallel_mode == ParallelMode.TPU and hasattr(model, "tie_weights"):

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1160, in Module.to(self, *args, **kwargs)
1156 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1157 non_blocking, memory_format=convert_to_format)
1158 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1160 return self._apply(convert)

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
808 if recurse:
809 for module in self.children():
--> 810 module._apply(fn)
812 def compute_should_use_set_data(tensor, tensor_applied):
813 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
814 # If the new tensor has compatible tensor type as the existing tensor,
815 # the current behavior is to change the tensor in-place using .data =,
(...)
820 # global flag to let the user control whether they want the future
821 # behavior of overwriting the existing tensor or not.

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
808 if recurse:
809 for module in self.children():
--> 810 module._apply(fn)
812 def compute_should_use_set_data(tensor, tensor_applied):
813 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
814 # If the new tensor has compatible tensor type as the existing tensor,
815 # the current behavior is to change the tensor in-place using .data =,
(...)
820 # global flag to let the user control whether they want the future
821 # behavior of overwriting the existing tensor or not.

[... skipping similar frames: Module._apply at line 810 (5 times)]

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:810, in Module._apply(self, fn, recurse)
808 if recurse:
809 for module in self.children():
--> 810 module._apply(fn)
812 def compute_should_use_set_data(tensor, tensor_applied):
813 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
814 # If the new tensor has compatible tensor type as the existing tensor,
815 # the current behavior is to change the tensor in-place using .data =,
(...)
820 # global flag to let the user control whether they want the future
821 # behavior of overwriting the existing tensor or not.

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:833, in Module._apply(self, fn, recurse)
829 # Tensors stored in modules are graph leaves, and we don't want to
830 # track autograd history of param_applied, so we have to use
831 # with torch.no_grad():
832 with torch.no_grad():
--> 833 param_applied = fn(param)
834 should_use_set_data = compute_should_use_set_data(param, param_applied)
835 if should_use_set_data:

File ~/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:1158, in Module.to.<locals>.convert(t)
1155 if convert_to_format is not None and t.dim() in (4, 5):
1156 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1157 non_blocking, memory_format=convert_to_format)
-> 1158 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

NotImplementedError: Cannot copy out of meta tensor; no data!

Steps to reproduce

Everything before this step works fine, until:

Training with Trainer

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()
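
For reference, this cell depends on objects (model, tokenizer, args, tokenized_id) created earlier in the notebook. Below is a minimal sketch of where they come from, assuming the standard transformers + peft APIs; the model path, LoRA settings, training arguments, and the dummy dataset row are illustrative placeholders, not the tutorial's exact values:

import torch
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_path = "Qwen/Qwen2-7B-Instruct"  # placeholder; the notebook loads from a local download

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.enable_input_require_grads()  # needed when gradient checkpointing is enabled

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="./output/Qwen2_instruct_lora",  # placeholder output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    logging_steps=10,
    save_steps=100,
    learning_rate=1e-4,
    gradient_checkpointing=True,
)

# tokenized_id is the instruction dataset tokenized earlier in the notebook;
# a single dummy row stands in for it here so the sketch is self-contained.
row = tokenizer("hello")
tokenized_id = Dataset.from_list([{
    "input_ids": row["input_ids"],
    "attention_mask": row["attention_mask"],
    "labels": list(row["input_ids"]),
}])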

Expected behavior

No error is raised, so that execution can continue to the merge-and-load-model step, as in 05-Qwen2-7B-Instruct Lora.ipynb.
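
For context, the merge-and-load step referred to here can be sketched with the standard peft API as follows; the paths below are placeholders, not the tutorial's exact values:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "Qwen/Qwen2-7B-Instruct"                       # placeholder base model path
lora_path = "./output/Qwen2_instruct_lora/checkpoint-100"  # placeholder adapter checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto", device_map="auto")

# Attach the trained LoRA adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(model, model_id=lora_path)
model = model.merge_and_unload()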

Environment information

(environment details were provided as a screenshot)

Other information

Verification

  • This issue hasn't been reported before

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)
