
[Bug]: SFT-LoRA fine-tuning of qwen3-vl fails with KeyError: '1.self_attn.o_proj.lora_A' #3990

@Kyo1234567

Description


Software environment

- paddlepaddle: 0.1.0
- paddlepaddle-gpu: 3.3.0
- paddleformers: 1.0.1.post20260303
- paddlefleet: 0.1.0

Duplicate issues

  • I have searched the existing issues

Error description

My qwen3-vl training configuration file was adapted from the paddleocr-vl SFT-LoRA training config:
### data
train_dataset_type: messages
eval_dataset_type: messages
train_dataset_path: /work/train_data/qwen3-vl/vlm_train.jsonl
train_dataset_prob: "1.0"
eval_dataset_path: /work/train_data/qwen3-vl/vlm_train.jsonl
eval_dataset_prob: "1.0"
max_seq_len: 16384
padding_free: True
truncate_packing: False
dataloader_num_workers: 8
mix_strategy: concat
template_backend: custom
template: qwen3_vl

### model
model_name_or_path: /work/models/qwen3-vl-4b-instruct/
attn_impl: flashmask
lora: true
lora_rank: 8

### finetuning
# base
stage: VL-SFT
fine_tuning: lora
seed: 23
do_train: true
#do_eval: true
per_device_eval_batch_size: 8
per_device_train_batch_size: 8
num_train_epochs: 200
max_steps: -1
#max_estimate_samples: 500
#eval_steps: 400
#evaluation_strategy: steps
save_steps: 400
save_strategy: steps
logging_steps: 2
gradient_accumulation_steps: 8
logging_dir: /work/output/visualdl_logs/
output_dir: /work/output/
disable_tqdm: true
#eval_accumulation_steps: 16

# train
lr_scheduler_type: cosine
warmup_ratio: 0.01
learning_rate: 5.0e-4
min_lr: 5.0e-5

# optimizer
weight_decay: 0.1
adam_epsilon: 1.0e-8
adam_beta1: 0.9
adam_beta2: 0.95

# performance
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
sharding: stage1
recompute_granularity: full
recompute_method: uniform
recompute_num_layers: 1
bf16: true
fp16_opt_level: O2
pre_alloc_memory: 21

# save
unified_checkpoint: false
save_checkpoint_format: "flex_checkpoint"
load_checkpoint_format: "flex_checkpoint"

After training starts, the following error is raised:
LAUNCH INFO 2026-03-04 15:44:31,630 ------------------------- ERROR LOG DETAIL -------------------------
[    INFO] resharder.py:350 - ReadItem generation completed, with a total of 4589.
[2026-03-04 15:44:30,275] [    INFO] - Using download source: huggingface
[2026-03-04 15:44:30,377] [    INFO] - Using download source: huggingface
[2026-03-04 15:44:30,377] [    INFO] - loading configuration file /work/models/qwen3-vl-4b-instruct/preprocessor_config.json
[2026-03-04 15:44:30,378] [    INFO] - Using download source: huggingface
[2026-03-04 15:44:30,378] [    INFO] - loading configuration file None
[2026-03-04 15:44:30,378] [    INFO] - Using download source: huggingface
[2026-03-04 15:44:30,378] [    INFO] - loading configuration file /work/models/qwen3-vl-4b-instruct/preprocessor_config.json
[2026-03-04 15:44:30,378] [ WARNING] - The model's image processor only supports the slow version (`use_fast=False`). Detected `use_fast=True` but will fall back to the slow version: 'Qwen2VLImageProcessorFast' will be loaded as 'Qwen2VLImageProcessor'.
[2026-03-04 15:44:30,380] [    INFO] - Using download source: huggingface
[2026-03-04 15:44:30,487] [    INFO] - Using download source: huggingface
[2026-03-04 15:44:30,488] [    INFO] - loading configuration file /work/models/qwen3-vl-4b-instruct/video_preprocessor_config.json
[2026-03-04 15:44:30,524] [ WARNING] - Reset tensor_model_parallel_size of lora_config to 1.
[2026-03-04 15:44:30,524] [    INFO] - Mark only lora and trainable_module as trainable.
Traceback (most recent call last):
  File "/work/PaddleFormers-release-v1.0/paddleformers/cli/launcher.py", line 40, in <module>
    launch()
  File "/work/PaddleFormers-release-v1.0/paddleformers/cli/launcher.py", line 32, in launch
    run_tuner()
  File "/work/PaddleFormers-release-v1.0/paddleformers/cli/train/tuner.py", line 79, in run_tuner
    _training_function(config={"args": args})
  File "/work/PaddleFormers-release-v1.0/paddleformers/cli/train/tuner.py", line 53, in _training_function
    run_sft(model_args, data_args, generating_args, finetuning_args)
  File "/work/PaddleFormers-release-v1.0/paddleformers/cli/train/sft/workflow.py", line 398, in run_sft
    model = create_peft_model(model_args, training_args, dtype, model)
  File "/work/PaddleFormers-release-v1.0/paddleformers/cli/train/sft/workflow.py", line 579, in create_peft_model
    model = LoRAModel(model, lora_config)
  File "/work/PaddleFormers-release-v1.0/paddleformers/peft/lora/lora_model.py", line 231, in __init__
    self.mark_only_lora_as_trainable()
  File "/work/PaddleFormers-release-v1.0/paddleformers/peft/lora/lora_model.py", line 1016, in mark_only_lora_as_trainable
    for name, weight in layer.state_dict().items():
  File "/usr/local/lib/python3.10/dist-packages/paddlefleet/models/gpt/gpt_model.py", line 505, in state_dict
    state_dict[self._pp_to_single_mapping[k]] = v
KeyError: '1.self_attn.o_proj.lora_A'
LAUNCH INFO 2026-03-04 15:44:31,630 Exit code 1

Could you help me analyze the cause? Also, does qwen3-vl come with a ready-to-use training YAML config?
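One plausible reading of the traceback (a sketch, not the actual PaddleFleet/PaddleFormers code): the pipeline-to-single name mapping (`_pp_to_single_mapping`) is built from the base model's parameter names, so parameter keys added later by LoRA injection (`...lora_A` / `...lora_B`) have no entry when `state_dict()` tries to remap them. All names below are illustrative:

```python
# Base model parameters known when the name mapping is constructed.
base_params = {
    "1.self_attn.o_proj.weight": "layer_1_weight_tensor",
}
# Mapping captured BEFORE LoRA wrapping: only base parameter names are present.
pp_to_single_mapping = {k: f"model.layers.{k}" for k in base_params}

# After LoRAModel wraps the layer, new parameters appear in state_dict().
wrapped_params = dict(base_params)
wrapped_params["1.self_attn.o_proj.lora_A"] = "lora_A_tensor"

def remap_state_dict(params, mapping):
    """Mimics the remapping loop in gpt_model.state_dict()."""
    out = {}
    for k, v in params.items():
        out[mapping[k]] = v  # KeyError for any key absent from the mapping
    return out

try:
    remap_state_dict(wrapped_params, pp_to_single_mapping)
except KeyError as e:
    print(f"KeyError: {e}")  # prints: KeyError: '1.self_attn.o_proj.lora_A'
```

If this reading is right, the mapping would need to be rebuilt (or made LoRA-aware) after the LoRA layers are injected, which is a fix on the framework side rather than in the user's YAML.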

Steps & code to reproduce reliably

paddleformers-cli train QWen3-vl/qwen3-vl_lora_4b_instruct.yaml
