
Size Mismatch Error #141

@WangYijun-OUC

When fine-tuning the v1p1 model, a tensor size mismatch occurs in the connector's `global_proj_out` layer:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/finetuning.py", line 563, in <module>
[rank0]:     trainer.train(args)
[rank0]:   File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/library/kohya_trainer.py", line 1404, in train
[rank0]:     loss = self.process_batch(
[rank0]:   File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/library/kohya_trainer.py", line 444, in process_batch
[rank0]:     noise_pred, target, timesteps, weighting = self.get_noise_pred_and_target(
[rank0]:   File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/finetuning.py", line 378, in get_noise_pred_and_target
[rank0]:     model_pred = unet(
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/modules/model_edit.py", line 211, in forward
[rank0]:     txt, y = self.connector(
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/modules/connector_edit.py", line 493, in forward
[rank0]:     global_out=self.global_proj_out(x_mean)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
[rank0]:     return F.linear(input, self.weight, self.bias)
[rank0]: RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2048 and 3584x768)
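The numbers in the final `RuntimeError` can be decoded as follows. `nn.Linear(in_features, out_features)` stores its weight as `(out_features, in_features)`, and `F.linear` multiplies the input by the transposed weight. So `3584x768` means `global_proj_out` expects 3584 input features (producing 768 outputs), while `x_mean` arrives with only 2048 features. Below is a minimal pure-Python sketch of that shape rule. The helper `linear_output_shape` is hypothetical, not part of PyTorch or Step1X-Edit, and it only mirrors the error message format for 2-D inputs.

```python
def linear_output_shape(input_shape, weight_shape):
    """Return the output shape F.linear would produce, or raise like PyTorch.

    weight_shape follows nn.Linear's convention: (out_features, in_features).
    """
    out_features, in_features = weight_shape
    *batch, features = input_shape
    if features != in_features:
        # PyTorch reports input as (rows x features) and the transposed
        # weight as (in_features x out_features).
        rows = 1
        for d in batch:
            rows *= d
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{features} and {in_features}x{out_features})"
        )
    return (*batch, out_features)


# Shapes taken from the traceback: weight is 768x3584, x_mean is 1x2048.
try:
    linear_output_shape((1, 2048), (768, 3584))
except RuntimeError as err:
    print(err)  # mat1 and mat2 shapes cannot be multiplied (1x2048 and 3584x768)

# With a 3584-wide input the same layer would work.
print(linear_output_shape((1, 3584), (768, 3584)))  # (1, 768)
```

For reference, 3584 is the hidden size of Qwen2.5-VL-7B and 2048 is that of Qwen2.5-VL-3B. One thing worth ruling out is therefore text-encoder features that did not come from the 7B model passed via `--qwen2p5vl`, for example outputs cached to disk by an earlier run. This is a guess, not a confirmed cause.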

Here is my script:

accelerate launch  --mixed_precision bf16 --num_cpu_threads_per_process 1 --num_processes 1 \
--config_file ./library/accelerate_config.yaml \
finetuning.py \
--pretrained_model_name_or_path "/data/Models/Step1X-Edit/step1x-edit-i1258.safetensors" \
--qwen2p5vl "/data/Models/Qwen2.5-VL-7B-Instruct" \
--ae /data/Step1X-Edit/vae.safetensors \
--cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers \
--max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 \
--network_module library.lora_module --network_dim 64 --network_alpha 32 --network_train_unet_only \
--optimizer_type adamw8bit --learning_rate 1e-4 \
--cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk \
--highvram --max_train_epochs 100 --save_every_n_epochs 5 --dataset_config library/data_configs/step1x_edit_test.toml \
--output_dir /data/Models/Step1X-Edit-SFT \
--output_name step1x-edit-maze \
--timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0 --fp8_base
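To see which input width the checkpoint passed to `--pretrained_model_name_or_path` expects in its connector, the tensor shapes can be read directly from the `.safetensors` header without loading any weights. The sketch below relies only on the standard library and the safetensors file layout (an 8-byte little-endian header length followed by a JSON header that maps tensor names to dtype, shape, and offsets). The helper names `read_safetensors_shapes` and `find_shapes` are hypothetical, and the assumption that the relevant keys contain `global_proj_out` is unverified.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional non-tensor entry; skip it.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def find_shapes(path, pattern="global_proj_out"):
    """Filter tensor shapes by a substring of the key name.

    Assumption: the connector's projection keys contain `pattern`; adjust it
    if the checkpoint names them differently.
    """
    return {n: s for n, s in read_safetensors_shapes(path).items() if pattern in n}
```

Call `find_shapes("/data/Models/Step1X-Edit/step1x-edit-i1258.safetensors")` and compare the in-features of the reported weight with the 2048-wide input in the traceback. This helps narrow down whether the checkpoint or the text-encoder features are out of step.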

And here is my dataset config:

[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1

# This is an edit dataset
[[datasets]]
resolution = [512, 512]
batch_size = 1
edit_dataset = true # necessary for editing tasks

  [[datasets.subsets]]
  image_dir = "/data/DiffThinker/Step1X-SFT/8_train"
  metadata_file = "/data/DiffThinker/Step1X-SFT/8_train_metadata_test.json"
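A quick sanity check can parse the dataset config and confirm its structure. The sketch below does that, using `tomllib` (standard library from Python 3.11) with a fallback to the third-party `tomli` package on Python 3.10. The list of required subset keys and the helper `check_edit_config` are assumptions for illustration, not the trainer's own validation logic.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: requires `pip install tomli`
    import tomli as tomllib

# Assumption: keys this check treats as required in each subset (illustrative only).
REQUIRED_SUBSET_KEYS = ("image_dir", "metadata_file")


def check_edit_config(text):
    """Return a list of problems found in an edit-dataset TOML config string."""
    config = tomllib.loads(text)
    problems = []
    for i, ds in enumerate(config.get("datasets", [])):
        if ds.get("edit_dataset") is not True:
            problems.append(f"datasets[{i}]: edit_dataset is not true")
        for j, sub in enumerate(ds.get("subsets", [])):
            for key in REQUIRED_SUBSET_KEYS:
                if key not in sub:
                    problems.append(f"datasets[{i}].subsets[{j}]: missing {key}")
    return problems
```

None of the keys in this config set a text-encoder feature width. A clean result here therefore mostly rules out the config as the source of the 2048 vs. 3584 mismatch.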
