When fine-tuning the v1p1 model, training crashes with a tensor shape mismatch in the connector (the global_proj_out layer in modules/connector_edit.py):
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/finetuning.py", line 563, in <module>
[rank0]: trainer.train(args)
[rank0]: File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/library/kohya_trainer.py", line 1404, in train
[rank0]: loss = self.process_batch(
[rank0]: File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/library/kohya_trainer.py", line 444, in process_batch
[rank0]: noise_pred, target, timesteps, weighting = self.get_noise_pred_and_target(
[rank0]: File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/finetuning.py", line 378, in get_noise_pred_and_target
[rank0]: model_pred = unet(
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/modules/model_edit.py", line 211, in forward
[rank0]: txt, y = self.connector(
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/data/vjuicefs_ai_camera_jgroup_llm/11188630/Step1X-Edit/modules/connector_edit.py", line 493, in forward
[rank0]: global_out=self.global_proj_out(x_mean)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/stepedit_sft/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
[rank0]: return F.linear(input, self.weight, self.bias)
[rank0]: RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2048 and 3584x768)
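For reference, the error is raised by F.linear. nn.Linear stores its weight as (out_features, in_features), and the error message prints the transposed weight, so "3584x768" means global_proj_out expects 3584 input features while x_mean arrives with 2048. To see what the checkpoint itself contains, the safetensors header can be read without loading any tensors. This is a minimal stdlib-only sketch; the script name inspect_ckpt.py and the assumption that the relevant checkpoint key contains the substring "global_proj_out" are my own guesses, not taken from the repo.

```python
import json
import struct
import sys


def read_safetensors_shapes(path, name_filter=""):
    """Return {tensor_name: shape} from a .safetensors header without loading tensors.

    A .safetensors file starts with an 8-byte little-endian unsigned integer
    giving the length of a JSON header that maps tensor names to
    dtype/shape/data_offsets (plus an optional "__metadata__" entry).
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {
        name: info["shape"]
        for name, info in header.items()
        if name != "__metadata__" and name_filter in name
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_ckpt.py /data/Models/Step1X-Edit/step1x-edit-i1258.safetensors
    # Assumption: the connector weights live in this file under a key
    # containing "global_proj_out". A stored weight shape of [768, 3584]
    # corresponds to nn.Linear(in_features=3584, out_features=768).
    for name, shape in read_safetensors_shapes(sys.argv[1], "global_proj_out").items():
        print(name, shape)
```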
Here is my script:
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 --num_processes 1 \
--config_file ./library/accelerate_config.yaml \
finetuning.py \
--pretrained_model_name_or_path "/data/Models/Step1X-Edit/step1x-edit-i1258.safetensors" \
--qwen2p5vl "/data/Models/Qwen2.5-VL-7B-Instruct" \
--ae /data/Step1X-Edit/vae.safetensors \
--cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers \
--max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 \
--network_module library.lora_module --network_dim 64 --network_alpha 32 --network_train_unet_only \
--optimizer_type adamw8bit --learning_rate 1e-4 \
--cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk \
--highvram --max_train_epochs 100 --save_every_n_epochs 5 --dataset_config library/data_configs/step1x_edit_test.toml \
--output_dir /data/Models/Step1X-Edit-SFT \
--output_name step1x-edit-maze \
--timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1.0 --fp8_base
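To compare the 3584 and 2048 in the error against the text encoder passed via --qwen2p5vl, its hidden size can be read from the model directory's config.json. A minimal sketch under the assumption that the directory follows the usual Hugging Face layout; the fallback to a nested "text_config" entry is a hypothetical safeguard, not something confirmed for this model.

```python
import json
import os
import sys


def text_encoder_hidden_size(model_dir):
    """Read hidden_size from a Hugging Face model directory's config.json."""
    with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)
    # Prefer the top-level key; otherwise fall back to a nested
    # "text_config" block (assumption: some multimodal configs nest it).
    if "hidden_size" in config:
        return config["hidden_size"]
    return config.get("text_config", {}).get("hidden_size")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_encoder.py /data/Models/Qwen2.5-VL-7B-Instruct
    print("hidden_size:", text_encoder_hidden_size(sys.argv[1]))
```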
And here is the dataset config:
[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1
# This is an edit dataset
[[datasets]]
resolution = [512, 512]
batch_size = 1
edit_dataset = true # necessary for editing tasks
[[datasets.subsets]]
image_dir = "/data/DiffThinker/Step1X-SFT/8_train"
metadata_file = "/data/DiffThinker/Step1X-SFT/8_train_metadata_test.json"