Description
I encountered the following error while fine-tuning Qwen3.6-35B-A3B:
ValueError: The batch size is expected to be 1 rather than 8 when using cu_seqlens. Please flatten variable-length inputs before processing.
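The message suggests that, on this code path, the model expects variable-length sequences packed into a single row with a batch dimension of 1. Sequence boundaries are then given by `cu_seqlens`, the cumulative sequence lengths used by FlashAttention's variable-length kernels. The run instead passes a padded batch of 8, which matches `per_device_train_batch_size: 8` in the config. The sketch below only illustrates what "flattening" a padded batch means, using plain Python lists. `flatten_padded_batch` is a hypothetical helper, not a LlamaFactory or transformers API, and it is not a fix for this issue.

```python
from typing import List, Tuple


def flatten_padded_batch(
    input_ids: List[List[int]],
    attention_mask: List[List[int]],
) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Hypothetical helper: pack a padded batch [B, L] into one row [1, T].

    Returns (flat_input_ids, flat_position_ids, cu_seqlens). cu_seqlens holds
    the cumulative boundaries [0, n1, n1 + n2, ...], and position ids restart
    at 0 for every original sequence.
    """
    flat_ids: List[int] = []
    flat_pos: List[int] = []
    cu_seqlens: List[int] = [0]
    for ids, mask in zip(input_ids, attention_mask):
        # Keep only real tokens (mask == 1); padding positions are dropped.
        tokens = [tok for tok, m in zip(ids, mask) if m == 1]
        flat_ids.extend(tokens)
        flat_pos.extend(range(len(tokens)))
        cu_seqlens.append(cu_seqlens[-1] + len(tokens))
    # Wrap in an outer list so the leading batch dimension is exactly 1.
    return [flat_ids], [flat_pos], cu_seqlens


# Three right-padded sequences of real lengths 3, 2 and 4 (pad id 0).
ids = [[11, 12, 13, 0], [21, 22, 0, 0], [31, 32, 33, 34]]
mask = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
flat_ids, flat_pos, cu = flatten_padded_batch(ids, mask)
print(len(flat_ids))  # 1
print(cu)             # [0, 3, 5, 9]
```

Three padded rows become one row of 9 tokens. `cu_seqlens` records where each original sequence starts and ends, so attention can be kept within each sequence without a padded batch dimension.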
Reproduction
model_name_or_path: Qwen3.6-35B-A3B
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: full
optim: adamw_torch
gradient_checkpointing: true
deepspeed: deepspeed/ds_z3_config.json
flash_attn: fa2
dataset_dir: LlamaFactory/data
dataset: user_data
template: qwen3_6
cutoff_len: 20480
max_samples: 150000
overwrite_cache: false
preprocessing_num_workers: 16
dataloader_num_workers: 4
output_dir: saves/Qwen3.6-35B-A3B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
Logs
Environment Information
--LlamaFactory: Source code (local clone)
--Model: Qwen3.6-35B-A3B
Known Issue