
Qwen3.6-35B-A3B Fine-tuning Single GPU Batch Size Limitation #159

@Rachal28

Description

I encountered the following error while fine-tuning Qwen3.6-35B-A3B:

ValueError: The batch size is expected to be 1 rather than 8 when using cu_seqlens. Please flatten variable-length inputs before processing.
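For context, cu_seqlens ("cumulative sequence lengths") is the usual way variable-length attention kernels describe several sequences packed end to end into a single row, which is why the check insists on a batch dimension of 1. Below is a minimal, framework-free sketch of that flattening, assuming a padded batch with a 0/1 attention mask. `flatten_varlen` is a hypothetical helper for illustration, not a LlamaFactory or Transformers API, and it does not reflect how LlamaFactory actually builds its batches.

```python
from itertools import accumulate


def flatten_varlen(input_ids, attention_mask):
    """Pack a padded batch of shape [B, L] into one row of shape [1, total].

    Hypothetical helper for illustration only. Returns the packed ids,
    per-sequence position ids, and cu_seqlens (cumulative sequence lengths).
    """
    packed_ids, position_ids, lengths = [], [], []
    for ids, mask in zip(input_ids, attention_mask):
        # Keep only real tokens (mask == 1); works for left or right padding.
        tokens = [tok for tok, m in zip(ids, mask) if m]
        packed_ids.extend(tokens)
        # Positions restart at 0 for every sequence in the packed row.
        position_ids.extend(range(len(tokens)))
        lengths.append(len(tokens))
    # cu_seqlens[i] is the start offset of sequence i; the last entry is the total length.
    cu_seqlens = [0, *accumulate(lengths)]
    return [packed_ids], [position_ids], cu_seqlens


batch_ids = [[11, 12, 13, 0], [21, 22, 0, 0], [31, 32, 33, 34]]
batch_mask = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
ids, pos, cu = flatten_varlen(batch_ids, batch_mask)
print(ids)  # [[11, 12, 13, 21, 22, 31, 32, 33, 34]]
print(pos)  # [[0, 1, 2, 0, 1, 0, 1, 2, 3]]
print(cu)   # [0, 3, 5, 9]
```

A batch of three sequences thus becomes one row of nine tokens, and the kernel uses `cu` to keep attention from crossing sequence boundaries.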

Reproduction

model_name_or_path: Qwen3.6-35B-A3B
trust_remote_code: true

stage: sft
do_train: true
finetuning_type: full
optim: adamw_torch
gradient_checkpointing: true
deepspeed: deepspeed/ds_z3_config.json
flash_attn: fa2

dataset_dir: LlamaFactory/data
dataset: user_data
template: qwen3_6
cutoff_len: 20480
max_samples: 150000
overwrite_cache: false
preprocessing_num_workers: 16
dataloader_num_workers: 4

output_dir: saves/Qwen3.6-35B-A3B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]

per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
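Since the error asks for a batch size of 1, one possible workaround (an assumption, not a confirmed fix) is to lower per_device_train_batch_size to 1 and raise gradient_accumulation_steps to 8, which keeps the number of samples per optimizer step unchanged on a single GPU. The arithmetic, using a hypothetical helper name:

```python
def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Samples contributing to one optimizer step (hypothetical helper)."""
    return per_device * grad_accum * num_devices


# Current config: per_device_train_batch_size: 8, gradient_accumulation_steps: 1
current = effective_batch_size(per_device=8, grad_accum=1)
# Assumed workaround: per_device_train_batch_size: 1, gradient_accumulation_steps: 8
workaround = effective_batch_size(per_device=1, grad_accum=8)
print(current, workaround)  # 8 8
```

Each optimizer step then runs eight smaller forward/backward passes, so throughput will likely drop, and loss averaging across micro-batches may differ slightly from a true batch of 8.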

Logs

Environment Information

- LlamaFactory: source code (local clone)
- Model: Qwen3.6-35B-A3B

Known Issue

  • This issue has not already been addressed in the Documentation, Issues, or Discussions.
