Qwen3.6-35B-A3B Fine-tuning Single GPU Batch Size Limitation #159

### Description

I encountered the following error while fine-tuning Qwen3.6-35B-A3B:

`ValueError: The batch size is expected to be 1 rather than 8 when using cu_seqlens. Please flatten variable-length inputs before processing.`
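
For context on what the message means by "flatten variable-length inputs", here is a minimal sketch in plain Python. It assumes the FlashAttention-style convention in which `cu_seqlens` holds cumulative sequence lengths with a leading 0. The helper `flatten_batch` and the token values are hypothetical and for illustration only; this is not the model's or LlamaFactory's actual code.

```python
from itertools import accumulate


def flatten_batch(sequences):
    """Pack variable-length token lists into a single row.

    Returns the packed ids with a batch dimension of 1, plus the
    cumulative sequence boundaries (cu_seqlens) that mark where each
    original sequence starts and ends inside the packed row.
    """
    flat = [tok for seq in sequences for tok in seq]
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return [flat], cu_seqlens  # outer list plays the role of batch size 1


# Hypothetical token ids for three sequences of lengths 3, 2 and 4.
batch = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
input_ids, cu_seqlens = flatten_batch(batch)

print(len(input_ids))  # 1
print(cu_seqlens)      # [0, 3, 5, 9]
```

In this layout the batch dimension is always 1 and the sequence boundaries live in `cu_seqlens`, which appears to be what the check in the error message expects.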

### Reproduction


model_name_or_path: Qwen3.6-35B-A3B
trust_remote_code: true

stage: sft
do_train: true
finetuning_type: full
optim: adamw_torch
gradient_checkpointing: true
deepspeed: deepspeed/ds_z3_config.json
flash_attn: fa2

dataset_dir: LlamaFactory/data
dataset: user_data
template: qwen3_6
cutoff_len: 20480
max_samples: 150000
overwrite_cache: false
preprocessing_num_workers: 16
dataloader_num_workers: 4

output_dir: saves/Qwen3.6-35B-A3B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 2.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000


### Logs

```shell

```

### Environment Information

- LlamaFactory: Source code (local clone)
- Model: Qwen3.6-35B-A3B

### Known Issue

- [x] The issue hasn't been already addressed in Documentation, Issues, and Discussions.
