Train Qwen Next 80B-A3B-Thinking #229

We need to investigate training the Qwen Next 80B-A3B-Thinking model on the RTX Pro machine.
1. verl
verl works well with non-MoE models, but for MoE models it is unable to handle the backward pass correctly:

<img width="1336" height="453" alt="Image" src="https://github.com/user-attachments/assets/b7718ce6-fce6-492b-8337-e4ed027413ab" />

2. LLaMA-Factory

LLaMA-Factory also supports training MoE models. For Qwen Next, we can only train with QLoRA; the other options, LoRA and full fine-tuning, run out of memory (OOM). Even with QLoRA, training takes a very long time (related issue: https://github.com/hiyouga/LLaMA-Factory/issues/9178#issuecomment-3322156933).
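To put the OOM behaviour in context, here is a rough back-of-the-envelope estimate of weight-related memory for each option. It is only a sketch under stated assumptions: the 80B total parameter count is taken from the model name, the bytes-per-parameter figures are the usual rules of thumb (bf16 weights, fp32 Adam states, 4-bit quantized base), and `estimate_gb` is a hypothetical helper, not part of verl or LLaMA-Factory. Activations, LoRA adapter weights, and quantization overhead are ignored, so real usage will be higher. Note that for an MoE model all experts must be resident in memory, so the ~3B active parameters ("A3B") reduce compute per token but not weight memory.

```python
# Rough weight-memory estimate for an 80B-parameter model.
# Assumptions (not measured on the RTX Pro machine):
#   - 80e9 total parameters, taken from the model name
#   - activations, adapter weights, and quantization metadata are ignored

TOTAL_PARAMS = 80e9  # all experts count, even though only ~3B are active per token


def estimate_gb(n_params: float, bytes_per_param: float) -> float:
    """Return memory in GB (1 GB = 1e9 bytes) for n_params at a given cost per parameter."""
    return n_params * bytes_per_param / 1e9


# Bytes per parameter under common rules of thumb:
#   full fine-tune with mixed-precision Adam:
#     2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master) + 4 + 4 (Adam m, v) = 16
#   LoRA:  frozen bf16 base weights = 2
#   QLoRA: frozen 4-bit base weights = 0.5
BYTES_PER_PARAM = {
    "full fine-tune": 16.0,
    "LoRA (bf16 base)": 2.0,
    "QLoRA (4-bit base)": 0.5,
}

estimates = {
    name: estimate_gb(TOTAL_PARAMS, cost) for name, cost in BYTES_PER_PARAM.items()
}

if __name__ == "__main__":
    for name, gb in estimates.items():
        print(f"{name:>20}: ~{gb:,.0f} GB before activations")
```

Under these assumptions the base weights alone need roughly 1280 GB for full fine-tuning, 160 GB for LoRA, and 40 GB for QLoRA, which is consistent with QLoRA being the only option that fits so far.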

I'm following the suggestion in https://github.com/hiyouga/LLaMA-Factory/issues/9178#issuecomment-3339561602 to improve the training speed of Qwen Next, but I still get OOM, even with QLoRA. I will continue to investigate this.
