
# Post-Training for Qwen1.5-MoE & Qwen3-MoE

This folder contains the code and instructions for reproducing the experiments on Qwen1.5-MoE-A2.7B and Qwen3-30B-A3B-Base described in the DenseMixer blog post. All experiments are run with LLaMA-Factory.


## Table of Contents

- [Environment Setup](#environment-setup)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [References](#references)

## Environment Setup

Refer to `installation.sh` for detailed setup instructions.


## Data Preparation

We use a diverse set of datasets for training and evaluation. Our pre-processed datasets are available on Hugging Face.

For Qwen1.5-MoE-A2.7B, we use GSM (math reasoning), CodeAlpaca (code generation), ESFT-intent (intent understanding), ESFT-law (legal reasoning), ESFT-summary (summarization), and ESFT-translation (translation). For code-generation evaluation, we use MBPP and HumanEval.

For Qwen3-30B-A3B-Base, we use s1 (math reasoning) and nemotron-code (code reasoning). For evaluation, we use challenging math and coding benchmarks that require reasoning abilities.


## Training

We support the following fine-tuning methods. For each method, replace `{dataset_name}` with your target dataset (`gsm`, `codealpaca`, `esft`).

### 1. Frozen Router

**Full Fine-tuning & LoRA**

```shell
cd LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/frozen_full/train_{dataset_name}_frozen.sh
bash run/qwen1.5/frozen_lora/train_{dataset_name}.sh
```
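For example, the `{dataset_name}` placeholder can be expanded over all three datasets in a loop (a sketch; the script paths follow the pattern above):

```shell
# Sketch: expand {dataset_name} over the supported datasets.
# Here we only print the resolved script paths; replace `echo` with
# `bash` to actually launch the runs.
for dataset_name in gsm codealpaca esft; do
  echo "run/qwen1.5/frozen_full/train_${dataset_name}_frozen.sh"
done
```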

### 2. Conventional Training

**Full Fine-tuning & LoRA**

```shell
cd LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/conventional_full/train_{dataset_name}.sh
bash run/qwen1.5/conventional_lora/train_{dataset_name}_unfrozen.sh

# Qwen3-30B
bash run/qwen3/train_full-code.sh
bash run/qwen3/train_full-math.sh
```

### 3. DenseMixer

To run DenseMixer, one additional setup step is required:

```shell
pip install densemixer
densemixer setup
```
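A quick way to confirm the installation succeeded (assuming the pip package exposes an importable `densemixer` module — an assumption, not documented behavior):

```shell
# Sanity check: the module name `densemixer` is assumed to match the pip package.
python -c "import densemixer; print('densemixer importable')" \
  || echo "densemixer not installed -- run: pip install densemixer"
```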

**Full Fine-tuning & LoRA**

```shell
cd LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/densemixer_full/train_{dataset_name}_densemixer.sh
bash run/qwen1.5/densemixer_lora/train_{dataset_name}_densemixer.sh

# Qwen3-30B
bash run/qwen3/train_densemixer-code.sh
bash run/qwen3/train_densemixer-math.sh
```

### 4. ESFT Fine-tuning

Before running ESFT, generate expert configs:

```shell
cd esft_experts_gen/experts
bash get_expert_scores.sh
bash generate_expert_config.sh
```

**ESFT-Gate** (selects experts by average gate score)

```shell
bash run/qwen1.5/esft-gate/train_gsm.sh
bash run/qwen1.5/esft-gate/train_code.sh
bash run/qwen1.5/esft-gate/train_esft.sh
bash run/qwen1.5/esft-gate/train_amthinking_math.sh
```

**ESFT-Token** (selects experts by token selection ratio)

```shell
bash run/qwen1.5/esft-token/train_gsm.sh
bash run/qwen1.5/esft-token/train_code.sh
bash run/qwen1.5/esft-token/train_esft.sh
bash run/qwen1.5/esft-token/train_amthinking_math.sh
```



## Evaluation

Evaluation scripts are in the `eval/` directory. See `eval/README.md` for details and environment setup.

Key arguments to modify for your trained model:

- `--save_dir`
- `--model_name_or_path`
- `--tokenizer_name_or_path`
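A hypothetical invocation might look like the following; `eval_script.py` and the checkpoint paths are placeholders, not the repo's actual entry points (those are listed in `eval/README.md`):

```shell
# Hypothetical example -- the script name and paths are illustrative only.
MODEL=checkpoints/qwen1.5-densemixer-gsm
CMD="python eval_script.py --save_dir results/gsm \
  --model_name_or_path $MODEL --tokenizer_name_or_path $MODEL"
echo "$CMD"   # inspect the command; then run it with: eval "$CMD"
```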

## References