This folder contains the code and instructions for reproducing the experiments on Qwen1.5-MoE-A2.7B and Qwen3-30B-A3B-Base as described in the DenseMixer blog post. All experiments are conducted using LLaMA-Factory.
Refer to installation.sh for detailed setup instructions.
We use a diverse set of datasets for training and evaluation. Our pre-processed datasets are available on Hugging Face.
For Qwen1.5-MoE-A2.7B, we use GSM (math reasoning), CodeAlpaca (code generation), ESFT-intent (intent understanding), ESFT-law (legal reasoning), ESFT-summary (summarization), ESFT-translation (translation). For code generation evaluation, we use MBPP and HumanEval.
For Qwen3-30B-A3B-Base, we use s1 (math reasoning) and nemotron-code (coding reasoning). For evaluation, we use challenging math and coding benchmarks that require reasoning abilities.
We support the following fine-tuning methods. For each method, replace {dataset_name} with your target dataset (gsm, codealpaca, esft).
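For example, the placeholder can be expanded in a simple loop over the target datasets (this sketch only prints the resulting commands, using the conventional full fine-tuning scripts shown below; drop the `echo` to actually launch training):

```shell
# Print the training command for each dataset; remove `echo` to run them.
for dataset_name in gsm codealpaca esft; do
  echo bash "run/qwen1.5/conventional_full/train_${dataset_name}.sh"
done
```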
**Full Fine-tuning & LoRA (frozen router)**

```bash
cd LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/frozen_full/train_{dataset_name}_frozen.sh
bash run/qwen1.5/frozen_lora/train_{dataset_name}.sh
```

**Full Fine-tuning & LoRA (conventional)**

```bash
cd LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/conventional_full/train_{dataset_name}.sh
bash run/qwen1.5/conventional_lora/train_{dataset_name}_unfrozen.sh
```
### qwen3-30b

```bash
bash run/qwen3/train_full-code.sh
bash run/qwen3/train_full-math.sh
```

To run DenseMixer, you need one additional setup step:
```bash
pip install densemixer
densemixer setup
```

**Full Fine-tuning & LoRA (DenseMixer)**
```bash
cd LLaMA-Factory
export WANDB_API_KEY="YOUR_WANDB_API_KEY"
bash run/qwen1.5/densemixer_full/train_{dataset_name}_densemixer.sh
bash run/qwen1.5/densemixer_lora/train_{dataset_name}_densemixer.sh
```
### qwen3-30b

```bash
bash run/qwen3/train_densemixer-code.sh
bash run/qwen3/train_densemixer-math.sh
```

Before running ESFT, generate expert configs:
```bash
cd esft_experts_gen/experts
bash get_expert_scores.sh
bash generate_expert_config.sh
```

**ESFT-Gate (selects experts by average gate score)**
```bash
bash run/qwen1.5/esft-gate/train_gsm.sh
bash run/qwen1.5/esft-gate/train_code.sh
bash run/qwen1.5/esft-gate/train_esft.sh
bash run/qwen1.5/esft-gate/train_amthinking_math.sh
```

**ESFT-Token (selects experts by token selection ratio)**
```bash
bash run/qwen1.5/esft-token/train_gsm.sh
bash run/qwen1.5/esft-token/train_code.sh
bash run/qwen1.5/esft-token/train_esft.sh
bash run/qwen1.5/esft-token/train_amthinking_math.sh
```

Implementation Details:
- LLaMA-Factory framework: `LLaMA-Factory`
- Training scripts: `LLaMA-Factory/run`
- Configuration files: `LLaMA-Factory/examples`
Evaluation scripts are in the eval/ directory.
See eval/README.md for details and environment setup.
Key arguments to modify for your trained model:
- `--save_dir`
- `--model_name_or_path`
- `--tokenizer_name_or_path`
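As a sketch, the arguments might be wired together like this (the eval script name `eval/run_eval.py` and both paths below are placeholders, not the repo's actual names — check eval/README.md for the real entry point; the `echo` prints the command rather than running it):

```shell
# Hypothetical paths — substitute your own trained checkpoint and output dir.
MODEL="path/to/your/finetuned/checkpoint"
SAVE_DIR="eval_results/my_run"

# `echo` shows the assembled command; remove it to actually run the eval.
echo python eval/run_eval.py \
  --save_dir "$SAVE_DIR" \
  --model_name_or_path "$MODEL" \
  --tokenizer_name_or_path "$MODEL"
```

The tokenizer path usually matches the model path unless you trained with a custom tokenizer.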