The MixtureKit package provides high-level helper functions to merge pretrained and fine-tuned models
into a unified framework, integrating advanced Mixture-of-Experts (MoE) architectures.
- One‑line merge of multiple HF checkpoints into a single MoE model.
- Supports Branch‑Train‑MiX (BTX), Branch‑Train‑Stitch (BTS) and vanilla MoE.
- Built‑in routing visualizer: inspect which tokens each expert receives — overall (coarse‑grained) and per layer (fine‑grained). See examples/README_vis.md for details.
```bash
# Create a fresh conda environment (recommended)
conda create -n mixturekit python=3.12
conda activate mixturekit

# Clone & install in editable mode for development
git clone https://github.com/MBZUAI-Paris/MixtureKit
cd MixtureKit
pip install -e .
```

Requirements: Python ≥ 3.10 · PyTorch ≥ 2.5. The correct version of `transformers` is pulled automatically.
The script below builds a BTX MoE that routes tokens between a Gemma‑4B base model and two specialized fine-tuned experts (FrenchGemma for the French language and MedGemma for health information). For the BTS or vanilla architectures, set `moe_method` to `BTS` or `traditional`, respectively. For other model families, comment out the `model_cls` line.
```bash
# From the repo root
python examples/example_build_moe.py
```

What happens under the hood?
- A config dictionary is created that lists the base expert, two additional experts, the routing layers, etc.
- `MixtureKit.build_moe()` merges the checkpoints and writes the MoE to `models_merge/gemmax/`.
- The script reloads the model with `AutoModelForCausalLM` and prints a parameter‑breakdown table — only router weights stay trainable.
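For orientation, here is a minimal sketch of the shape of such a script. The config keys, argument passing, and checkpoint paths below are illustrative assumptions, not the verified MixtureKit API; the ready‑to‑use configs in examples/config_examples.txt are the authoritative reference.

```python
# Illustrative sketch only: the exact config schema of MixtureKit.build_moe()
# may differ; see examples/config_examples.txt for the real configs.
from transformers import AutoModelForCausalLM
import MixtureKit

config = {
    "base_model": "path/to/gemma-4b",                        # placeholder: Gemma-4B base checkpoint
    "experts": ["path/to/FrenchGemma", "path/to/MedGemma"],  # placeholder: fine-tuned expert checkpoints
    "moe_method": "BTX",                                     # or "BTS" / "traditional"
    "output_dir": "models_merge/gemmax",
}

MixtureKit.build_moe(config)   # merge the checkpoints into a single MoE on disk

# Reload the merged model; the example script additionally freezes everything
# except the routers and prints a parameter-breakdown table.
model = AutoModelForCausalLM.from_pretrained("models_merge/gemmax")
total = sum(p.numel() for p in model.parameters())
print(f"merged MoE has {total:,} parameters")
```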
🔧 Fine-tuning / Supervised Fine-Tuning (SFT)
The mixture_training/ folder contains a ready-to-go scaffold that trains any merged MoE checkpoint with LoRA adapters (BTX or BTS).
```text
mixture_training/
├── config_training.yaml   # all hyper-params in one place
├── deepspeed_config.yaml  # ZeRO-3 config
├── requirements.txt       # extra libs (trl, deepspeed, wandb, etc.)
└── train_model.py         # launch script
```
- Expected format: a 🤗 `datasets` Arrow table saved on disk and loaded with `load_from_disk()`. `config_training.yaml` assumes:
  - a column called `messages` (list of chat turns),
  - each turn is a dict `{"role": "...", "content": "..."}` (same schema as ShareGPT).
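A minimal sketch of preparing a dataset in this format (the path and example content are illustrative):

```python
# Build a tiny ShareGPT-style dataset with a "messages" column and save it to
# disk so the training script can load it with load_from_disk().
from datasets import Dataset, load_from_disk

rows = [
    {
        "messages": [
            {"role": "user", "content": "Bonjour, comment allez-vous ?"},
            {"role": "assistant", "content": "Très bien, merci !"},
        ]
    },
]

Dataset.from_list(rows).save_to_disk("data/my_sft_dataset")   # point dataset_path here

ds = load_from_disk("data/my_sft_dataset")
print(ds[0]["messages"][0])   # {'role': 'user', 'content': 'Bonjour, comment allez-vous ?'}
```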
Minimal edits for your own run:
| Key | What it does |
|---|---|
| `dataset_path` | Path to the dataset produced in step 2 |
| `model_id` | Path or HF-Hub id of the merged MoE (e.g. `models_merge/gemmax`) |
| `output_dir` | Where to write checkpoints / LoRA adapters |
| `run_name` | Friendly name shown in 🤗 wandb / logs |
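If you prefer to script these edits rather than change the file by hand, a small PyYAML snippet can rewrite the four keys. This assumes they sit at the top level of config_training.yaml, which you should verify against the shipped file; all values below are illustrative.

```python
# Update the run-specific keys in config_training.yaml (assumes top-level keys).
import yaml

cfg_path = "mixture_training/config_training.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg.update(
    dataset_path="data/my_sft_dataset",   # dataset saved in the previous step
    model_id="models_merge/gemmax",       # merged MoE from the build step
    output_dir="outputs/gemmax-sft",      # illustrative output location
    run_name="gemmax-sft-run1",           # illustrative run name
)

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```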
```bash
accelerate launch --config_file mixture_training/deepspeed_config.yaml mixture_training/train_model.py
```

The script will:
- Load the MoE checkpoint in bf16, with distributed training if multiple GPUs are available,
- Train with 🤗 `trl`'s `SFTTrainer`,
- Save incremental checkpoints to the local directory `output_dir`.
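Conceptually, the training loop amounts to something like the sketch below; the actual train_model.py, its argument names, and its LoRA settings may differ, so treat this only as an outline.

```python
# Rough outline of an SFT run over the merged MoE (illustrative, not the shipped script).
import torch
from datasets import load_from_disk
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Load the merged MoE in bf16
model = AutoModelForCausalLM.from_pretrained(
    "models_merge/gemmax", torch_dtype=torch.bfloat16
)
dataset = load_from_disk("data/my_sft_dataset")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="outputs/gemmax-sft", run_name="gemmax-sft-run1"),
    peft_config=LoraConfig(r=16, lora_alpha=32),   # illustrative LoRA adapter settings
)
trainer.train()   # incremental checkpoints land in output_dir
```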
Tip: To switch from BTX/Traditional to BTS fine-tuning, open `config_training.yaml` and set `is_bts` to `True`.
The file `examples/config_examples.txt` contains more ready‑to‑use configs. Copy one into a small script and call `build_moe()`.
| Key | Scenario | MoE flavour |
|---|---|---|
| `llama3x` | Two Llama‑3‑1B experts | BTX |
| `qwen3x` | Three Qwen‑3‑0.6B experts | Traditional MoE |
| `gemmabts` | Gemma + 2 Gemma experts | BTS (layer‑stitching) |
- API reference — open `docs/index.html` or visit the online version.
Pull requests are welcome! Please open an issue first to discuss your ideas.
MixtureKit is released under the BSD 3-Clause License — see the LICENSE file for details.
MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models
Happy mixing! 🎛️
