This repository provides scripts for:

- Collecting expert router statistics (exponential moving averages, EMA, and hit counts) from Qwen Mixture-of-Experts (MoE) models.
- Generating text with forced expert pools or analyzing expert subsets in MLX-based Qwen3 MoE models.

Blog post: https://blog.sionic.ai/qwen3-moe-strategy

## 1. Installation

Install Python 3.10, create a virtual environment, and install the dependencies.

```bash
# Example for Ubuntu: install Python 3.10
sudo apt update
sudo apt install python3.10 python3.10-venv python3.10-dev

# Create and activate a virtual environment
python3.10 -m venv venv
source venv/bin/activate

# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

## 2. Usage

The table below shows how to run each script:

| Script File | Description | Example |
|---|---|---|
| **`analyze-ko.py`** | Collects layer-wise expert activation statistics (EMA & hit counts) for all prompts vs. Korean-only prompts using a vLLM-based Qwen MoE model. | `python analyze-ko.py --model kalomaze/Qwen3-16B-A3B --tokens 128 --decay 0.99999 --output stats --prompts_file ko_prompts.txt` |
| **`analyze-mlx.py`** | Patches an MLX Qwen3 MoE model to gather per-layer expert statistics (EMA & hit counts) and prints tables. | `python analyze-mlx.py --model mlx-community/Qwen3-235B-A22B-8bit --tokens 128 --decay 0.99999 --output stats_mlx "Hello" "How's the weather?" "Tell me about Korean history"` |

## 3. Summary of Analysis Scripts
The table below summarizes key functions and output files of the analysis scripts:
| Script | Key Features | Output Files |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| **`analyze-ko.py`** | - Hooks into a vLLM-based Qwen MoE model to collect expert activation counts and EMA for all vs. Korean-only prompts. | - `stats/all/routing_stats_layer_{layer}_all.txt`<br>- `stats/ko/routing_stats_layer_{layer}_ko.txt` |
| **`analyze-mlx.py`**| - Patches MLX Qwen3 MoE layers to collect per-layer expert activation counts and EMA.<br>- Generates console tables and writes layer-wise and aggregated stats files. | - `stats_mlx/routing_stats_layer_{layer}.txt`<br>- `stats_mlx/routing_stats_aggregated.txt` |
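
To make "activation counts and EMA" concrete, here is a minimal sketch of how such per-layer statistics could be accumulated. It is not taken from the scripts in this repository: the class name `ExpertStats`, the `update` method, and the choice to smooth the per-step selection fraction are assumptions made for illustration only.

```python
import numpy as np


class ExpertStats:
    """Hypothetical accumulator for one MoE layer (illustrative, not the repo's code)."""

    def __init__(self, num_experts: int, decay: float = 0.99999):
        self.decay = decay
        self.hits = np.zeros(num_experts, dtype=np.int64)   # cumulative hit counts
        self.ema = np.zeros(num_experts, dtype=np.float64)  # smoothed selection share

    def update(self, topk_ids: np.ndarray) -> None:
        """topk_ids: integer array of shape (num_tokens, top_k) with router choices."""
        counts = np.bincount(topk_ids.ravel(), minlength=self.hits.size)
        self.hits += counts
        # Assumption: the EMA tracks each expert's share of this step's assignments.
        frac = counts / max(int(counts.sum()), 1)
        self.ema = self.decay * self.ema + (1.0 - self.decay) * frac


# Usage: two tokens, each routed to two of four experts.
stats = ExpertStats(num_experts=4, decay=0.5)
stats.update(np.array([[0, 1], [0, 2]]))
print(stats.hits, stats.ema)
```

With a decay as close to 1 as 0.99999, an EMA that starts at zero grows very slowly, so small absolute values are expected after a short run under this formulation.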
## 4. Example Analysis Outputs
### 4.1. Per-Layer Expert Statistics (Example from `stats/stats_Qwen3MoeSparseMoeBlock.txt`)
This section shows an example output from one of the analysis scripts, displaying statistics for a single MoE layer.
```shell
(qwen-moe) sionic@vessl-a100-sxm-02:~/sigrid/qwen-moe$ cat stats/stats_Qwen3MoeSparseMoeBlock.txt
Layer: Qwen3MoeSparseMoeBlock
Avg forward time: 45.36 ms
Avg gating+stats time: 0.45 ms
EMA(decay=0.99999):
Expert 22: 0.14%
Expert 104: 0.13%
Expert 75: 0.12%
Expert 33: 0.12%
Expert 101: 0.11%
Expert 1: 0.10%
Expert 7: 0.10%
Expert 9: 0.10%
Expert 25: 0.09%
Expert 20: 0.09%
Expert 18: 0.09%
Expert 42: 0.09%
Expert 97: 0.09%
Expert 86: 0.09%
Expert 125: 0.09%
Expert 56: 0.08%
Expert 38: 0.08%
Expert 24: 0.08%
Expert 120: 0.08%
Expert 44: 0.08%
Expert 102: 0.07%
Expert 0: 0.07%
Expert 31: 0.07%
Expert 23: 0.07%
Expert 64: 0.07%
Expert 3: 0.07%
Expert 82: 0.07%
Expert 13: 0.06%
Expert 35: 0.06%
Expert 4: 0.06%
Expert 96: 0.06%
Expert 83: 0.06%
Expert 17: 0.06%
Expert 93: 0.06%
Expert 71: 0.06%
Expert 29: 0.06%
Expert 45: 0.06%
Expert 116: 0.06%
Expert 50: 0.06%
Expert 69: 0.05%
Expert 48: 0.05%
Expert 55: 0.05%
Expert 10: 0.05%
Expert 91: 0.05%
Expert 16: 0.05%
Expert 110: 0.05%
Expert 100: 0.05%
Expert 107: 0.05%
Expert 94: 0.05%
Expert 115: 0.05%
Expert 52: 0.05%
Expert 118: 0.05%
Expert 121: 0.05%
Expert 27: 0.05%
Expert 5: 0.05%
Expert 84: 0.05%
Expert 77: 0.05%
Expert 14: 0.05%
Expert 59: 0.04%
Expert 60: 0.04%
Expert 108: 0.04%
Expert 40: 0.04%
Expert 6: 0.04%
Expert 54: 0.04%
Expert 2: 0.04%
Expert 106: 0.04%
Expert 73: 0.04%
Expert 51: 0.04%
Expert 98: 0.04%
Expert 30: 0.04%
Expert 32: 0.04%
Expert 61: 0.04%
Expert 37: 0.04%
Expert 72: 0.04%
Expert 90: 0.04%
Expert 99: 0.04%
Expert 112: 0.04%
Expert 53: 0.04%
Expert 66: 0.04%
Expert 85: 0.04%
Expert 88: 0.04%
Expert 15: 0.04%
Expert 36: 0.04%
Expert 62: 0.03%
Expert 26: 0.03%
Expert 89: 0.03%
Expert 113: 0.03%
Expert 21: 0.03%
Expert 68: 0.03%
Expert 39: 0.03%
Expert 41: 0.03%
Expert 43: 0.03%
Expert 87: 0.03%
Expert 114: 0.03%
Expert 49: 0.03%
Expert 127: 0.03%
Expert 105: 0.03%
Expert 63: 0.03%
Expert 95: 0.03%
Expert 8: 0.03%
Expert 124: 0.03%
Expert 92: 0.03%
Expert 57: 0.03%
Expert 28: 0.02%
Expert 123: 0.02%
Expert 78: 0.02%
Expert 122: 0.02%
Expert 65: 0.02%
Expert 109: 0.02%
Expert 80: 0.02%
Expert 79: 0.02%
Expert 74: 0.02%
Expert 58: 0.02%
Expert 76: 0.02%
Expert 119: 0.02%
Expert 46: 0.02%
Expert 103: 0.02%
Expert 111: 0.02%
Expert 67: 0.02%
Expert 12: 0.02%
Expert 11: 0.02%
Expert 34: 0.02%
Expert 126: 0.01%
Expert 117: 0.01%
Expert 19: 0.01%
Expert 81: 0.01%
Expert 70: 0.01%
Expert 47: 0.01%
Total tokens per expert:
Expert 0: 587
Expert 1: 822
Expert 2: 350
Expert 3: 545
Expert 4: 521
Expert 5: 380
Expert 6: 347
Expert 7: 799
Expert 8: 224
Expert 9: 781
Expert 10: 440
Expert 11: 132
Expert 12: 137
Expert 13: 525
Expert 14: 371
Expert 15: 290
Expert 16: 432
Expert 17: 491
Expert 18: 715
Expert 19: 98
Expert 20: 730
Expert 21: 259
Expert 22: 1140
Expert 23: 560
Expert 24: 670
Expert 25: 737
Expert 26: 286
Expert 27: 380
Expert 28: 199
Expert 29: 465
Expert 30: 317
Expert 31: 582
Expert 32: 320
Expert 33: 960
Expert 34: 130
Expert 35: 523
Expert 36: 290
Expert 37: 311
Expert 38: 681
Expert 39: 250
Expert 40: 351
Expert 41: 245
Expert 42: 709
Expert 43: 253
Expert 44: 612
Expert 45: 458
Expert 46: 162
Expert 47: 80
Expert 48: 443
Expert 49: 233
Expert 50: 446
Expert 51: 336
Expert 52: 404
Expert 53: 298
Expert 54: 354
Expert 55: 442
Expert 56: 686
Expert 57: 204
Expert 58: 175
Expert 59: 361
Expert 60: 353
Expert 61: 315
Expert 62: 283
Expert 63: 227
Expert 64: 558
Expert 65: 198
Expert 66: 295
Expert 67: 147
Expert 68: 257
Expert 69: 453
Expert 70: 92
Expert 71: 467
Expert 72: 312
Expert 73: 341
Expert 74: 174
Expert 75: 963
Expert 76: 163
Expert 77: 369
Expert 78: 189
Expert 79: 177
Expert 80: 184
Expert 81: 98
Expert 82: 537
Expert 83: 496
Expert 84: 375
Expert 85: 299
Expert 86: 708
Expert 87: 248
Expert 88: 294
Expert 89: 277
Expert 90: 307
Expert 91: 433
Expert 92: 210
Expert 93: 479
Expert 94: 408
Expert 95: 221
Expert 96: 508
Expert 97: 710
Expert 98: 331
Expert 99: 306
Expert 100: 411
Expert 101: 869
Expert 102: 599
Expert 103: 156
Expert 104: 1027
Expert 105: 225
Expert 106: 334
Expert 107: 412
Expert 108: 354
Expert 109: 186
Expert 110: 423
Expert 111: 148
Expert 112: 304
Expert 113: 269
Expert 114: 243
Expert 115: 408
Expert 116: 457
Expert 117: 105
Expert 118: 384
Expert 119: 167
Expert 120: 639
Expert 121: 381
Expert 122: 188
Expert 123: 192
Expert 124: 209
Expert 125: 699
Expert 126: 117
Expert 127: 223
```

Interpretation of the Layer Statistics:

| Metric | Value | Interpretation |
|---|---|---|
| Forward Pass Time | Avg 45.36 ms (gating+stats 0.45 ms) | Routing and statistics collection account for only ≈1% of the MoE block's forward time. |
| Total Tokens Processed | 49,920 tokens | Distributed among the 128 experts in this layer. |
| Mean / Median Tokens per Expert | 390 / 349 tokens | With perfect balance, each expert would handle ≈390 tokens. |
| Max / Min Tokens per Expert | 1,140 / 80 tokens | The most utilized expert (Expert 22) processed about 14.3 times as many tokens as the least utilized expert (Expert 47). |
| Std. Dev. / Coeff. of Variation | 215 / 0.55 | The standard deviation is 55% of the mean, indicating significant load imbalance among experts. |
| Gini Coefficient | 0.30 | Measures inequality of the token distribution (0 = perfect equality, 1 = maximum inequality). A value of 0.30 suggests moderate imbalance. |
| Top 5 / 10 Experts' Share | 9.9% / 17.7% of total tokens | The top 5 experts handled 9.9% of tokens, and the top 10 handled 17.7%. |
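
The metrics in the table can be recomputed from the "Total tokens per expert" section of a stats file. The sketch below is one way to do it and is not part of the repository: `parse_token_counts` and `load_balance_summary` are hypothetical helper names, the parser assumes the `Expert N: count` line format shown above, and the standard deviation is taken as the population standard deviation, which may differ from the convention used to produce the table.

```python
import statistics


def parse_token_counts(text: str) -> list[int]:
    """Read 'Expert N: count' lines that follow the 'Total tokens per expert' header."""
    counts: dict[int, int] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Total tokens per expert"):
            in_section = True
            continue
        if in_section and line.startswith("Expert"):
            name, value = line.split(":")
            counts[int(name.split()[1])] = int(value)
    return [counts[i] for i in sorted(counts)]


def load_balance_summary(counts: list[int]) -> dict[str, float]:
    """Summarize how evenly tokens are spread across experts."""
    n, total = len(counts), sum(counts)
    mean = total / n
    std = statistics.pstdev(counts)  # assumption: population standard deviation
    ascending = sorted(counts)
    # Gini coefficient from the rank-weighted sum of the sorted counts.
    weighted = sum(rank * c for rank, c in enumerate(ascending, start=1))
    gini = 2 * weighted / (n * total) - (n + 1) / n
    descending = ascending[::-1]
    lowest = ascending[0]
    return {
        "total": total,
        "mean": mean,
        "median": statistics.median(counts),
        "max_min_ratio": descending[0] / lowest if lowest else float("inf"),
        "std": std,
        "cv": std / mean,
        "gini": gini,
        "top5_share": sum(descending[:5]) / total,
        "top10_share": sum(descending[:10]) / total,
    }
```

For example, `load_balance_summary(parse_token_counts(open(path).read()))` would return the summary for the stats file at `path`, where `path` is whichever per-layer file you want to inspect.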