Collect and visualize activations from Mixture-of-Experts (MoE) models.
This library extracts top activating examples per MoE expert along with useful metadata (router weights, norms, logits), and saves everything into a lightweight format with a ready-to-use browser visualization.
```
uv add git+ssh://git@github.com/jerryy33/moe-vis.git
```

or

```
pip install git+ssh://git@github.com/jerryy33/moe-vis.git
```

Run the example script:

```
python example.py --model_name allenai/OLMoE-1B-7B-0125 --layer 0 --num_samples 50 --num_logits 3
```

This will collect data for all experts in layer 0 of the OLMoE-1B-7B-0125 model, keeping the top 50 activating examples per expert.
Alternatively, use the API directly:

```python
collector = ExpertCollector(
    num_experts,
    num_samples=50,  # top examples per expert
    num_logits=3,    # positive + negative (num_logits*2 total)
    unembedding_matrix=lm_head.weight.T,
)
collector.update(...)  # call inside your forward loop
collector.save(model_id, out_dir)
```

Check out example.py for an end-to-end example.
This creates a folder like this:

```
out_dir/
└── vis/
    ├── acts/
    │   ├── E0.safetensors
    │   ├── E1.safetensors
    │   └── ...
    ├── tokenizer.json
    ├── template.html
    ├── reader.js
    └── out.css
```

To view the visualization, serve the folder locally:

```
cd out_dir/vis
python -m http.server
```

The interface shows highlighted examples per expert. For each activation you can inspect the router weight, L2 norm, and which logits were promoted or demoted (via LogitLens).
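The promoted/demoted logits come from a LogitLens-style projection: the expert's output activation is multiplied by the unembedding matrix, and the largest positive and negative entries name the tokens the expert pushes toward or away from. A minimal sketch of that idea (the function name and shapes here are illustrative, not the library's internal API):

```python
import numpy as np

def top_promoted_demoted(activation: np.ndarray, unembedding: np.ndarray, k: int = 3):
    """LogitLens-style readout: which vocab logits does this activation push up or down?

    Illustrative sketch only; the library's own implementation may differ.
    """
    logits = activation @ unembedding       # (d_model,) @ (d_model, vocab) -> (vocab,)
    order = np.argsort(logits)              # token ids sorted by logit contribution, ascending
    promoted = order[-k:][::-1].tolist()    # k most strongly promoted token ids
    demoted = order[:k].tolist()            # k most strongly demoted token ids
    return promoted, demoted
```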
The data files can become large depending on how many examples you store per expert. A good approximation is `num_samples × num_tokens_per_example × per-token cost`.

The per-token cost is dominated by `num_logits`. Typical values:

- `num_logits = 3` → ~30–40 bytes per token
- `num_logits = 20` → ~150–170 bytes per token
Example: `num_logits = 3`, `num_samples = 100`, `seq_len = 32` → ~100 KB per expert.
For a precise calculation, use the `estimator.py` script.
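As a back-of-envelope alternative, the approximation above can be computed directly. The per-token byte costs below are assumptions chosen to be consistent with the figures quoted, not the exact on-disk layout:

```python
def estimate_bytes_per_expert(num_samples: int, seq_len: int, num_logits: int) -> int:
    """Rough storage estimate per expert; see estimator.py for exact numbers.

    Assumed cost model: a small fixed per-token part (token id, router weight,
    norm) plus one id/value pair for each promoted and demoted logit.
    """
    fixed_bytes = 8        # per-token metadata (assumption)
    per_logit_bytes = 4    # one logit id/value pair (assumption)
    per_token = fixed_bytes + 2 * num_logits * per_logit_bytes
    return num_samples * seq_len * per_token

print(estimate_bytes_per_expert(100, 32, 3))  # -> 102400, i.e. ~100 KB
```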