Description
I am using SAELens with the Gemma-2-2B canonical SAEs (e.g., `release = "gemma-scope-2b-pt-res-canonical"`, `sae_id = "layer_10/width_16k/average_l0_77"`).
According to `sae.W_dec.shape[1]`, this SAE has only 2304 features, but the explanations file I found on Neuronpedia corresponds to the full-width 16k SAE (≈16384 rows).
As a result, feature IDs from the canonical SAE do not match the explanations, and many indices are out of bounds.
Steps to reproduce
- Load a canonical SAE in SAELens:
```python
from sae_lens import SAE

sae, cfg, sparsity = SAE.from_pretrained(
    release="gemma-scope-2b-pt-res-canonical",
    sae_id="layer_10/width_16k/average_l0_77",
    device="cuda",
)
print("SAE feature dim:", sae.W_dec.shape[1])  # shows 2304
```
- Download the explanations from Neuronpedia at `gemma-2-2b/10-gemmascope-res-16k/explanations` → the file has ~16384 rows (full-width features).
- When trying to align these explanations with the canonical SAE features, I get mismatched IDs and index-out-of-bounds errors.
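Before concluding that the canonical SAE is narrower than the explanations file, it may be worth double-checking which axis of `W_dec` indexes the features. To my understanding, SAELens stores the decoder with shape `(d_sae, d_in)`, in which case `W_dec.shape[1]` would be the model's residual-stream width (2304 for Gemma-2-2B) rather than the feature count, and `sae.cfg.d_sae` would report the number of features directly. The sketch below works on plain shape tuples so it runs without downloading an SAE; `DecoderDims`, `describe_decoder`, and `max_valid_feature_id` are hypothetical helpers, and the `(16384, 2304)` shape is what that assumed layout would predict for a 16k-wide SAE, not a logged value.

```python
from typing import NamedTuple


class DecoderDims(NamedTuple):
    d_sae: int  # number of SAE features (rows of W_dec, under the assumed layout)
    d_in: int   # model dimension the SAE reads from (columns of W_dec)


def describe_decoder(w_dec_shape: tuple[int, int]) -> DecoderDims:
    """Interpret a W_dec shape, ASSUMING the (d_sae, d_in) layout."""
    if len(w_dec_shape) != 2:
        raise ValueError(f"expected a 2-D shape, got {w_dec_shape!r}")
    d_sae, d_in = w_dec_shape
    return DecoderDims(d_sae=d_sae, d_in=d_in)


def max_valid_feature_id(dims: DecoderDims) -> int:
    """Largest feature index that activations or explanations can refer to."""
    return dims.d_sae - 1


# Shape predicted (not observed) for a 16k-wide SAE on a 2304-dim residual stream.
dims = describe_decoder((16384, 2304))
print("features (d_sae):", dims.d_sae)
print("model width (d_in):", dims.d_in)
print("valid feature ids: 0 ..", max_valid_feature_id(dims))
```

With a loaded SAE, `describe_decoder(tuple(sae.W_dec.shape))` can be compared against `sae.cfg.d_sae`; if the two agree on 16384, feature IDs would range over all 16384 features and could line up with the full-width explanations file.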
Expected behavior
- Is there an explanations file that matches the canonical SAE (2304 rows, aligned with canonical feature IDs)?
- If not, could canonical SAE explanations be exported and published?
- Alternatively, clear documentation of which SAEs have canonical explanations and which have only full-width ones would help.
Environment
- SAE release: gemma-scope-2b-pt-res-canonical
- SAE id: layer_10/width_16k/average_l0_77
- SAELens version / commit hash (please fill in)
- Explanations source: Neuronpedia (gemma-2-2b/10-gemmascope-res-16k)
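For the version field, the installed package version can be read from package metadata with the standard library. This sketch assumes SAELens is installed under its PyPI distribution name `sae-lens`; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or a marker if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


print("SAELens version:", installed_version("sae-lens"))
```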
Could you please clarify:
- Is there a way to obtain canonical SAE explanations (aligned with the 2304 features)?
- If not, how can the full-width explanations be mapped or cropped down to the canonical subset?
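Whatever the final feature count turns out to be, one way to avoid positional misalignment is to key explanations by feature index rather than by row number, since an export need not contain exactly one row per feature in index order. The sketch below assumes each exported record is a JSON object with an `index` field (possibly a numeric string) and a `description` field; those field names are an assumption about the Neuronpedia export and should be checked against the actual file. `build_explanation_map` and `explain_features` are hypothetical helpers, and the sample records are made up.

```python
import json
from collections import defaultdict


def build_explanation_map(records: list[dict]) -> dict[int, list[str]]:
    """Group explanation texts by integer feature index.

    ASSUMES each record has 'index' (int or numeric string) and 'description' keys.
    A feature may end up with zero, one, or several explanations.
    """
    by_index: dict[int, list[str]] = defaultdict(list)
    for rec in records:
        by_index[int(rec["index"])].append(rec["description"])
    return dict(by_index)


def explain_features(
    feature_ids: list[int], explanations: dict[int, list[str]], d_sae: int
) -> dict[int, list[str]]:
    """Look up explanations for feature ids, raising IndexError for ids outside [0, d_sae)."""
    result: dict[int, list[str]] = {}
    for fid in feature_ids:
        if not 0 <= fid < d_sae:
            raise IndexError(f"feature id {fid} is outside [0, {d_sae})")
        result[fid] = explanations.get(fid, [])  # empty list if no explanation exists
    return result


# Tiny stand-in for an exported explanations file (contents are illustrative only).
raw = json.loads("""[
  {"index": "0", "description": "example explanation A"},
  {"index": "42", "description": "example explanation B"},
  {"index": "42", "description": "example explanation C"}
]""")

explanations = build_explanation_map(raw)
print(explain_features([0, 7, 42], explanations, d_sae=16384))
```

In practice `d_sae` would come from the loaded SAE (e.g. `sae.cfg.d_sae`). Keying by index is a deliberate choice: it tolerates features with no explanation or with several, which a row-position alignment would silently shift.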
Thanks a lot! SAELens + Neuronpedia has been super valuable for the interpretability community! 🙏