[Question] Canonical SAEs missing matching explanations (Gemma-2-2B) #549

I am using SAELens with Gemma-2-2B canonical SAEs (e.g., `release = "gemma-scope-2b-pt-res-canonical"`, `sae_id = "layer_10/width_16k/average_l0_77"`).  
This SAE appears to have only **2304 features** (`sae.W_dec.shape[1]` prints 2304), but the explanations file I found on Neuronpedia corresponds to the full-width 16K SAE (≈16384 rows).  
As a result, feature IDs from the canonical SAE do not match the explanations (many indices are out of bounds).

**Steps to reproduce**  
1. Load a canonical SAE in SAELens:
   ```python
   from sae_lens import SAE

   # Load the canonical layer-10, 16k-width residual-stream SAE.
   sae, cfg, sparsity = SAE.from_pretrained(
       release="gemma-scope-2b-pt-res-canonical",
       sae_id="layer_10/width_16k/average_l0_77",
       device="cuda",
   )
   print("SAE feature dim:", sae.W_dec.shape[1])  # shows 2304
   ```
2. Download the explanations from Neuronpedia at `gemma-2-2b/10-gemmascope-res-16k/explanations`. The file has ~16384 rows (full-width features).
3. Try to align these explanations with the canonical SAE features. This produces mismatched IDs and index-out-of-bounds errors.
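
One sanity check on step 1, in case the printed number is being read from the wrong axis: as far as I understand, SAELens stores the decoder as `W_dec` with shape `(d_sae, d_in)`, so `shape[0]` would be the feature count and `shape[1]` the model's residual width (2304 for Gemma-2-2B). The sketch below uses a plain tuple instead of a loaded SAE; the helper name `describe_decoder_shape` and the example shape are hypothetical.

```python
from typing import Dict, Tuple

# Residual stream width of Gemma-2-2B.
GEMMA_2_2B_D_MODEL = 2304


def describe_decoder_shape(shape: Tuple[int, int]) -> Dict[str, int]:
    """Interpret a decoder shape under the ASSUMED (d_sae, d_in) layout.

    Axis 0 is taken to index SAE features, axis 1 the model dimension.
    """
    d_sae, d_in = shape
    return {"n_features": d_sae, "d_model": d_in}


# Hypothetical shape for a width_16k SAE trained on Gemma-2-2B.
info = describe_decoder_shape((16384, 2304))
print(info)

if info["d_model"] == GEMMA_2_2B_D_MODEL:
    print("Axis 1 matches the model width, so axis 0 is the feature axis.")
```

On a loaded SAE, the equivalent comparison would be between `sae.W_dec.shape[0]` and the number of rows in the explanations file.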

**Expected behavior**  
- Is there an explanations file that matches the canonical SAE (2304 rows, aligned with canonical feature IDs)?
- If not, could canonical SAE explanations be exported and published?
- Alternatively, clear documentation on which SAEs have canonical explanations vs. only full-width ones would help.

**Environment**  
- SAE release: gemma-scope-2b-pt-res-canonical
- SAE id: layer_10/width_16k/average_l0_77
- SAELens version / commit hash (please fill in)
- Explanations source: Neuronpedia (gemma-2-2b/10-gemmascope-res-16k)
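
For the version field, a minimal sketch of how the installed SAELens version can be read from package metadata. It assumes the library was installed from PyPI under the distribution name `sae-lens`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the PyPI distribution name is "sae-lens".
print("sae-lens version:", installed_version("sae-lens"))
```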

Could you please clarify:
- Is there a way to obtain canonical SAE explanations (aligned with 2304 features)?
- If not, are there instructions for mapping or cropping the full-width explanations down to the canonical subset?
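
For reference, this is roughly the alignment I am attempting. It is a minimal sketch, not the Neuronpedia schema: the field names `index` and `description`, the helper names, and the sample records are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Mapping


def build_explanation_lookup(
    records: Iterable[Mapping[str, str]], n_features: int
) -> Dict[int, str]:
    """Map feature index -> explanation text, rejecting out-of-range indices.

    ASSUMPTION: each record has an "index" and a "description" field.
    """
    lookup: Dict[int, str] = {}
    for rec in records:
        idx = int(rec["index"])
        if not 0 <= idx < n_features:
            raise IndexError(
                f"explanation index {idx} is outside [0, {n_features})"
            )
        # Keep the first explanation seen for each feature.
        lookup.setdefault(idx, rec["description"])
    return lookup


def missing_features(lookup: Mapping[int, str], n_features: int) -> List[int]:
    """List feature indices that have no explanation."""
    return [i for i in range(n_features) if i not in lookup]


# Hypothetical records standing in for rows of the explanations file.
sample_records = [
    {"index": "0", "description": "hypothetical explanation A"},
    {"index": "2", "description": "hypothetical explanation B"},
]

lookup = build_explanation_lookup(sample_records, n_features=4)
print(lookup)
print("features without explanations:", missing_features(lookup, n_features=4))
```

If `n_features` is set smaller than the largest index in the file, the `IndexError` here is the same kind of out-of-bounds failure described in the steps to reproduce.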

Thanks a lot! SAELens and Neuronpedia have been super valuable for the interpretability community 🙏
