[Question] Canonical SAEs missing matching explanations (Gemma-2-2B) #549

@SleepyLan

Description

I am using SAELens with the Gemma-2-2B canonical SAEs (e.g., release = "gemma-scope-2b-pt-res-canonical", sae_id = "layer_10/width_16k/average_l0_77").
This SAE appears to have only 2304 features (going by sae.W_dec.shape[1]), but the explanations file I found on Neuronpedia corresponds to the full-width 16k SAE (≈16384 rows).
As a result, feature IDs from the canonical SAE do not line up with the explanations, and many explanation indices are out of bounds.

Steps to reproduce

  1. Load a canonical SAE in SAELens:
    from sae_lens import SAE
    sae, cfg, sparsity = SAE.from_pretrained(
        release="gemma-scope-2b-pt-res-canonical",
        sae_id="layer_10/width_16k/average_l0_77",
        device="cuda"
    )
    print("SAE feature dim:", sae.W_dec.shape[1])  # shows 2304
  2. Download explanations from Neuronpedia at
    gemma-2-2b/10-gemmascope-res-16k/explanations
    → file has ~16384 rows (full-width features).
  3. When trying to align these explanations with canonical SAE features, I get mismatched IDs and index-out-of-bounds errors.
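A minimal sketch of a bounds-checked alignment, which may help pin down where the mismatch comes from. It rests on two assumptions that this issue does not confirm: the Neuronpedia export is a JSON list of records with `index` and `description` fields, and SAELens stores `W_dec` with shape `(d_sae, d_in)`. If the second assumption holds, `W_dec.shape[1]` is the input width (Gemma-2-2B's residual stream is 2304-dimensional) rather than the feature count, and `sae.cfg.d_sae` or `W_dec.shape[0]` would be the value to pass as `d_sae`. The helper names and the file name `explanations.json` are hypothetical.

```python
import json


def load_records(path):
    # Assumption: the explanations export is a JSON array of objects.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def align_explanations(records, d_sae):
    """Place each explanation at its feature index in a list of length d_sae.

    Assumption: each record has an "index" (int or numeric string) and a
    "description" field. If several records share an index, the last wins.
    Returns (aligned, out_of_range): aligned[i] is the description for
    feature i or None, and out_of_range lists indices outside [0, d_sae).
    """
    aligned = [None] * d_sae
    out_of_range = []
    for rec in records:
        idx = int(rec["index"])
        if 0 <= idx < d_sae:
            aligned[idx] = rec["description"]
        else:
            out_of_range.append(idx)
    return aligned, out_of_range


if __name__ == "__main__":
    # Toy records only. With real data you would call
    # load_records("explanations.json") (hypothetical file name) and pass
    # d_sae taken from the loaded SAE, e.g. sae.cfg.d_sae.
    toy = [
        {"index": "0", "description": "a"},
        {"index": 5, "description": "b"},
        {"index": "3000", "description": "c"},
    ]
    aligned, bad = align_explanations(toy, d_sae=2304)
    print(aligned[0], aligned[5], bad)  # a b [3000]
```

If `d_sae` comes from the SAE config and `out_of_range` is empty, the explanations file and the SAE are most likely the same width, and the mismatch lies in which tensor axis was read as the feature count.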

Expected behavior

  • Is there an explanations file that matches the canonical SAE (2304 rows, aligned with canonical feature IDs)?
  • If not, could canonical SAE explanations be exported and published?
  • Alternatively, clear documentation on which SAEs have canonical explanations vs. only full-width ones would help.

Environment

  • SAE release: gemma-scope-2b-pt-res-canonical
  • SAE id: layer_10/width_16k/average_l0_77
  • SAELens version / commit hash (please fill in)
  • Explanations source: Neuronpedia (gemma-2-2b/10-gemmascope-res-16k)
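To fill in the version field above, a small helper based on the standard library's `importlib.metadata`. It assumes SAELens is installed under the PyPI distribution name `sae-lens`; the function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def sae_lens_version():
    """Return the installed SAELens version string, or None if not found.

    Assumption: the distribution is installed as "sae-lens".
    """
    try:
        return version("sae-lens")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("SAELens version:", sae_lens_version())
```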

Could you please clarify:

  • Is there a way to obtain canonical SAE explanations (aligned with 2304 features)?
  • Or instructions on how to map/crop full-width explanations down to the canonical subset?

Thanks a lot — SAELens + Neuronpedia has been super valuable for the interpretability community! 🙏
