Conversation

@canrager
Contributor

No description provided.

@hijohnnylin
Collaborator

hijohnnylin commented Oct 24, 2025

Loading the temporal SAE safetensor with SAE.from_pretrained or SAE.from_pretrained_with_cfg_and_sparsity currently fails for two reasons:

  • W_enc doesn't exist in the state dictionary (checked with Gemma 2). I printed the state_dict_raw keys in temporal_sae_huggingface_loader in pretrained_sae_loaders.py, and I don't see "E":
    • dict_keys(['D', 'attn_layers.0.c_proj.bias', 'attn_layers.0.c_proj.weight', 'attn_layers.0.k_ctx.bias', 'attn_layers.0.k_ctx.weight', 'attn_layers.0.q_target.bias', 'attn_layers.0.q_target.weight', 'attn_layers.0.v_ctx.bias', 'attn_layers.0.v_ctx.weight', 'b'])
    • @canrager please verify and reupload the safetensor files
  • b_dec shape is slightly off: we expect [2304], but the saved b_dec is [1, 2304]
    • workaround: squeeze b_dec while importing in the sae.py converter (see the sketch below)
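
A minimal sketch of that converter workaround, assuming the raw state dict is keyed as printed above; the function name and the "E" → W_enc mapping are illustrative, not the actual SAE Lens converter code:

```python
import torch

def convert_temporal_state_dict(
    state_dict_raw: dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
    # The encoder matrix "E" is missing from the uploaded safetensors,
    # so fail loudly instead of building a half-initialized SAE.
    if "E" not in state_dict_raw:
        raise KeyError(
            f"temporal SAE checkpoint has no encoder weight 'E'; found {sorted(state_dict_raw)}"
        )
    return {
        "W_enc": state_dict_raw["E"],
        "W_dec": state_dict_raw["D"],
        # b is saved as [1, 2304] but SAE.from_pretrained expects [2304]:
        # squeeze the leading singleton dim.
        "b_dec": state_dict_raw["b"].squeeze(0),
    }
```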

```python
normalize_activations: Literal[
    "none", "expected_average_only_in", "constant_norm_rescale", "layer_norm"
] = "none"  # none, expected_average_only_in (Anthropic April Update), constant_norm_rescale (Anthropic Feb Update)
activation_normalization_factor: float = 1
```
Collaborator


Why is this needed? I'd rather avoid adding config options to the global SAE config if they're just for temporal SAEs. If constant_norm_rescale isn't currently used, we should just delete it from the types IMO. Can you fold the scaling factor into your temporal SAE weights when you load them, so this isn't needed as a separate global SAE config option? (See the sketch below.)
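
A minimal sketch of that folding, assuming the normalization multiplies input activations by the factor and reconstructions are divided by it on the way out; the function name is hypothetical, not SAE Lens API:

```python
import torch

def fold_normalization_factor(
    state_dict: dict[str, torch.Tensor], factor: float
) -> dict[str, torch.Tensor]:
    # (factor * x) @ W_enc == x @ (factor * W_enc), so scaling the encoder
    # absorbs the input normalization at load time.
    state_dict["W_enc"] = state_dict["W_enc"] * factor
    # Dividing the decoder side by the factor maps reconstructions back to
    # the original activation scale. It also keeps an optional b_dec input
    # subtraction consistent, since
    # (factor * x - b_dec) @ W_enc == (x - b_dec / factor) @ (factor * W_enc).
    state_dict["W_dec"] = state_dict["W_dec"] / factor
    state_dict["b_dec"] = state_dict["b_dec"] / factor
    return state_dict
```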

@chanind merged commit 888c586 into decoderesearch:main on Oct 27, 2025
3 checks passed
