TransformerBridge Compatibility Mode

TransformerBridge.boot_transformers(...) returns a bridge whose default numerics match HuggingFace: raw weights, no folding, no centering. Calling bridge.enable_compatibility_mode() afterwards switches the bridge to HookedTransformer-equivalent behaviour: weights are folded and centered, and the legacy hook aliases are registered.

Most research code that was written against HookedTransformer.from_pretrained(...) assumes compatibility mode. New code that needs HF-faithful logits should leave it off.

Source: transformer_lens/model_bridge/bridge.py:enable_compatibility_mode.

When to enable it

Use case	Compatibility mode?	Why
Logit lens / direct logit attribution	Yes	These analyses reason in the post-fold-LN coordinate system; raw HF weights produce different (wrong) attributions.
Residual-stream norm analysis	Yes	Centered weights give the residual a meaningful zero.
Circuit analysis using HT-style hook names (`blocks.{i}.attn.hook_q`, `hook_resid_pre`, etc.)	Yes	Legacy aliases register only after compat mode.
Logit parity against HuggingFace	No	Weight processing changes the weights, and centering the unembedding shifts the logits, so they will not match HF.
Generation / inference vs HF baseline	No	Same reason.
Verifying a new adapter's forward pass	No (initially)	Use `enable_compatibility_mode(no_processing=True)` to get hook aliases without weight processing — isolates forward-pass bugs from weight-processing bugs.
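The three outcomes in the table correspond to three ways of preparing a bridge. The following is a minimal sketch rather than part of the library: it assumes `TransformerBridge` is importable from `transformer_lens.model_bridge` (inferred from the source path above) and that `boot_transformers` takes a model name as its first argument. `load_bridge`, `VALID_MODES`, and the mode strings are hypothetical names used only for illustration.

```python
VALID_MODES = ("raw", "compat", "aliases_only")


def load_bridge(model_name: str, mode: str = "raw"):
    """Boot a bridge and prepare it for one of three kinds of work.

    raw          -> HF-faithful numerics (parity checks, generation baselines)
    compat       -> HookedTransformer-equivalent numerics (logit lens, circuits)
    aliases_only -> HT hook names on raw weights (debugging a new adapter)
    """
    # Validate first so a typo fails before any model is downloaded.
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")

    # Imported lazily. The import path is an assumption based on
    # transformer_lens/model_bridge/bridge.py.
    from transformer_lens.model_bridge import TransformerBridge

    bridge = TransformerBridge.boot_transformers(model_name)
    if mode == "compat":
        bridge.enable_compatibility_mode()
    elif mode == "aliases_only":
        bridge.enable_compatibility_mode(no_processing=True)
    return bridge
```

Under these assumptions, `load_bridge("gpt2", mode="compat")` would be the starting point for a logit-lens notebook, while a parity test against HuggingFace would keep the default `mode="raw"`.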

What each flag does

bridge.enable_compatibility_mode(
    disable_warnings: bool = False,
    no_processing: bool = False,
    fold_ln: bool = True,
    center_writing_weights: bool = True,
    center_unembed: bool = True,
    fold_value_biases: bool = True,
    refactor_factored_attn_matrices: bool = False,
)

Flag	Default	Effect
`no_processing`	`False`	If `True`, overrides all other processing flags to False — registers the legacy hook aliases only, leaves weights raw. The "I want HT hook names but HF numerics" mode.
`fold_ln`	`True`	Folds LayerNorm scale + bias into the subsequent linear weights so the LayerNorm modules become pure normalization. Changes weights; mathematically equivalent.
`center_writing_weights`	`True`	Subtracts the mean from each "writing" weight (`W_O` in attention, `W_out` in the MLP). Each component's write to the residual stream then has zero mean across `d_model`, which makes residual-stream norms interpretable.
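`center_unembed`	`True`	Subtracts the mean over the vocabulary from the unembedding matrix. Logits at each position become mean-zero: this shifts logit-lens values by a per-position constant but leaves the argmax and the softmax probabilities unchanged.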
`fold_value_biases`	`True`	Folds attention value biases into the output bias. Same output, fewer effective parameters: attention weights sum to 1, so the value bias only ever contributes a constant.
`refactor_factored_attn_matrices`	`False`	Refactors `W_Q @ W_K.T` and `W_V @ W_O` for analysis. Off by default because it's slow and only matters for specific factored-matrix research.
`disable_warnings`	`False`	Suppresses warnings emitted by legacy component aliases when accessed.
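The claims in the table that `fold_ln` is mathematically equivalent and that `center_unembed` changes logits but not the argmax can be checked on a toy example. The sketch below is plain Python and independent of the library: it applies a LayerNorm followed by an unembedding, using the `x @ W + b` layout. All function names are hypothetical, and the code illustrates the arithmetic only, not the bridge's actual implementation.

```python
import math
import random


def normalize(x, eps=1e-5):
    """LayerNorm without scale or bias: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, W, b):
    """Compute x @ W + b with W stored as [d_in][d_out]."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def fold_ln(gamma, beta, W, b):
    """Fold LayerNorm scale (gamma) and bias (beta) into the next linear layer."""
    d_in, d_out = len(W), len(b)
    W_folded = [[gamma[i] * W[i][j] for j in range(d_out)] for i in range(d_in)]
    b_folded = [b[j] + sum(beta[i] * W[i][j] for i in range(d_in)) for j in range(d_out)]
    return W_folded, b_folded


def center_unembed(W_U, b_U):
    """Subtract the mean over the vocabulary axis from W_U and b_U."""
    W_c = [[w - sum(row) / len(row) for w in row] for row in W_U]
    b_mean = sum(b_U) / len(b_U)
    return W_c, [v - b_mean for v in b_U]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
d_model, d_vocab = 8, 5
x = [rng.gauss(0, 1) for _ in range(d_model)]
gamma = [rng.gauss(1, 0.1) for _ in range(d_model)]
beta = [rng.gauss(0, 0.1) for _ in range(d_model)]
W_U = [[rng.gauss(0, 1) for _ in range(d_vocab)] for _ in range(d_model)]
b_U = [rng.gauss(0, 1) for _ in range(d_vocab)]

# Original path: full LayerNorm (scale + bias), then the linear map.
ln_out = [g * n + be for g, n, be in zip(gamma, normalize(x), beta)]
logits_raw = linear(ln_out, W_U, b_U)

# Folded path: pure normalization, then the folded linear map.
W_f, b_f = fold_ln(gamma, beta, W_U, b_U)
logits_folded = linear(normalize(x), W_f, b_f)
fold_error = max(abs(p - q) for p, q in zip(logits_raw, logits_folded))

# Centering on top of folding shifts every logit by the same constant.
W_c, b_c = center_unembed(W_f, b_f)
logits_centered = linear(normalize(x), W_c, b_c)
shifts = [p - q for p, q in zip(logits_folded, logits_centered)]
```

Here `fold_error` stays at floating-point noise, while every entry of `shifts` is the same non-trivial constant. This is why a raw-logit comparison against HuggingFace fails after centering even though `softmax(logits_centered)` matches `softmax(logits_folded)`.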

After processing, the bridge also:

Re-initializes the hook registry.
Calls _setup_hook_compatibility() on every component (installs HT-style hook conversions like reshaping hook_z from [batch, seq, d_model] to [batch, seq, n_heads, d_head]).
Registers HT-style hook aliases recursively across blocks.

compatibility_mode is then True on the bridge and on every component, so subsequent operations behave as if the bridge were loaded by HookedTransformer.from_pretrained().
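The `hook_z` conversion mentioned above amounts to splitting the last axis into heads. The sketch below uses nested lists rather than tensors and assumes that heads are laid out contiguously along the last axis, which is the layout a row-major tensor view would produce. `split_heads` is a hypothetical name, not the bridge's own function.

```python
def split_heads(z, n_heads):
    """Reshape [batch, seq, d_model] nested lists to [batch, seq, n_heads, d_head]."""
    d_model = len(z[0][0])
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    d_head = d_model // n_heads
    return [
        [[vec[h * d_head:(h + 1) * d_head] for h in range(n_heads)] for vec in seq]
        for seq in z
    ]


# batch=1, seq=2, d_model=6, split into n_heads=3 heads of d_head=2.
z = [[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]]
z_heads = split_heads(z, n_heads=3)
# z_heads[0][1] == [[6, 7], [8, 9], [10, 11]]
```

Code written against HookedTransformer indexes `hook_z` per head, for example `z[:, :, head]`, so without this conversion such indexing would select a position along `d_model` rather than a head.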

Hook semantic parity

After enable_compatibility_mode(), these HT hook names fire on the pre-norm residual (matching HookedTransformer semantics):

blocks.{i}.attn.hook_q_input, hook_k_input, hook_v_input
blocks.{i}.hook_attn_in
blocks.{i}.hook_mlp_in (gated on cfg.use_hook_mlp_in; toggle via bridge.set_use_hook_mlp_in(True))

Carve-outs (issue #1317):

Post-norm architectures (OLMo 2, BERT-style) read the post-attention residual instead, because the norm semantically lives elsewhere in the block.
MLA blocks (DeepSeek V2 / V3 / R1) do not expose the split-qkv aliases — MLA's compressed K/V doesn't have a clean split.

An adapter author for a new post-norm or MLA-style architecture must handle these carve-outs in setup_hook_compatibility. The Gemma1/Gemma2 adapters are exemplars of when not to override setup_hook_compatibility — GemmaTextScaledWordEmbedding already scales internally, so any added hook_conversion would double-scale embed.hook_out.

The four-quadrant test matrix

The integration conftest at tests/integration/model_bridge/conftest.py provides four fixture variants for every test model: three bridge variants plus the HookedTransformer references (shown here for GPT-2):

Variant	`compatibility_mode`	`no_processing`	Tests…
`gpt2_bridge`	off	n/a	HF-faithful numerics
`gpt2_bridge_compat`	on	`False`	HT-equivalent numerics
`gpt2_bridge_compat_no_processing`	on	`True`	Hook aliases without weight processing — used to bisect numerical bugs
(HT side) `gpt2_hooked_processed`, `gpt2_hooked_unprocessed`	n/a	n/a	Reference HookedTransformer with/without weight processing

New integration tests should use the variant that matches the property they're testing. Tests of HF parity → gpt2_bridge. Tests of HT-API behaviour → gpt2_bridge_compat. Tests of hook semantics regardless of weights → gpt2_bridge_compat_no_processing.

Cost

enable_compatibility_mode() mutates the bridge's weights in place. Keep in mind:

One-shot: calling it twice re-runs the centering subtractions. Don't.
Not reversible from within the bridge: re-boot for raw weights.
Only process_weights mutates weights; _setup_hook_compatibility is idempotent.
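Because a second call is best avoided, shared setup code may want a guard. This is a minimal sketch, assuming the `compatibility_mode` flag described earlier can be read as an attribute on the bridge. `enable_compat_once` is a hypothetical helper, not a library function.

```python
def enable_compat_once(bridge, **kwargs) -> bool:
    """Enable compatibility mode unless the bridge is already in it.

    Returns True if this call enabled it, False if it was a no-op.
    """
    # getattr with a default keeps the guard safe if the attribute is absent.
    if getattr(bridge, "compatibility_mode", False):
        return False
    bridge.enable_compatibility_mode(**kwargs)
    return True
```

The guard also refuses a second call that passes different flags. To change processing options, re-boot the bridge.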
