README.md (28 additions, 9 deletions)
Another key object is the `ActivationBuffer`, defined in `buffer.py`. …
An `ActivationBuffer` is initialized from an `nnsight` `LanguageModel` object, a submodule (e.g. an MLP), and a generator which yields strings (the text data). It processes a large number of strings, up to some capacity, and saves the submodule's activations. You sample batches from it, and when it is half-depleted, it refreshes itself with new text data.
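
A minimal sketch of sampling from a buffer, assuming it is consumed as a Python iterator (which is how `trainSAE` reads from it) and was constructed as in the example below:

```python
# draw one batch of activations from the buffer; when the buffer is half-depleted,
# it transparently refreshes itself from the text generator
acts = next(buffer)  # tensor of shape (out_batch_size, activation_dim)
```
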
Here's an example of training a dictionary; in it, we load a language model as an `nnsight` `LanguageModel` (this will work for any Huggingface model), specify a submodule, create an `ActivationBuffer`, and then train an autoencoder with `trainSAE`.

NOTE: This is a simple reference example. For an example with standard hyperparameter settings, HuggingFace dataset usage, etc., we recommend referring to this [demonstration](https://github.com/adamkarvonen/dictionary_learning_demo).
```python
from nnsight import LanguageModel

from dictionary_learning import ActivationBuffer
from dictionary_learning.trainers.top_k import TopKTrainer, AutoEncoderTopK
from dictionary_learning.training import trainSAE

device = "cuda:0"
model_name = "EleutherAI/pythia-70m-deduped"  # can be any Huggingface model
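
# --- The diff elides the next stretch of the README example here. ---
# What follows is a hedged reconstruction of the elided setup, inferred from the
# variables used below (model, submodule, activation_dim, dictionary_size, layer,
# llm_batch_size, sae_batch_size, training_steps, data). Names and values are
# illustrative and may differ from the actual README.

layer = 1  # which layer to hook (illustrative choice)
llm_batch_size = 64  # contexts per forward pass when gathering activations (illustrative)
sae_batch_size = 2048  # activations per SAE training batch (illustrative)
training_steps = 10_000  # illustrative

model = LanguageModel(model_name, device_map=device)
submodule = model.gpt_neox.layers[layer].mlp  # the component whose activations we dictionary-learn
activation_dim = 512  # output dimension of pythia-70m's MLP
dictionary_size = 16 * activation_dim

# the buffer needs an iterator/generator that yields strings; this toy list is a placeholder
data = iter(["some text", "more text"] * 1000)

buffer = ActivationBuffer(
    data=data,
    model=model,
    submodule=submodule,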
    d_submodule=activation_dim,  # output dimension of the model component
    n_ctxs=int(1e2),  # you can set this higher or lower depending on your available memory
    device=device,
    refresh_batch_size=llm_batch_size,
    out_batch_size=sae_batch_size,
)  # buffer will yield batches of tensors of dimension = submodule's output dimension

trainer_cfg = {
    "trainer": TopKTrainer,
    "dict_class": AutoEncoderTopK,
    "activation_dim": activation_dim,
    "dict_size": dictionary_size,
    "lr": 1e-3,
    "device": device,
    "steps": training_steps,
    "layer": layer,
    "lm_name": model_name,
    "warmup_steps": 1,
    "k": 100,
}

# train the sparse autoencoder (SAE)
ae = trainSAE(
    data=buffer,  # you could also use another iterable (e.g. a PyTorch DataLoader) here instead of the buffer
    trainer_configs=[trainer_cfg],
    steps=training_steps,  # the number of training steps; total trained tokens = steps * batch_size
)
```
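
A hedged usage sketch of the trained dictionary, assuming `trainSAE` returns the trained autoencoder here (as the assignment to `ae` above suggests) and that it exposes the library's usual `encode`/`decode` interface:

```python
acts = next(buffer)                   # a batch of activations, shape (sae_batch_size, activation_dim)
features = ae.encode(acts)            # sparse feature activations, shape (sae_batch_size, dictionary_size)
reconstruction = ae.decode(features)  # reconstructed activations, shape (sae_batch_size, activation_dim)
```
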
Some technical notes on our training infrastructure and supported features:
* Training uses the `ConstrainedAdam` optimizer defined in `training.py`. This is a variant of Adam which supports constraining the `AutoEncoder`'s decoder weights to be norm 1 (the idea is sketched below).
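
To give intuition for this constraint, here is a hedged sketch of how a norm-1 constraint can be layered on top of Adam: before each step, remove the gradient component parallel to each constrained column; after the step, renormalize the columns. The class name and code below are illustrative, not copied from `training.py`; typically the constrained parameter would be the decoder weight matrix, whose columns are the dictionary elements.

```python
import torch

class UnitNormAdam(torch.optim.Adam):
    """Sketch of an Adam variant that keeps the columns of selected parameters at unit norm."""

    def __init__(self, params, constrained_params, lr):
        super().__init__(params, lr=lr)
        self.constrained_params = list(constrained_params)

    def step(self, closure=None):
        with torch.no_grad():
            for p in self.constrained_params:
                if p.grad is None:
                    continue
                # project the gradient onto the tangent space of the unit sphere,
                # so the update does not fight the norm constraint
                normed_p = p / p.norm(dim=0, keepdim=True)
                p.grad -= (p.grad * normed_p).sum(dim=0, keepdim=True) * normed_p
        loss = super().step(closure=closure)
        with torch.no_grad():
            for p in self.constrained_params:
                # renormalize the constrained columns back to norm 1 after the update
                p /= p.norm(dim=0, keepdim=True)
        return loss
```
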