
Commit 261c458

Rename bottleneck adapter configs (speechbrain#27)

1 parent: ab3325b

30 files changed: 194 additions, 184 deletions
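The commit is a mechanical rename of the bottleneck adapter configuration classes and their string identifiers. As a migration reference, the full mapping applied throughout the diffs below (a plain summary in code form, not an API of the library):

```python
# Configuration class renames applied consistently across this commit.
RENAMED_BOTTLENECK_CONFIGS = {
    "AdapterConfig": "BnConfig",
    "PfeifferConfig": "SeqBnConfig",
    "PfeifferInvConfig": "SeqBnInvConfig",
    "HoulsbyConfig": "DoubleSeqBnConfig",
    "HoulsbyInvConfig": "DoubleSeqBnInvConfig",
    "ParallelConfig": "ParBnConfig",
}

# The matching string identifiers (see docs/overview.md below):
# pfeiffer -> seq_bn          pfeiffer+inv    -> seq_bn_inv
# houlsby  -> double_seq_bn   houlsby+inv     -> double_seq_bn_inv
# parallel -> par_bn          scaled_parallel -> scaled_par_bn
```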

docs/classes/adapter_config.rst (6 additions, 6 deletions)

````diff
@@ -9,23 +9,23 @@ Single (bottleneck) adapters
 .. autoclass:: adapters.AdapterConfigBase
     :members:
 
-.. autoclass:: adapters.AdapterConfig
+.. autoclass:: adapters.BnConfig
     :members:
     :inherited-members: Mapping
 
-.. autoclass:: adapters.PfeifferConfig
+.. autoclass:: adapters.SeqBnConfig
     :members:
 
-.. autoclass:: adapters.PfeifferInvConfig
+.. autoclass:: adapters.SeqBnInvConfig
     :members:
 
-.. autoclass:: adapters.HoulsbyConfig
+.. autoclass:: adapters.DoubleSeqBnConfig
     :members:
 
-.. autoclass:: adapters.HoulsbyInvConfig
+.. autoclass:: adapters.DoubleSeqBnInvConfig
    :members:
 
-.. autoclass:: adapters.ParallelConfig
+.. autoclass:: adapters.ParBnConfig
     :members:
 
 .. autoclass:: adapters.CompacterConfig
````

docs/contributing.md (1 addition, 0 deletions)

````diff
@@ -0,0 +1 @@
+../contributing.md
````

docs/method_combinations.md (7 additions, 7 deletions)

````diff
@@ -8,11 +8,11 @@ To make this process easier, adapters provides the possibility to group multiple
 For example, this could be used to define different reduction factors for the adapter modules placed after the multi-head attention and the feed-forward blocks:
 
 ```python
-from adapters import AdapterConfig, ConfigUnion
+from adapters import BnConfig, ConfigUnion
 
 config = ConfigUnion(
-    AdapterConfig(mh_adapter=True, output_adapter=False, reduction_factor=16, non_linearity="relu"),
-    AdapterConfig(mh_adapter=False, output_adapter=True, reduction_factor=2, non_linearity="relu"),
+    BnConfig(mh_adapter=True, output_adapter=False, reduction_factor=16, non_linearity="relu"),
+    BnConfig(mh_adapter=False, output_adapter=True, reduction_factor=2, non_linearity="relu"),
 )
 model.add_adapter("union_adapter", config=config)
 ```
@@ -35,11 +35,11 @@ model.add_adapter("mam_adapter", config=config)
 and is identical to using the following `ConfigUnion`:
 
 ```python
-from adapters import ConfigUnion, ParallelConfig, PrefixTuningConfig
+from adapters import ConfigUnion, ParBnConfig, PrefixTuningConfig
 
 config = ConfigUnion(
     PrefixTuningConfig(bottleneck_size=800),
-    ParallelConfig(),
+    ParBnConfig(),
 )
 model.add_adapter("mam_adapter", config=config)
 ```
@@ -89,12 +89,12 @@ model.add_adapter("unipelt", config=config)
 which is identical to the following `ConfigUnion`:
 
 ```python
-from adapters import ConfigUnion, LoRAConfig, PrefixTuningConfig, PfeifferConfig
+from adapters import ConfigUnion, LoRAConfig, PrefixTuningConfig, SeqBnConfig
 
 config = ConfigUnion(
     LoRAConfig(r=8, use_gating=True),
     PrefixTuningConfig(prefix_length=10, use_gating=True),
-    PfeifferConfig(reduction_factor=16, use_gating=True),
+    SeqBnConfig(reduction_factor=16, use_gating=True),
 )
 model.add_adapter("unipelt", config=config)
 ```
````
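For context, the two unions above spell out the Mix-and-Match and UniPELT setups. Assuming the package still exports the predefined shorthand classes `MAMConfig` and `UniPELTConfig` (an assumption; they are untouched by this commit), the equivalent one-liners would be:

```python
from adapters import MAMConfig, UniPELTConfig

# model is set up as in the surrounding documentation examples
model.add_adapter("mam_adapter", config=MAMConfig())
model.add_adapter("unipelt", config=UniPELTConfig())
```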

docs/methods.md (12 additions, 12 deletions)

````diff
@@ -6,7 +6,7 @@ Additionally, options to combine multiple adapter methods in a single setup are
 
 ## Bottleneck Adapters
 
-_Configuration class_: [`AdapterConfig`](adapters.AdapterConfig)
+_Configuration class_: [`BnConfig`](adapters.BnConfig)
 
 Bottleneck adapters introduce bottleneck feed-forward layers in each layer of a Transformer model.
 Generally, these adapter layers consist of a down-projection matrix $W_{down}$ that projects the layer hidden states into a lower dimension $d_{bottleneck}$, a non-linearity $f$, an up-projection $W_{up}$ that projects back into the original hidden layer dimension and a residual connection $r$:
@@ -25,7 +25,7 @@ $$
 \text{reduction_factor} = \frac{d_{hidden}}{d_{bottleneck}}
 $$
 
-A visualization of further configuration options related to the adapter structure is given in the figure below. For more details, refer to the documentation of [`AdapterConfig`](adapters.AdapterConfig).
+A visualization of further configuration options related to the adapter structure is given in the figure below. For more details, refer to the documentation of [`BnConfig`](adapters.BnConfig).
 
 
 ```{eval-rst}
@@ -39,15 +39,15 @@ A visualization of further configuration options related to the adapter structur
 
 `adapters` comes with pre-defined configurations for some bottleneck adapter architectures proposed in literature:
 
-- [`HoulsbyConfig`](adapters.HoulsbyConfig) as proposed by [Houlsby et al. (2019)](https://arxiv.org/pdf/1902.00751.pdf) places adapter layers after both the multi-head attention and feed-forward block in each Transformer layer.
-- [`PfeifferConfig`](adapters.PfeifferConfig) as proposed by [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) places an adapter layer only after the feed-forward block in each Transformer layer.
-- [`ParallelConfig`](adapters.ParallelConfig) as proposed by [He et al. (2021)](https://arxiv.org/pdf/2110.04366.pdf) places adapter layers in parallel to the original Transformer layers.
+- [`DoubleSeqBnConfig`](adapters.DoubleSeqBnConfig) as proposed by [Houlsby et al. (2019)](https://arxiv.org/pdf/1902.00751.pdf) places adapter layers after both the multi-head attention and feed-forward block in each Transformer layer.
+- [`SeqBnConfig`](adapters.SeqBnConfig) as proposed by [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) places an adapter layer only after the feed-forward block in each Transformer layer.
+- [`ParBnConfig`](adapters.ParBnConfig) as proposed by [He et al. (2021)](https://arxiv.org/pdf/2110.04366.pdf) places adapter layers in parallel to the original Transformer layers.
 
 _Example_:
 ```python
-from adapters import AdapterConfig
+from adapters import BnConfig
 
-config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=16, non_linearity="relu")
+config = BnConfig(mh_adapter=True, output_adapter=True, reduction_factor=16, non_linearity="relu")
 model.add_adapter("bottleneck_adapter", config=config)
 ```
@@ -60,21 +60,21 @@ _Papers:_
 
 ## Language Adapters - Invertible Adapters
 
-_Configuration class_: [`PfeifferInvConfig`](adapters.PfeifferInvConfig), [`HoulsbyInvConfig`](adapters.HoulsbyInvConfig)
+_Configuration class_: [`SeqBnInvConfig`](adapters.SeqBnInvConfig), [`DoubleSeqBnInvConfig`](adapters.DoubleSeqBnInvConfig)
 
 The MAD-X setup ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00052.pdf)) proposes language adapters to learn language-specific transformations.
 After being trained on a language modeling task, a language adapter can be stacked before a task adapter for training on a downstream task.
 To perform zero-shot cross-lingual transfer, one language adapter can simply be replaced by another.
 
 In terms of architecture, language adapters are largely similar to regular bottleneck adapters, except for an additional _invertible adapter_ layer after the LM embedding layer.
 Embedding outputs are passed through this invertible adapter in the forward direction before entering the first Transformer layer and in the inverse direction after leaving the last Transformer layer.
-Invertible adapter architectures are further detailed in [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) and can be configured via the `inv_adapter` attribute of the `AdapterConfig` class.
+Invertible adapter architectures are further detailed in [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) and can be configured via the `inv_adapter` attribute of the `BnConfig` class.
 
 _Example_:
 ```python
-from adapters import PfeifferInvConfig
+from adapters import SeqBnInvConfig
 
-config = PfeifferInvConfig()
+config = SeqBnInvConfig()
 model.add_adapter("lang_adapter", config=config)
 ```
@@ -150,7 +150,7 @@ for a PHM layer by specifying `use_phm=True` in the config.
 The PHM layer has the following additional properties: `phm_dim`, `shared_phm_rule`, `factorized_phm_rule`, `learn_phm`,
 `factorized_phm_W`, `shared_W_phm`, `phm_c_init`, `phm_init_range`, `hypercomplex_nonlinearity`
 
-For more information check out the [`AdapterConfig`](adapters.AdapterConfig) class.
+For more information check out the [`BnConfig`](adapters.BnConfig) class.
 
 To add a Compacter to your model you can use the predefined configs:
 ```python
````
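As an aside, the bottleneck computation described in the methods.md hunks above maps directly onto two linear projections with a residual. A minimal PyTorch sketch with illustrative names (this is not the library's internal implementation):

```python
import torch
import torch.nn as nn

class BottleneckSketch(nn.Module):
    """Illustrative only: h = W_up f(W_down x) + r,
    with reduction_factor = d_hidden / d_bottleneck."""

    def __init__(self, d_hidden: int, reduction_factor: int = 16):
        super().__init__()
        d_bottleneck = d_hidden // reduction_factor
        self.down = nn.Linear(d_hidden, d_bottleneck)  # W_down
        self.f = nn.ReLU()                             # non_linearity="relu"
        self.up = nn.Linear(d_bottleneck, d_hidden)    # W_up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.f(self.down(x))) + x       # residual connection r

x = torch.randn(2, 8, 768)
print(BottleneckSketch(768)(x).shape)  # torch.Size([2, 8, 768])
```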

docs/overview.md (10 additions, 10 deletions)

````diff
@@ -44,12 +44,12 @@ Identifiers and configuration classes are explained in more detail in the [next
 
 | Identifier | Configuration class | More information |
 | --- | --- | --- |
-| `pfeiffer` | `PfeifferConfig()` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
-| `houlsby` | `HoulsbyConfig()` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
-| `parallel` | `ParallelConfig()` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
-| `scaled_parallel` | `ParallelConfig(scaling="learned")` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
-| `pfeiffer+inv` | `PfeifferInvConfig()` | [Invertible Adapters](methods.html#language-adapters---invertible-adapters) |
-| `houlsby+inv` | `HoulsbyInvConfig()` | [Invertible Adapters](methods.html#language-adapters---invertible-adapters) |
+| `seq_bn` | `SeqBnConfig()` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
+| `double_seq_bn` | `DoubleSeqBnConfig()` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
+| `par_bn` | `ParBnConfig()` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
+| `scaled_par_bn` | `ParBnConfig(scaling="learned")` | [Bottleneck Adapters](methods.html#bottleneck-adapters) |
+| `seq_bn_inv` | `SeqBnInvConfig()` | [Invertible Adapters](methods.html#language-adapters---invertible-adapters) |
+| `double_seq_bn_inv` | `DoubleSeqBnInvConfig()` | [Invertible Adapters](methods.html#language-adapters---invertible-adapters) |
 | `compacter` | `CompacterConfig()` | [Compacter](methods.html#compacter) |
 | `compacter++` | `CompacterPlusPlusConfig()` | [Compacter](methods.html#compacter) |
 | `prefix_tuning` | `PrefixTuningConfig()` | [Prefix Tuning](methods.html#prefix-tuning) |
@@ -79,16 +79,16 @@ Configuration strings are a concise way of defining a specific adapter method co
 They are especially useful when adapter configurations are passed from external sources such as the command-line, when using configuration classes is not an option.
 
 In general, a configuration string for a single method takes the form `<identifier>[<key>=<value>, ...]`.
-Here, `<identifier>` refers to one of the identifiers listed in [the table above](#table-of-adapter-methods), e.g. `parallel`.
-In square brackets after the identifier, you can set specific configuration attributes from the respective configuration class, e.g. `parallel[reduction_factor=2]`.
+Here, `<identifier>` refers to one of the identifiers listed in [the table above](#table-of-adapter-methods), e.g. `par_bn`.
+In square brackets after the identifier, you can set specific configuration attributes from the respective configuration class, e.g. `par_bn[reduction_factor=2]`.
 If all attributes remain at their default values, this can be omitted.
 
 Finally, it is also possible to specify a [method combination](method_combinations.md) as a configuration string by joining multiple configuration strings with `|`.
-E.g., `prefix_tuning[bottleneck_size=800]|parallel` is identical to the following configuration class instance:
+E.g., `prefix_tuning[bottleneck_size=800]|par_bn` is identical to the following configuration class instance:
 
 ```python
 ConfigUnion(
     PrefixTuningConfig(bottleneck_size=800),
-    ParallelConfig(),
+    ParBnConfig(),
 )
 ```
````
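With the renamed identifiers, config strings keep working as before. A hedged usage sketch, assuming the package's usual `adapters.init()` entry point for wrapping a Hugging Face model (that entry point is an assumption; it is not part of this diff):

```python
import adapters
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
adapters.init(model)  # assumption: standard setup call, unchanged by this commit

model.add_adapter("a1", config="seq_bn")                      # plain identifier
model.add_adapter("a2", config="par_bn[reduction_factor=2]")  # identifier with attributes
```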

docs/prediction_heads.md (1 addition, 1 deletion)

````diff
@@ -29,7 +29,7 @@ To learn more about the different head types and the configuration options, plea
 
 Now, of course, we would like to train our classification head together with an adapter, so let's add one:
 ```python
-model.add_adapter("mrpc", config="pfeiffer")
+model.add_adapter("mrpc", config="seq_bn")
 model.set_active_adapters("mrpc")
 ```
````

docs/training.md (3 additions, 3 deletions)

````diff
@@ -59,7 +59,7 @@ Compared to fine-tuning the full model, there is only this one significant adapt
 # task adapter - only add if not existing
 if task_name not in model.config.adapters:
     # resolve the adapter config
-    adapter_config = AdapterConfig.load(adapter_args.adapter_config)
+    adapter_config = AdapterConfigBase.load(adapter_args.adapter_config)
     # add a new adapter
     model.add_adapter(task_name, config=adapter_config)
     # Enable adapter training
@@ -114,7 +114,7 @@ python run_glue.py \
   --output_dir /tmp/$TASK_NAME \
   --overwrite_output_dir \
   --train_adapter \
-  --adapter_config pfeiffer
+  --adapter_config seq_bn
 ```
 
 The important flag here is `--train_adapter` which switches from fine-tuning the full model to training an adapter module for the given GLUE task.
@@ -150,7 +150,7 @@ python run_mlm.py \
   --num_train_epochs 10.0 \
   --output_dir /tmp/test-mlm \
   --train_adapter \
-  --adapter_config "pfeiffer+inv"
+  --adapter_config "seq_bn_inv"
 ```
 
 ## Train AdapterFusion
````
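`AdapterConfigBase.load` resolves identifiers and config strings to configuration class instances, so the training snippet above keeps accepting the renamed CLI values. A short sketch (that `load` also accepts the bracketed attribute syntax from docs/overview.md is an assumption):

```python
from adapters import AdapterConfigBase

# Resolve a configuration from a plain identifier, as in the training loop above.
adapter_config = AdapterConfigBase.load("seq_bn")

# Assumption: the bracketed config-string attributes resolve here too, e.g. the
# equivalent of `--adapter_config seq_bn` with a reduction factor override:
adapter_config = AdapterConfigBase.load("seq_bn[reduction_factor=8]")
```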

examples/pytorch/adapterfusion/run_fusion_glue.py (16 additions, 16 deletions)

````diff
@@ -139,24 +139,24 @@ def main():
 
     # ~~~~~ Here comes the interesting part of setting up AdapterFusion training ~~~~~
 
-    from adapters.configuration import PfeifferConfig
+    from adapters.configuration import SeqBnConfig
 
     # First, load the pre-trained adapters we want to fuse from Hub
-    model.load_adapter("sentiment/sst-2@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("nli/multinli@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("nli/rte@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("sts/mrpc@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("sts/qqp@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("comsense/cosmosqa@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("comsense/csqa@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("comsense/hellaswag@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("comsense/siqa@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("comsense/winogrande@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("nli/cb@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("nli/sick@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("nli/scitail@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("qa/boolq@ukp", config=PfeifferConfig(), with_head=False)
-    model.load_adapter("sentiment/imdb@ukp", config=PfeifferConfig(), with_head=False)
+    model.load_adapter("sentiment/sst-2@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("nli/multinli@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("nli/rte@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("sts/mrpc@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("sts/qqp@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("comsense/cosmosqa@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("comsense/csqa@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("comsense/hellaswag@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("comsense/siqa@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("comsense/winogrande@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("nli/cb@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("nli/sick@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("nli/scitail@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("qa/boolq@ukp", config=SeqBnConfig(), with_head=False)
+    model.load_adapter("sentiment/imdb@ukp", config=SeqBnConfig(), with_head=False)
 
     adapter_setup = [
         [
````
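Since the fifteen `load_adapter` calls in this hunk differ only in the adapter name, they could equivalently be collapsed into a loop; a behavior-preserving sketch, not part of this commit:

```python
from adapters.configuration import SeqBnConfig

# Adapter names taken verbatim from the diff above.
ADAPTERS_TO_FUSE = [
    "sentiment/sst-2@ukp", "nli/multinli@ukp", "nli/rte@ukp",
    "sts/mrpc@ukp", "sts/qqp@ukp", "comsense/cosmosqa@ukp",
    "comsense/csqa@ukp", "comsense/hellaswag@ukp", "comsense/siqa@ukp",
    "comsense/winogrande@ukp", "nli/cb@ukp", "nli/sick@ukp",
    "nli/scitail@ukp", "qa/boolq@ukp", "sentiment/imdb@ukp",
]
for name in ADAPTERS_TO_FUSE:
    model.load_adapter(name, config=SeqBnConfig(), with_head=False)
```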

examples/pytorch/test_adapter_examples.py (9 additions, 9 deletions)

````diff
@@ -79,7 +79,7 @@ def test_run_glue_adapters(self):
             --seed=42
             --max_seq_length=128
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             """.split()
         with patch.object(sys, "argv", testargs):
             run_glue.main()
@@ -107,7 +107,7 @@ def test_run_fusion_glue(self):
             --seed=42
             --max_seq_length=128
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --load_adapter=qqp@ukp
             """.split()
         with patch.object(sys, "argv", testargs):
@@ -137,7 +137,7 @@ def test_run_squad_adapters(self):
             --per_device_train_batch_size=2
             --per_device_eval_batch_size=1
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=8
             """.split()
 
@@ -167,7 +167,7 @@ def test_run_swag_adapter(self):
             --per_device_train_batch_size=2
             --per_device_eval_batch_size=1
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=8
             """.split()
 
@@ -196,7 +196,7 @@ def test_run_clm_adapter(self):
             --output_dir {tmp_dir}
             --overwrite_output_dir
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=8
             """.split()
 
@@ -229,7 +229,7 @@ def test_run_mlm_adapter(self):
             --prediction_loss_only
             --num_train_epochs=1
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=8
             """.split()
 
@@ -287,7 +287,7 @@ def test_run_summarization_adapter(self):
             --per_device_eval_batch_size=1
             --predict_with_generate
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=8
             """.split()
 
@@ -325,7 +325,7 @@ def test_run_translation_adapter(self):
             --source_lang en_XX
             --target_lang ro_RO
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=8
             """.split()
 
@@ -357,7 +357,7 @@ def test_run_ner_adapter(self):
             --per_device_eval_batch_size=2
             --num_train_epochs={epochs}
             --train_adapter
-            --adapter_config=houlsby
+            --adapter_config=double_seq_bn
             --adapter_reduction_factor=16
             """.split()
````

examples/pytorch/text-classification/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@ python run_glue.py \
3737
--output_dir /tmp/$TASK_NAME \
3838
--overwrite_output_dir \
3939
--train_adapter \
40-
--adapter_config pfeiffer
40+
--adapter_config seq_bn
4141
```
4242

4343
---
