To make this process easier, `adapters` provides the possibility to group multiple configuration instances together using the [`ConfigUnion`](adapters.ConfigUnion) class.
For example, this could be used to define different reduction factors for the adapter modules placed after the multi-head attention and the feed-forward blocks:
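A minimal sketch of such a combination (assuming the [`ConfigUnion`](adapters.ConfigUnion) and [`BnConfig`](adapters.BnConfig) classes and an adapter-enabled `model`; the adapter name and reduction factors are illustrative):

```python
from adapters import BnConfig, ConfigUnion

# One bottleneck adapter after the multi-head attention block and one after the
# feed-forward block, each with its own reduction factor.
config = ConfigUnion(
    BnConfig(mh_adapter=True, output_adapter=False, reduction_factor=16, non_linearity="relu"),
    BnConfig(mh_adapter=False, output_adapter=True, reduction_factor=2, non_linearity="relu"),
)
model.add_adapter("union_adapter", config=config)
```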
Bottleneck adapters introduce bottleneck feed-forward layers in each layer of a Transformer model.
Generally, these adapter layers consist of a down-projection matrix $W_{down}$ that projects the layer hidden states into a lower dimension $d_{bottleneck}$, a non-linearity $f$, an up-projection $W_{up}$ that projects back into the original hidden layer dimension, and a residual connection $r$:

$$
h \leftarrow W_{up} \cdot f(W_{down} \cdot h) + r
$$
A visualization of further configuration options related to the adapter structure is given in the figure below. For more details, refer to the documentation of [`BnConfig`](adapters.BnConfig).
*(Figure: visualization of further configuration options of the adapter structure.)*
`adapters` comes with pre-defined configurations for some bottleneck adapter architectures proposed in the literature:
- [`DoubleSeqBnConfig`](adapters.DoubleSeqBnConfig) as proposed by [Houlsby et al. (2019)](https://arxiv.org/pdf/1902.00751.pdf) places adapter layers after both the multi-head attention and feed-forward block in each Transformer layer.
- [`SeqBnConfig`](adapters.SeqBnConfig) as proposed by [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) places an adapter layer only after the feed-forward block in each Transformer layer.
- [`ParBnConfig`](adapters.ParBnConfig) as proposed by [He et al. (2021)](https://arxiv.org/pdf/2110.04366.pdf) places adapter layers in parallel to the original Transformer layers.
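As an illustrative sketch, one of these pre-defined configurations can be added to a model and activated for training as follows (the checkpoint and adapter name are placeholders):

```python
from adapters import AutoAdapterModel, SeqBnConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Add a sequential bottleneck adapter and activate it for training.
model.add_adapter("bottleneck_adapter", config=SeqBnConfig(reduction_factor=16))
model.train_adapter("bottleneck_adapter")
```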
The MAD-X setup ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00052.pdf)) proposes language adapters to learn language-specific transformations.
After being trained on a language modeling task, a language adapter can be stacked before a task adapter for training on a downstream task.
To perform zero-shot cross-lingual transfer, one language adapter can simply be replaced by another.
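A minimal sketch of this setup, assuming the `Stack` composition block from `adapters.composition` and two previously added adapters with the illustrative names `"lang_adapter"` and `"task_adapter"`:

```python
from adapters.composition import Stack

# Activate the language adapter stacked before the task adapter.
model.active_adapters = Stack("lang_adapter", "task_adapter")
```

For zero-shot cross-lingual transfer, the first name in the stack can then simply be swapped for a different trained language adapter.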
In terms of architecture, language adapters are largely similar to regular bottleneck adapters, except for an additional _invertible adapter_ layer after the LM embedding layer.
Embedding outputs are passed through this invertible adapter in the forward direction before entering the first Transformer layer and in the inverse direction after leaving the last Transformer layer.
Invertible adapter architectures are further detailed in [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) and can be configured via the `inv_adapter` attribute of the `BnConfig` class.
_Example_:
```python
from adapters import SeqBnInvConfig

config = SeqBnInvConfig()
model.add_adapter("lang_adapter", config=config)
```
The adapter's projection layers can also be exchanged for a PHM (parameterized hypercomplex multiplication) layer by specifying `use_phm=True` in the config. The PHM layer has the following additional properties: `phm_dim`, `shared_phm_rule`, `factorized_phm_rule`, `learn_phm`, …
Configuration strings are a concise way of defining a specific adapter method configuration.
They are especially useful when adapter configurations are passed from external sources such as the command line, where using configuration classes is not an option.
In general, a configuration string for a single method takes the form `<identifier>[<key>=<value>, ...]`.
Here, `<identifier>` refers to one of the identifiers listed in [the table above](#table-of-adapter-methods), e.g. `par_bn`.
In square brackets after the identifier, you can set specific configuration attributes from the respective configuration class, e.g. `par_bn[reduction_factor=2]`.
If all attributes remain at their default values, this can be omitted.
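As a sketch of how such a string is used, it can be passed directly wherever a configuration is expected, for example to `add_adapter` (the adapter name is illustrative and `model` is assumed to be an adapter-enabled model):

```python
# Equivalent to passing a ParBnConfig instance with reduction_factor=2.
model.add_adapter("my_adapter", config="par_bn[reduction_factor=2]")
```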
Finally, it is also possible to specify a [method combination](method_combinations.md) as a configuration string by joining multiple configuration strings with `|`.
E.g., `prefix_tuning[bottleneck_size=800]|par_bn` is identical to the following configuration class instance:
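A minimal sketch of that instance, assuming the `ConfigUnion`, `PrefixTuningConfig`, and `ParBnConfig` classes from `adapters`:

```python
from adapters import ConfigUnion, ParBnConfig, PrefixTuningConfig

config = ConfigUnion(
    PrefixTuningConfig(bottleneck_size=800),
    ParBnConfig(),
)
```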