To make this process easier, `adapters` provides the possibility to group multiple configuration instances together using the [`ConfigUnion`](adapters.ConfigUnion) class.
For example, this could be used to define different reduction factors for the adapter modules placed after the multi-head attention and the feed-forward blocks:
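A minimal sketch of such a combination (assuming the [`ConfigUnion`](adapters.ConfigUnion) and [`BnConfig`](adapters.BnConfig) classes and an adapter-enabled `model`; the adapter name and reduction factors are illustrative):

```python
from adapters import BnConfig, ConfigUnion

# One bottleneck adapter after the multi-head attention block and one after the
# feed-forward block, each with its own reduction factor.
config = ConfigUnion(
    BnConfig(mh_adapter=True, output_adapter=False, reduction_factor=16, non_linearity="relu"),
    BnConfig(mh_adapter=False, output_adapter=True, reduction_factor=2, non_linearity="relu"),
)
model.add_adapter("union_adapter", config=config)
```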
Bottleneck adapters introduce bottleneck feed-forward layers in each layer of a Transformer model.
Generally, these adapter layers consist of a down-projection matrix $W_{down}$ that projects the layer hidden states into a lower dimension $d_{bottleneck}$, a non-linearity $f$, an up-projection $W_{up}$ that projects back into the original hidden layer dimension, and a residual connection $r$:

$$
h \leftarrow W_{up} \cdot f(W_{down} \cdot h) + r
$$
A visualization of further configuration options related to the adapter structure is given in the figure below. For more details, refer to the documentation of [`BnConfig`](adapters.BnConfig).
*(Figure: visualization of further configuration options of the adapter structure.)*
`adapters` comes with pre-defined configurations for some bottleneck adapter architectures proposed in the literature:
- [`DoubleSeqBnConfig`](adapters.DoubleSeqBnConfig) as proposed by [Houlsby et al. (2019)](https://arxiv.org/pdf/1902.00751.pdf) places adapter layers after both the multi-head attention and feed-forward block in each Transformer layer.
- [`SeqBnConfig`](adapters.SeqBnConfig) as proposed by [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) places an adapter layer only after the feed-forward block in each Transformer layer.
- [`ParBnConfig`](adapters.ParBnConfig) as proposed by [He et al. (2021)](https://arxiv.org/pdf/2110.04366.pdf) places adapter layers in parallel to the original Transformer layers.
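As an illustrative sketch, one of these pre-defined configurations can be added to a model and activated for training as follows (the checkpoint and adapter name are placeholders):

```python
from adapters import AutoAdapterModel, SeqBnConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Add a sequential bottleneck adapter and activate it for training.
model.add_adapter("bottleneck_adapter", config=SeqBnConfig(reduction_factor=16))
model.train_adapter("bottleneck_adapter")
```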
The MAD-X setup ([Pfeiffer et al., 2020](https://arxiv.org/pdf/2005.00052.pdf)) proposes language adapters to learn language-specific transformations.
After being trained on a language modeling task, a language adapter can be stacked before a task adapter for training on a downstream task.
To perform zero-shot cross-lingual transfer, one language adapter can simply be replaced by another.
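A minimal sketch of this setup, assuming the `Stack` composition block from `adapters.composition` and two previously added adapters with the illustrative names `"lang_adapter"` and `"task_adapter"`:

```python
from adapters.composition import Stack

# Activate the language adapter stacked before the task adapter.
model.active_adapters = Stack("lang_adapter", "task_adapter")
```

For zero-shot cross-lingual transfer, the first name in the stack can then simply be swapped for a different trained language adapter.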
In terms of architecture, language adapters are largely similar to regular bottleneck adapters, except for an additional _invertible adapter_ layer after the LM embedding layer.
Embedding outputs are passed through this invertible adapter in the forward direction before entering the first Transformer layer and in the inverse direction after leaving the last Transformer layer.
Invertible adapter architectures are further detailed in [Pfeiffer et al. (2020)](https://arxiv.org/pdf/2005.00052.pdf) and can be configured via the `inv_adapter` attribute of the `BnConfig` class.
_Example_:
```python
from adapters import SeqBnInvConfig

config = SeqBnInvConfig()
model.add_adapter("lang_adapter", config=config)
```
The adapter's projection layers can also be exchanged for a PHM (parameterized hypercomplex multiplication) layer by specifying `use_phm=True` in the config. The PHM layer has the following additional properties: `phm_dim`, `shared_phm_rule`, `factorized_phm_rule`, `learn_phm`, …
Configuration strings are a concise way of defining a specific adapter method configuration.
They are especially useful when adapter configurations are passed from external sources such as the command line, where using configuration classes is not an option.
In general, a configuration string for a single method takes the form `<identifier>[<key>=<value>, ...]`.
Here, `<identifier>` refers to one of the identifiers listed in [the table above](#table-of-adapter-methods), e.g. `par_bn`.
In square brackets after the identifier, you can set specific configuration attributes from the respective configuration class, e.g. `par_bn[reduction_factor=2]`.
If all attributes remain at their default values, this can be omitted.
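As a sketch of how such a string is used, it can be passed directly wherever a configuration is expected, for example to `add_adapter` (the adapter name is illustrative and `model` is assumed to be an adapter-enabled model):

```python
# Equivalent to passing a ParBnConfig instance with reduction_factor=2.
model.add_adapter("my_adapter", config="par_bn[reduction_factor=2]")
```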
Finally, it is also possible to specify a [method combination](method_combinations.md) as a configuration string by joining multiple configuration strings with `|`.
E.g., `prefix_tuning[bottleneck_size=800]|par_bn` is identical to the following configuration class instance:
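A minimal sketch of that instance, assuming the `ConfigUnion`, `PrefixTuningConfig`, and `ParBnConfig` classes from `adapters`:

```python
from adapters import ConfigUnion, ParBnConfig, PrefixTuningConfig

config = ConfigUnion(
    PrefixTuningConfig(bottleneck_size=800),
    ParBnConfig(),
)
```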