CUNY-CL
diff --git a/README.md b/README.md
Lines changed: 20 additions & 22 deletions
diff --git a/configs/ewt_bert.yaml b/configs/ewt_bert.yaml
Lines changed: 0 additions & 6 deletions
diff --git a/configs/ewt_distilbert.yaml b/configs/ewt_distilbert.yaml
Lines changed: 1 addition & 6 deletions
diff --git a/configs/ewt_roberta.yaml b/configs/ewt_roberta.yaml
Lines changed: 0 additions & 6 deletions
diff --git a/configs/syntagrus_mbert.yaml b/configs/syntagrus_mbert.yaml
Lines changed: 0 additions & 5 deletions
diff --git a/configs/syntagrus_rubert.yaml b/configs/syntagrus_rubert.yaml
Lines changed: 0 additions & 5 deletions
diff --git a/configs/syntagrus_xlm-roberta.yaml b/configs/syntagrus_xlm-roberta.yaml
Lines changed: 0 additions & 5 deletions
diff --git a/examples/wandb_sweeps/configs/ewt_grid.yaml b/examples/wandb_sweeps/configs/ewt_grid.yaml
Lines changed: 15 additions & 5 deletions
diff --git a/examples/wandb_sweeps/configs/gdt_grid.yaml b/examples/wandb_sweeps/configs/gdt_grid.yaml
Lines changed: 15 additions & 3 deletions
diff --git a/examples/wandb_sweeps/configs/syntagrus_grid.yaml b/examples/wandb_sweeps/configs/syntagrus_grid.yaml
Lines changed: 15 additions & 3 deletions
@@ -60,31 +60,31 @@ Dependencies project](https://universaldependencies.org/).
 
 UDTube can perform up to four morphological tasks simultaneously:
 
--   Lemmatization is performed using the `LEMMA` field and [edit
-    scripts](https://aclanthology.org/P14-2111/).
-
+-   Lemmatization is performed using the `LEMMA` field and edit scripts.
 -   [Universal part-of-speech
     tagging](https://universaldependencies.org/u/pos/index.html) is performed
-    using the `UPOS` field: enable with `data: use_upos: true`.
-
+    using the `UPOS` field.
 -   Language-specific part-of-speech tagging is performed using the `XPOS`
-    field: enable with `data: use_xpos: true`.
-
--   Morphological feature tagging is performed using the `FEATS` field: enable
-    with `data: use_feats: true`.
+    field.
+-   Morphological feature tagging is performed using the `FEATS` field.
+-   Dependency parsing is performed using the `HEAD` and `DEPREL` fields, a deep
+    biaffine parser, and minimum spanning tree decoding.
 
 The following caveats apply:
 
+-   By default, lemmatization uses reverse-edit scripts. This is appropriate for
+    predominantly suffixal languages, which are thought to represent the
+    majority of the world's languages. If working with a predominantly prefixal
+    language, disable this with `data: reverse_edits: false`.
 -   Note that many newer Universal Dependencies datasets do not have
-    language-specific part-of-speech-tags.
+    language-specific part-of-speech tags, so this task should be disabled
+    (`model: use_xpos: false`).
 -   The `FEATS` field is treated as a single unit and is not segmented in any
     way.
 -   One can convert from [Universal Dependencies morphological
     features](https://universaldependencies.org/u/feat/index.html) to [UniMorph
     features](https://unimorph.github.io/schema/) using
     [`scripts/convert_to_um.py`](scripts/convert_to_um.py).
--   UDTube does not perform dependency parsing at present, so the `HEAD`,
-    `DEPREL`, and `DEPS` fields are ignored and should be specified as `_`.
 
 ## Usage
 
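The reverse-edit-script default described in the caveats above can be illustrated with a deliberately simplified edit script. This is a sketch under stated assumptions, not UDTube's implementation: the helpers `common_prefix_len`, `make_script`, and `apply_script` are hypothetical, and each script is reduced to a single (characters to delete, affix to insert) rewrite rather than a full character-level alignment. It shows why indexing edits from the end of the word suits suffixal languages: forms of different lengths that share an inflection map to the same class label.

```python
from typing import Tuple

# Hypothetical, simplified edit script: (characters to delete, affix to insert).
Script = Tuple[int, str]


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_script(form: str, lemma: str, reverse: bool = True) -> Script:
    """Derives a single-span rewrite that turns form into lemma."""
    if reverse:
        # Indexed from the end: delete k trailing characters, then append.
        p = common_prefix_len(form, lemma)
        return len(form) - p, lemma[p:]
    # Indexed from the start: delete k leading characters, then prepend.
    s = common_prefix_len(form[::-1], lemma[::-1])
    return len(form) - s, lemma[: len(lemma) - s]


def apply_script(form: str, script: Script, reverse: bool = True) -> str:
    """Applies a script produced by make_script with the same direction."""
    k, affix = script
    if reverse:
        return form[: len(form) - k] + affix
    return affix + form[k:]


if __name__ == "__main__":
    for form, lemma in [("walked", "walk"), ("listened", "listen"), ("studies", "study")]:
        script = make_script(form, lemma)
        print(form, script, apply_script(form, script))
```

Here *walked* and *listened* both receive the class `(2, '')`. With `reverse=False` they instead become whole-word rewrites, `(6, 'walk')` and `(8, 'listen')`, while prefixal pairs such as Swahili *watoto* → *mtoto* and *watu* → *mtu* share the class `(2, 'm')`.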
@@ -132,8 +132,9 @@ supported as they lack an `AutoTokenizer`.
 
 #### Classifier
 
-The classifier layer contains up to four sequential linear heads for the four
-tasks described above. By default all four are enabled.
+The classifier layer contains up to four sequential linear heads for the tagging
+tasks and a biaffine parser head for the parsing task. By default, all heads
+are enabled.
 
 #### Optimization
 
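The biaffine parser head and the minimum spanning tree decoding described above can be sketched in pure Python. This is an illustration under stated assumptions, not UDTube's code: all function names are hypothetical, `heads` and `deps` stand in for the outputs of the arc MLPs (which are omitted), and the arc score uses the biaffine form from Dozat and Manning's deep biaffine parser, `s(h, d) = head[h] · (U dep[d] + u)`. Decoding uses the Chu-Liu/Edmonds algorithm to find the highest-scoring tree rooted at node 0; maximizing arc scores is equivalent to finding the minimum spanning tree over negated scores. The sketch does not enforce the Universal Dependencies constraint that the root have exactly one dependent.

```python
from typing import Dict, List, Optional, Sequence

NEG_INF = float("-inf")


def biaffine_scores(
    heads: List[Sequence[float]],
    deps: List[Sequence[float]],
    U: List[Sequence[float]],
    u: Sequence[float],
) -> List[List[float]]:
    """scores[h][d] = heads[h] . (U @ deps[d] + u); index 0 is the root."""
    n = len(heads)
    scores = [[NEG_INF] * n for _ in range(n)]
    for d in range(1, n):  # No arcs may enter the root.
        Ud = [sum(row[j] * deps[d][j] for j in range(len(deps[d]))) for row in U]
        for h in range(n):
            if h != d:  # No self-loops.
                scores[h][d] = sum(heads[h][i] * (Ud[i] + u[i]) for i in range(len(u)))
    return scores


def find_cycle(heads: List[int]) -> Optional[List[int]]:
    """Returns the nodes of one cycle in a head assignment, or None."""
    for start in range(1, len(heads)):
        path: List[int] = []
        node = start
        while node != 0 and node not in path:
            path.append(node)
            node = heads[node]
        if node != 0:
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores: List[List[float]]) -> List[int]:
    """Highest-scoring arborescence rooted at 0; returns heads with heads[0] = -1."""
    n = len(scores)
    # Greedily pick the best head for every non-root node.
    heads = [-1] + [
        max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
        for d in range(1, n)
    ]
    cycle = find_cycle(heads)
    if cycle is None:
        return heads
    # Contract the cycle into a single node with index `merged`.
    cycle_score = sum(scores[heads[v]][v] for v in cycle)
    others = [i for i in range(n) if i not in cycle]  # The root stays first.
    new_id = {old: new for new, old in enumerate(others)}
    merged = len(others)
    sub = [[NEG_INF] * (merged + 1) for _ in range(merged + 1)]
    enter: Dict[int, int] = {}  # Contracted head -> cycle node it attaches to.
    leave: Dict[int, int] = {}  # Contracted dependent -> cycle node heading it.
    for h in others:
        for d in others:
            if h != d and d != 0:
                sub[new_id[h]][new_id[d]] = scores[h][d]
        # Entering the cycle at v replaces v's own cycle arc.
        v = max(cycle, key=lambda c: scores[h][c] - scores[heads[c]][c])
        sub[new_id[h]][merged] = scores[h][v] - scores[heads[v]][v] + cycle_score
        enter[new_id[h]] = v
        if h != 0:
            v = max(cycle, key=lambda c: scores[c][h])
            sub[merged][new_id[h]] = scores[v][h]
            leave[new_id[h]] = v
    sub_heads = chu_liu_edmonds(sub)
    # Expand the contracted solution back to the original nodes.
    result = [-1] * n
    for d in others[1:]:
        h = sub_heads[new_id[d]]
        result[d] = leave[new_id[d]] if h == merged else others[h]
    for v in cycle:
        result[v] = heads[v]
    h = sub_heads[merged]
    result[enter[h]] = others[h]
    return result


if __name__ == "__main__":
    # Toy scores[h][d]: greedy decoding alone would create the cycle 1 <-> 2.
    toy = [
        [NEG_INF, 5, 8, 1],
        [NEG_INF, NEG_INF, 9, 2],
        [NEG_INF, 10, NEG_INF, 7],
        [NEG_INF, 1, 1, NEG_INF],
    ]
    print(chu_liu_edmonds(toy))  # [-1, 2, 0, 2]
```

The toy example resolves the greedy cycle by attaching word 2 to the root, which yields the highest total score (8 + 10 + 7 = 25) among all well-formed trees.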
@@ -189,7 +190,7 @@ information](https://github.com/CUNY-CL/yoyodyne/blob/master/README.md#logging).
 
 #### Other options
 
-By default, UDTube attempts to model all four tasks; one can disable the
+By default, UDTube attempts to model all five tasks; one can disable the
 language-specific tagging task using `model: use_xpos: false`, and so on.
 
 Dropout probability is specified using `model: dropout: ...`.
@@ -198,25 +199,19 @@ The encoder has multiple layers. The input to the classifier consists of just
 the last few layers mean-pooled together. The number of layers used for
 mean-pooling is specified using `model: pooling_layers: ...`.
 
-By default, lemmatization uses reverse-edit scripts. This is appropriate for
-predominantly suffixal languages, which are thought to represent the majority of
-the world's languages. If working with a predominantly prefixal language,
-disable this with `model: reverse_edits: false`.
-
 The following YAML snippet shows the default architectural arguments.
 
     ...
     model:
       dropout: 0.5
       encoder: google-bert/bert-base-multilingual-cased
       pooling_layers: 1
-      reverse_edits: true
       use_upos: true
       use_xpos: true
       use_lemma: true
       use_feats: true
+      use_parse: true
       ...
-      
 
 Batch size is specified using `data: batch_size: ...` and defaults to 32.
 
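The `pooling_layers` option described above can be illustrated with a small sketch. It assumes, hypothetically, that the encoder exposes one `tokens × hidden` matrix per layer (as Hugging Face models do when called with `output_hidden_states=True`) and that pooling is a plain element-wise mean over the last `pooling_layers` of them. The function name is illustrative, and UDTube's own handling of subword tokens and of the embedding layer is not shown.

```python
from typing import List

Matrix = List[List[float]]  # tokens x hidden size


def mean_pool_last_layers(hidden_states: List[Matrix], pooling_layers: int = 1) -> Matrix:
    """Averages the last `pooling_layers` layer outputs element-wise."""
    if not 1 <= pooling_layers <= len(hidden_states):
        raise ValueError("pooling_layers must be between 1 and the number of layers")
    selected = hidden_states[-pooling_layers:]
    n_tokens, dim = len(selected[0]), len(selected[0][0])
    return [
        [sum(layer[t][i] for layer in selected) / pooling_layers for i in range(dim)]
        for t in range(n_tokens)
    ]


if __name__ == "__main__":
    # Three layers, one token, hidden size 2.
    layers = [[[0.0, 0.0]], [[2.0, 4.0]], [[4.0, 8.0]]]
    print(mean_pool_last_layers(layers, 1))  # [[4.0, 8.0]]
    print(mean_pool_last_layers(layers, 2))  # [[3.0, 6.0]]
```

With the default `pooling_layers: 1`, this reduces to the output of the final layer.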
@@ -322,3 +317,6 @@ following document, which describes the model:
 Yakubov, D. 2024. [How do we learn what we cannot
 say?](https://academicworks.cuny.edu/gc_etds/5622/) Master's thesis, CUNY
 Graduate Center.
+
+(See also [`udtube.bib`](udtube.bib) for other work consulted during the
+development of this library.)
@@ -22,12 +22,6 @@ trainer:
 model:
   dropout: 0.4
   encoder: google-bert/bert-base-cased
-  pooling_layers: 4
-  reverse_edits: true
-  use_upos: true
-  use_xpos: true
-  use_lemma: true
-  use_feats: true
   encoder_optimizer:
     class_path: torch.optim.Adam
     init_args:
 
@@ -22,12 +22,6 @@ trainer:
 model:
   dropout: 0.4
   encoder: distilbert/distilbert-base-cased
-  pooling_layers: 4
-  reverse_edits: true
-  use_upos: true
-  use_xpos: true
-  use_lemma: true
-  use_feats: true
   encoder_optimizer:
     class_path: torch.optim.Adam
     init_args:
@@ -52,6 +46,7 @@ data:
   test: /Users/Shinji/UD_English-EWT/en_ewt-ud-test.conllu
   predict: /Users/Shinji/UD_English-EWT/en_ewt-ud-test.conllu
   batch_size: 32
+  reverse_edits: true
 checkpoint:
   filename: "model-{epoch:03d}-{val_loss:.4f}"
   monitor: val_loss
 
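In the configs above, this PR moves `reverse_edits` from the `model` section to the `data` section and removes the `pooling_layers`, `use_upos`, `use_lemma`, and `use_feats` settings, plus `use_xpos` where it was `true`. For readers updating their own configurations, a hypothetical helper along these lines could perform the move on an already-parsed config (for example, the dictionary returned by a YAML loader). The `migrate_config` name and the `MODEL_DEFAULTS` table are assumptions; the defaults are copied from the README's default-arguments snippet. Unlike the edits in this PR, the helper keeps non-default values, so a config with `pooling_layers: 4` retains that setting instead of falling back to the default of 1.

```python
from copy import deepcopy
from typing import Any, Dict

# Assumption: defaults copied from the README's default architectural arguments.
MODEL_DEFAULTS: Dict[str, Any] = {
    "dropout": 0.5,
    "pooling_layers": 1,
    "use_upos": True,
    "use_xpos": True,
    "use_lemma": True,
    "use_feats": True,
    "use_parse": True,
}


def migrate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: returns an updated copy of a parsed config."""
    new = deepcopy(config)
    model = new.setdefault("model", {})
    data = new.setdefault("data", {})
    # reverse_edits now lives under `data`.
    if "reverse_edits" in model:
        data["reverse_edits"] = model.pop("reverse_edits")
    # Drop model keys that only restate a default (the type check keeps
    # `True` and `1` distinct); keep everything else.
    for key, default in MODEL_DEFAULTS.items():
        if key in model and type(model[key]) is type(default) and model[key] == default:
            del model[key]
    return new
```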
@@ -22,12 +22,6 @@ trainer:
 model:
   dropout: 0.4
   encoder: FacebookAI/roberta-base
-  pooling_layers: 4
-  reverse_edits: true
-  use_upos: true
-  use_xpos: true
-  use_lemma: true
-  use_feats: true
   encoder_optimizer:
     class_path: torch.optim.Adam
     init_args:
 
@@ -22,12 +22,7 @@ trainer:
 model:
   dropout: 0.4
   encoder: google-bert/bert-base-multilingual-cased
-  pooling_layers: 4
-  reverse_edits: true
-  use_upos: true
   use_xpos: false
-  use_lemma: true
-  use_feats: true
   encoder_optimizer:
     class_path: torch.optim.Adam
     init_args:
 
@@ -22,12 +22,7 @@ trainer:
 model:
   dropout: 0.4
   encoder: DeepPavlov/rubert
-  pooling_layers: 4
-  reverse_edits: true
-  use_upos: true
   use_xpos: false
-  use_lemma: true
-  use_feats: true
   encoder_optimizer:
     class_path: torch.optim.Adam
     init_args:
 
@@ -22,12 +22,7 @@ trainer:
 model:
   dropout: 0.4
   encoder: FacebookAI/xlm-roberta-base
-  pooling_layers: 4
-  reverse_edits: true
-  use_upos: true
   use_xpos: false
-  use_lemma: true
-  use_feats: true
   encoder_optimizer:
     class_path: torch.optim.Adam
     init_args:
 
@@ -1,15 +1,14 @@
-method: random
+method: bayes
 metric:
   name: val_loss
   goal: minimize
 parameters:
-  model.use_xpos:
-    value: true
   model.dropout:
     distribution: uniform
     min: 0
     max: 0.5
   model.encoder:
+    distribution: categorical
     values:
       - FacebookAI/roberta-base
       - distilbert/distilbert-base-cased
@@ -18,7 +17,7 @@ parameters:
     distribution: q_uniform
     q: 1
     min: 1
-    max: 8
+    max: 4
   model.encoder_optimizer.class_path:
     value: torch.optim.Adam
   model.encoder_optimizer.init_args.lr:
@@ -31,7 +30,17 @@ parameters:
     distribution: q_uniform
     q: 1
     min: 1
-    max: 20
+    max: 40
+  model.arc_mlp_size:
+    distribution: q_uniform
+    q: 64
+    min: 64
+    max: 512
+  model.deprel_mlp_size:
+    distribution: q_uniform
+    q: 64
+    min: 64
+    max: 256
   model.classifier_optimizer.class_path:
     value: torch.optim.Adam
   model.classifier_optimizer.init_args.lr:
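The sweep configs add `q_uniform` ranges for `model.arc_mlp_size` and `model.deprel_mlp_size`. Assuming W&B's documented definition of `q_uniform` as `round(X / q) * q` with `X` drawn uniformly from `[min, max]`, the sketch below shows which values those settings can produce: multiples of 64 from 64 to 512 for the arc MLP and from 64 to 256 for the `deprel` MLP. The `q_uniform` function is a local stand-in, not part of the W&B API, and it ignores how the `bayes` method chooses points. Under uniform sampling, the two endpoint values are only half as likely as the interior ones.

```python
import random


def q_uniform(low: float, high: float, q: float, rng: random.Random) -> float:
    """Local stand-in for W&B's q_uniform: round(X / q) * q, X ~ U(low, high)."""
    return round(rng.uniform(low, high) / q) * q


if __name__ == "__main__":
    rng = random.Random(0)
    arc_sizes = {q_uniform(64, 512, 64, rng) for _ in range(10_000)}
    deprel_sizes = {q_uniform(64, 256, 64, rng) for _ in range(10_000)}
    print(sorted(arc_sizes))
    print(sorted(deprel_sizes))
```

The same rule with `q: 1, min: 1, max: 4` explains why `model.pooling_layers` is sampled as an integer from 1 to 4.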
@@ -49,6 +58,7 @@ parameters:
   model.classifier_scheduler.init_args.patience:
     value: 5
   data.batch_size:
+    distribution: categorical
     values:
       - 8
       - 16
@@ -1,4 +1,4 @@
-method: random
+method: bayes
 metric:
   name: val_loss
   goal: minimize
@@ -10,14 +10,15 @@ parameters:
     min: 0
     max: 0.5
   model.encoder:
+    distribution: categorical
     values:
       - google-bert/bert-base-multilingual-cased
       - FacebookAI/xlm-roberta-base
   model.pooling_layers:
     distribution: q_uniform
     q: 1
     min: 1
-    max: 8
+    max: 4
   model.encoder_optimizer.class_path:
     value: torch.optim.Adam
   model.encoder_optimizer.init_args.lr:
@@ -30,7 +31,17 @@ parameters:
     distribution: q_uniform
     q: 1
     min: 1
-    max: 20
+    max: 40
+  model.arc_mlp_size:
+    distribution: q_uniform
+    q: 64
+    min: 64
+    max: 512
+  model.deprel_mlp_size:
+    distribution: q_uniform
+    q: 64
+    min: 64
+    max: 256
   model.classifier_optimizer.class_path:
     value: torch.optim.Adam
   model.classifier_optimizer.init_args.lr:
@@ -48,6 +59,7 @@ parameters:
   model.classifier_scheduler.init_args.patience:
     value: 5
   data.batch_size:
+    distribution: categorical
     values:
       - 8
       - 16
@@ -1,4 +1,4 @@
-method: random
+method: bayes
 metric:
   name: val_loss
   goal: minimize
@@ -10,6 +10,7 @@ parameters:
     min: 0
     max: 0.5
   model.encoder:
+    distribution: categorical
     values:
       - google-bert/bert-base-multilingual-cased
       - FacebookAI/xlm-roberta-base
@@ -18,7 +19,7 @@ parameters:
     distribution: q_uniform
     q: 1
     min: 1
-    max: 8
+    max: 4
   model.encoder_optimizer.class_path:
     value: torch.optim.Adam
   model.encoder_optimizer.init_args.lr:
@@ -31,7 +32,17 @@ parameters:
     distribution: q_uniform
     q: 1
     min: 1
-    max: 20
+    max: 40
+  model.arc_mlp_size:
+    distribution: q_uniform
+    q: 64
+    min: 64
+    max: 512
+  model.deprel_mlp_size:
+    distribution: q_uniform
+    q: 64
+    min: 64
+    max: 256
   model.classifier_optimizer.class_path:
     value: torch.optim.Adam
   model.classifier_optimizer.init_args.lr:
@@ -49,6 +60,7 @@ parameters:
   model.classifier_scheduler.init_args.patience:
     value: 5
   data.batch_size:
+    distribution: categorical
     values:
       - 8
       - 16