Commit e86f9e0
Adds parser task using deep biaffine parser (#120)
* Adds metrics for parsing
* Beginning integration
* Adds metrics test.
One major issue is that this requires us to use negative indices for
specials, which breaks assumptions in the indexes. Will have to come
back and fix this.
* Draft of parser and its integration
* More work.
Known issues:
1. I don't think the metrics test is going to work; I will need to shift
all the head indices by special.OFFSET.
2. I am not passing a parser mask. Do I need to? I think maybe yes.
* Applies shift to metrics test to avoid collisions.
* Moves reverse_edits to data, where it belongs.
It has no effect in the model, so let's get rid of it.
* Days' debugging work
* More work; still debugging
* Optimizes mmap instructions (#116)
* Updates Black version
* Adds logging for vocabularies (#117)
* Adds logging for vocabularies
Sample output:
INFO: 22-Feb-26 17:56:27 - UPOS vocabulary (21): '[PAD]', '[UNK]', '_', 'ADJ', 'ADP', 'ADV', 'AUX', 'CCONJ', 'DET', 'INTJ', 'NOUN', 'NUM', 'PART', 'PRON', 'PROPN', 'PUNCT', 'SCONJ', 'SYM', 'VERB', 'X', '_'
INFO: 22-Feb-26 17:56:27 - XPOS vocabulary (53): '[PAD]', '[UNK]', '_', '$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'GW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', '_', '``'
INFO: 22-Feb-26 17:56:27 - Lemma vocabulary (533): [omitted]
INFO: 22-Feb-26 17:56:27 - Features vocabulary (235): [omitted]
Closes #115.
* black update
* f-string fix
* driveby: silence more warnings
* Avoids "Crashed" status in sweeps. (#118)
See Yoyodyne [#369](CUNY-CL/yoyodyne#369) for
context.
Closes #79.
* Pooling layer efficiency (#119)
* Fix pooling layer regression in UDTubeEncoder.forward
Special cases pooling_layers=1 to use last_hidden_state directly, avoiding
unnecessary allocation of all hidden states. This seems to save a lot of
GPU memory.
A few drive-bys:
1. suppress progress bar during test data generation
2. add "not human-readable" to "[omitted]" when logging lemmas
3. actually log features; why not?
4. pass information about which heads to build to the data module too,
so it logs properly
5. removes _ from "special", since it doesn't require any special
treatment in actuality; it's just another tag as far as we're
concerned.
6. Standardizes trailing """: it's on its own line if the comment is
more than one line.
* regeneration last-minute fix
* Update special.py
* fix typo
* Manual merge
* README and bibliography
* stashing incomplete work
* updates parser
* Parser testing
* Expands grid for biaffine parsing hparams
* Adds the parser itself
* Updates tests
Eliminates a test bug where the file comparisons were against the
hypothesis file!
* reflows README
* updates encoder special-casing logic slightly
* Update mappers.py
* Daniel's suggestion
1 parent fed582b commit e86f9e0
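The commit message above notes that negative indices for specials break index assumptions, and that the head indices need to be shifted by special.OFFSET. A minimal sketch of that idea follows. The values of `PAD_IDX` and `OFFSET` and the helper names `encode_heads` and `decode_heads` are assumptions for illustration; the commit does not state them.

```python
# Hypothetical values: the commit names special.OFFSET but not its value.
PAD_IDX = 0  # assumed index reserved for padding
OFFSET = 1   # assumed number of reserved special indices


def encode_heads(heads, length):
    """Shifts head positions (0 = root) past the reserved indices, then pads."""
    shifted = [head + OFFSET for head in heads]
    return shifted + [PAD_IDX] * (length - len(shifted))


def decode_heads(encoded):
    """Drops padding and undoes the shift."""
    return [idx - OFFSET for idx in encoded if idx != PAD_IDX]


heads = [2, 0, 2]  # three tokens; token 2 is the root
encoded = encode_heads(heads, length=5)
print(encoded)  # [3, 1, 3, 0, 0]
print(decode_heads(encoded))  # [2, 0, 2]
```

With the shift applied, a root head (0) is stored as 1, so it can no longer collide with the padding index, and no negative indices are needed.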
43 files changed
Lines changed: 1973 additions & 611 deletions
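The pooling fix described in the commit message special-cases pooling_layers=1 so that only last_hidden_state is used and the full stack of hidden states is never requested. The sketch below illustrates that branching with a stub encoder. `StubEncoder`, `EncoderOutput`, and `pool` are hypothetical names, and mean pooling over the last layers is an assumption, not necessarily what UDTubeEncoder.forward does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Layer = List[float]  # stand-in for a tensor of hidden states


@dataclass
class EncoderOutput:
    last_hidden_state: Layer
    hidden_states: Optional[Tuple[Layer, ...]] = None


class StubEncoder:
    """Hypothetical three-layer encoder used only for illustration."""

    def __call__(self, output_hidden_states: bool = False) -> EncoderOutput:
        layers = ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
        return EncoderOutput(
            last_hidden_state=layers[-1],
            # All layers are only materialized on request.
            hidden_states=layers if output_hidden_states else None,
        )


def pool(encoder, pooling_layers: int) -> Layer:
    """Averages the last `pooling_layers` layers (assumed pooling scheme)."""
    if pooling_layers == 1:
        # Special case: no need to allocate every hidden state.
        return encoder(output_hidden_states=False).last_hidden_state
    output = encoder(output_hidden_states=True)
    selected = output.hidden_states[-pooling_layers:]
    return [sum(values) / len(selected) for values in zip(*selected)]


print(pool(StubEncoder(), 1))  # [5.0, 6.0]
print(pool(StubEncoder(), 2))  # [4.0, 5.0]
```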
File tree
- configs
- examples/wandb_sweeps/configs
- scripts
- tests
- testdata
- udtube
- data
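The commit adds metrics for parsing but the message does not say which ones. Assuming the usual dependency-parsing measures, unlabeled and labeled attachment score (UAS and LAS), a minimal sketch might look like this. The function name `attachment_scores` and the example data are illustrative, not taken from the repository.

```python
def attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels):
    """Returns (UAS, LAS) for one sentence or a flattened batch of tokens.

    UAS counts tokens whose predicted head is correct; LAS additionally
    requires the dependency label to be correct.
    """
    total = len(gold_heads)
    if total == 0:
        return 0.0, 0.0
    uas_hits = 0
    las_hits = 0
    for gold_head, gold_label, pred_head, pred_label in zip(
        gold_heads, gold_labels, pred_heads, pred_labels
    ):
        if gold_head == pred_head:
            uas_hits += 1
            if gold_label == pred_label:
                las_hits += 1
    return uas_hits / total, las_hits / total


gold_heads = [2, 0, 4, 2]
gold_labels = ["nsubj", "root", "det", "obj"]
pred_heads = [2, 0, 2, 2]  # third token attached to the wrong head
pred_labels = ["nsubj", "root", "det", "iobj"]  # fourth token mislabeled
print(attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels))
# (0.75, 0.5)
```

In a batched setting, padded positions would need to be excluded before counting, which is likely what the parser mask mentioned in the commit message is for.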