Commit 29ef26e
Relation extraction llama (CogStack/MedCAT#522)
* Added files.
* More additions to rel extraction.
* Rel base.
* Update.
* Updates.
* Dependency parsing.
* Updates.
* Added pre-training steps.
* Added training & model utils.
* Cleanup & fixes.
* Update.
* Evaluation updates for pretraining.
* Removed duplicate relation storage.
* Moved RE model file location.
* Structure revisions.
* Added custom config for RE.
* Implemented custom dataset loader for RE.
* More changes.
* Small fix.
* Latest additions to RelCAT (pipe + predictions)
* Setup.py fix.
* RE utils update.
* rel model update.
* rel dataset + tokenizer improvements.
* RelCAT updates.
* RelCAT saving/loading improvements.
* RelCAT saving/loading improvements.
* RelCAT model fixes.
* Attempted gpu learning fix. Dataset label generation fixes.
* Minor train dataset gen fix.
* Minor train dataset gen fix No.2.
* Config updates.
* Gpu support fixes. Added label stats.
* Evaluation stat fixes.
* Cleaned stat output mode during training.
* Build fix.
* removed unused dependencies and fixed code formatting
* Mypy compliance.
* Fixed linting.
* More Gpu mode train fixes.
* Fixed model saving/loading issues when using other base models.
* More fixes to stat evaluation. Added proper CAT integration of RelCAT.
* Setup.py typo fix.
* RelCAT loading fix.
* RelCAT Config changes.
* Type fix. Minor additions to RelCAT model.
* Type fixes.
* Type corrections.
* RelCAT update.
* Type fixes.
* Fixed type issue.
* RelCATConfig: added seed param.
* Adaptations to the new codebase + type fixes.
* Doc/type fixes.
* Fixed input size issue for model.
* Fixed issue(s) with model size and config.
* RelCAT: updated configs to new style.
* RelCAT: removed old refs to logging.
* Fixed GPU training + added extra stat print for train set.
* Type fixes.
* Updated dev requirements.
* Linting.
* Fixed pin_memory issue when training on CPU.
* Updated RelCAT dataset get + default config.
* Updated RelDS generator + default config
* Linting.
* Updated RelDataset + config.
* Pushing updates to model
Made changes to:
1) Extracting a given number of context tokens to the left and right of the entities
2) Extracting the BERT hidden states for all tokens of each entity and max-pooling over them
* Fixing formatting
* Update rel_dataset.py
* Update rel_dataset.py
* Update rel_dataset.py
* RelCAT: added test resource files.
* RelCAT: Fixed model load/checkpointing.
* RelCAT: updated to pipe spacy doc call.
* RelCAT: added tests.
* Fixed lint/type issues & added rel tag to test DS.
* Fixed ann id to token issue.
* RelCAT: updated test dataset + tests.
* RelCAT: updates to requested changes + dataset improvements.
* RelCAT: updated docs/logs according to comments.
* RelCAT: type fix.
* RelCAT: mct export dataset updates.
* RelCAT: test updates + requested changes p2.
* RelCAT: log for MCT export train.
* Updated docs + split train_test & dataset for benchmarks.
* type fixes.
* RelCAT: Initial Llama integration.
* RelCAT: updates to Llama impl.
* RelCAT: model typo fix.
* RelCAT: label_id /sample no. mixup fix.
* Updated and cleaned up RelDataset; added new ways to create relations via annotation types (doc/export only for now).
* Added option to predict any text with annotations via RelCAT. MCT export train fixes.
* RelCAT: added sample limiter / class, more logging info.
* RelCAT: test/train ds shuffle update.
* RelCAT: added option to keep original text when using reldataset class.
* Pushing change for stratified batching
Implement stratified batching for improved class representation and balanced training
* RelCAT: fixed doc processing issue + class weights.
* RelCAT: class weights additions to cfg + param.
* RelCAT: added config params for Adam optimizer.
* RelCAT updated default config.
* RelCAT: config update + optimizer change.
* RelCAT: fixed model freeze flags.
* RelCAT: model optimizer save/load fix.
* RelCAT: added export ent tag check.
* Fixed issues when saving/loading model for class weights + inference device cast.
* RelCAT: bug fix for ents that are @ EoS.
* Rel Dataset updates.
* Rel Dataset updates.
* Pushing change for ModernBERT
* Bumped transformers version.
* Updated rel dataset generation from fake Spacy Docs.
* ModernBert updates.
* Updated RelCAT model-load/save.
* Minor relCAT updates, code format.
* Type check updates.
* Fixed inference issue.
* RelCAT: testing updates.
* Type fixes.
* Type fixes.
* Type fixes.
* Type fixes IV.
* Type fixes python 3.9.
* RelCAT: flake8 fixes.
* RelCAT: flake8 fixes.
* RelCAT: Updates (fixed model loading after save).
* Fixed test.
* Update RelCAT stuff for improved abstraction
* Move separate model implementations to separate packages
* Some minor abstraction changes
* Remove accidentally copied abstract method decorator
* Fix import in test
* Fix RelCAT import in pipe tests
* Update base relcat model implementation to include config
* Latest RelCAT module updates.
* Type fixes + run issues.
* Type fixes.
* Fixed Llama tokenizer.
* Type fixes.
* Type fixes: Python 3.10 adjustments.
* Linting.
* Fix base flake8 lint issues
* Fix doc string in ConfigRelCAT.load
* Fix base component init doc string
* Fixed BaseComponent.load method doc string
* Fix doc strings in rel_cat ml_utils
* Fix doc strings in rel_cat models module
* Fix rel-cat test time import
* Fix type casting
* Align pipe tests with rel cat changes
* Fix property paths in rel cat tests
* Updates.
* Fixed tests.
* Fixed relCAT config save.
* Latest fixes for model saving/loading.
* Lint fix.
* RelCAT cfg load test fix.
* Remove install requirements from gitignore
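
The "Pushing updates to model" entry above describes two steps: taking a fixed number of context tokens on either side of the entity pair, and max-pooling the hidden states of each entity's tokens. A minimal sketch of that idea follows. It is not the RelCAT code: the function names `context_window` and `max_pool` are hypothetical, spans are assumed to be half-open token index pairs, and plain Python lists stand in for the encoder's hidden-state tensors.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end); an assumption of this sketch


def context_window(n_tokens: int, ent1: Span, ent2: Span, window: int) -> Span:
    """Return the token range covering both entities plus `window` tokens on each side."""
    start = max(0, min(ent1[0], ent2[0]) - window)
    end = min(n_tokens, max(ent1[1], ent2[1]) + window)
    return start, end


def max_pool(hidden: List[List[float]], span: Span) -> List[float]:
    """Element-wise maximum over the hidden vectors of the tokens inside `span`."""
    rows = hidden[span[0]:span[1]]
    if not rows:
        raise ValueError("entity span is empty")
    return [max(column) for column in zip(*rows)]


tokens = ["patient", "was", "given", "aspirin", "for", "chest", "pain", "today"]
ent1, ent2 = (3, 4), (5, 7)  # "aspirin" and "chest pain"

# Keep one token of context to the left and right of the entity pair.
window_span = context_window(len(tokens), ent1, ent2, window=1)

# Toy two-dimensional "hidden states", one vector per token.
hidden = [[float(i), float(-i)] for i in range(len(tokens))]
pooled_ent1 = max_pool(hidden, ent1)
pooled_ent2 = max_pool(hidden, ent2)
```

With real encoder output the same pooling would typically be a single tensor reduction over the token dimension; the list version only makes the operation explicit.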
---------
Co-authored-by: Shubham Agarwal <66172189+shubham-s-agarwal@users.noreply.github.com>
Co-authored-by: mart-r <mart.ratas@gmail.com>
1 parent 6726789 commit 29ef26e
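
The commit message mentions stratified batching "for improved class representation and balanced training". One possible way to build such batches is sketched below, assuming integer class labels. The helper `stratified_batches` is hypothetical and is not taken from the RelCAT code: it groups sample indices by class, shuffles within each class, and then deals the indices out across batches so that every batch receives a near-proportional share of each class.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_batches(labels: Sequence[int], batch_size: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into batches with roughly equal class proportions."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle inside each class and lay the classes out one after another.
    ordered: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        ordered.extend(indices)

    # Strided slicing deals each class's contiguous run evenly across batches.
    n_batches = math.ceil(len(ordered) / batch_size)
    return [ordered[b::n_batches] for b in range(n_batches)]


labels = [0] * 6 + [1] * 3
batches = stratified_batches(labels, batch_size=3)
batch_labels = [sorted(labels[i] for i in batch) for batch in batches]
```

Because each class occupies a contiguous run before the strided split, the per-batch count of any class differs by at most one between batches, and no batch exceeds `batch_size`.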
File tree
27 files changed
+2038 -1036 lines changed
- medcat-v1
- medcat
- utils
- meta_cat
- relation_extraction
- bert
- llama
- modernbert
- tests