Releases: UCREL/WSD-Torch-Models
v0.1.3
What's new
Added 🎉
- Added the arXiv paper to the PyMUSAS BEM model readme, model_readmes/pymusas_bem.md. This required the Bib text to be a Python string variable in the convert and upload script, scripts/convert_and_upload_bem_model.py.
- Created ./benchmarks/ to benchmark speed and memory performance.
- Created ./tests/functional_tests/ to test the package end to end. This is mainly to ensure that changes to the code base do not affect the accuracy of the existing models.
Changed ⚠️
- Changed the ./pyproject.toml so that local developers can easily install different versions of torch, i.e. cpu or different cuda versions.
- Updated the ./.github/workflows to use specific GitHub action versions; this should make the workflows more secure.
- Updated the ./.devcontainer files so that they use the correct version of torch.
- Changed the developer tools from isort, flake8, and mypy to ruff and ty.
Commits
3c09f93 Prepare for release v0.1.3
11be7ed Upgraded to tranformers >=4.54.0,<6.0
37ec57a Results for CPU
e71b992 Changed to RUFF and TY
13911bd Benchmark dependencies are now in their own optional group
f4fe491 Reset model cache
a2794b8 Benchmarks for end to end testing
60ad96b Fixed linting issue
85814b4 Fixed torch development versions
9736910 Added arXiv paper citation to the PyMUSAS BEM models
dcb92e6 Bash script to automate uploading model checkpoints as branch to HF repo
f08b007 Added link to v0.1.2 in CHANGELOG.md
v0.1.2
What's new
Changed ⚠️
- The version of numpy has been relaxed from numpy>=2.0.0,<3.0 to numpy>=1.19.0,<3.0 so that we can use the GPU within a spacy pipeline, due to spacy's dependency on cupy version cupy-cuda12x>=11.5.0,<13.0.0.
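As a rough illustration of what the relaxed range admits, the sketch below checks a plain dotted release version against numpy>=1.19.0,<3.0 using only the standard library. The helper names are hypothetical and not part of this package, and pre-release or local version suffixes are not handled.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int, int]:
    """Parse a plain dotted version such as '1.26.4' into three integers.

    Missing components are padded with zeros, so '1.19' becomes (1, 19, 0).
    """
    parts = [int(part) for part in version.split(".")][:3]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def in_relaxed_numpy_range(version: str) -> bool:
    """Return True if the version satisfies numpy>=1.19.0,<3.0."""
    return (1, 19, 0) <= parse_release(version) < (3, 0, 0)


# Illustrative checks: a 1.x release now qualifies, a 3.x release does not.
numpy_1_ok = in_relaxed_numpy_range("1.26.4")
numpy_3_ok = in_relaxed_numpy_range("3.0.0")
```

For real dependency resolution, a tool that implements the full version specifier rules is the better choice; this only shows the bounds.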
Commits
8d8b133 Relaxed the version of numpy
3150049 Added link to release in CHANGELOG.md
v0.1.1
What's new
Added 🎉
- Added the tokenizer_kwargs optional argument to the wsd_torch_models.bem.BEM.predict method. This allows users to define keyword arguments that are passed to the sub-word tokenizer that is downloaded from HuggingFace through transformers.AutoTokenizer.from_pretrained.
- Added a ValueError that is raised within wsd_torch_models.bem.BEM.predict when the number of predicted sense labels does not equal the number of tokens that should have a predicted sense label.
- Added the add_prefix_space=True argument to the AutoTokenizer.from_pretrained method for all examples in the README.md, scripts/convert_and_upload_bem_model.py, and model_readmes/pymusas_bem.md. This is required as this is what the pre-trained BEM models expect.
- The devcontainers, found in .devcontainer, have been improved so that they use the cached uv packages that were installed at docker build time.
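The kind of length check behind the new ValueError can be sketched as follows. This is not the package's implementation: the function name, its arguments, and the example labels are hypothetical, and the sketch only assumes that each token is flagged as either needing a predicted sense label or not.

```python
from typing import List, Sequence


def check_sense_label_count(
    predicted_labels: Sequence[str], needs_label: Sequence[bool]
) -> List[str]:
    """Return the labels if there is exactly one per token flagged in needs_label.

    Raises:
        ValueError: if the number of predicted labels differs from the
            number of tokens that should have a predicted sense label.
    """
    expected = sum(needs_label)  # each True counts as 1
    if len(predicted_labels) != expected:
        raise ValueError(
            f"Expected {expected} predicted sense labels, "
            f"but got {len(predicted_labels)}."
        )
    return list(predicted_labels)


# Hypothetical input: three tokens, two of which should receive a label.
labels = check_sense_label_count(["Z5", "A1.1.1"], [True, False, True])
```

Failing loudly on a mismatch means a misaligned tokenization surfaces as an error instead of silently attaching labels to the wrong tokens.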
Commits
0bdd512 Correct file path to model checkpoint for English Small BEM
846ab0a Example using the larger model
86cb8b1 Correct spelling error of Engish on the model names
v0.1.0
What's new
Added 🎉
- First release.
- The Bi-Encoder Model (BEM) from the paper Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders. This model can be found at wsd_torch_models.bem.BEM.
- The wsd_torch_models.bem.BEM class represents a good potential blueprint (abstract class) for other Word Sense Disambiguation methods to inherit from in the future through a parent class.
- Created a script, scripts/convert_and_upload_bem_model.py, that converts the PyTorch Lightning models that the wsd_torch_models.bem.BEM class was created from into the PyTorch and PyTorchModelHubMixin class that wsd_torch_models.bem.BEM represents, without the need for the PyTorch Lightning dependency. The script only requires the checkpoint from the saved PyTorch Lightning model; it converts the model and uploads it to the relevant HuggingFace hub repository.
Commits
5126326 Prepare for release v0.1.0
17ac105 Creation of release notes and scripts
68e8358 Preparing for first release
9de65a2 Adding all models to HuggingFace Hub
2bbfe29 more relevant naming
4d3a44f Added arguments to only update parts of the model
33800e8 Added arguments to only update parts of the model
855d634 Update README.md
99e085f Model inference code
56034cf End of day
dbd663a Installation
dae15cf USAS mapper
1777a47 Testing attention masking on forward pass
ce53edc Small test for the BEM model
ee956f8 Added doc strings
5a56b0c Number of parameters test
30c7354 Added pytests to the CI pipeline
1be0b9d Debugging saved files
141b5a0 Debugging saved files
842c234 Debugging saved files
b6625a6 Debugging saved files
ac4281a Debugging saved files
56d0076 Debugging saved files
dc06b7f Debugging saved files
af10ef5 Debugging saved files
19e78fe Debugging saved files
b96b36c Debugging saved files
06bcbf1 Debugging saved files
ce38168 Changed caching
82ec8db Model caching for testing
a84dce7 list files removed as it does not work on Windows
dd713df Removed scripts path from flake8 and isort
335bc74 Syntax error
e7d509c HF model caching
0c8dedd vscode
4682e73 CPU and GPU containers
743f19e end of day
df93fe5 CI
3327e7e Documentation
d38b5d0 Initital code
ff68f0f Project setup
312ca60 Dev Container setup