What's new
Added 🎉
- Documentation to the README stating the use case of My-Binder and how to install pymusas within the My-Binder cloud environment.
- How-to guides for the neural and hybrid taggers within `docs/docs/usage/how_to/tag_text_with`, as well as an introduction to these taggers and how they compare to one another in `docs/docs/usage/getting_started/intro.md`.
- Added resource requirement benchmarking code, which can be found in the directory `benchmarks/resource_benchmarking`. This code creates a markdown table with statistics on how much RAM and GPU memory is required to run the different taggers, and how fast the taggers are on either the CPU or the GPU. These resource requirement statistics have also been added to the *Introduction* usage page of the documentation (`docs/docs/usage/getting_started/intro.md`).
- Added documentation to the documentation website (`docs/docs/usage/how_to/tag_text_with/neural_tagger.md`) on how to efficiently process texts with the neural and hybrid taggers.
Changed ⚠️
- Moved the *How-to Rule Based Tagger* usage documentation page from the directory `docs/docs/usage/how_to` to `docs/docs/usage/how_to/tag_text_with` so that all the tagger how-to guides are within their own folder.
- Support for both `transformers` v4 and v5.
- Support for Python 3.14 should be resolved by pinning the version of spaCy to `>=3.8.13`; this fixes Issue 57.
- Updated the GitHub Workflows so that each action is referenced by a tagged version and every action is on its most recent version.
- The version of PyTorch used in the code base for development is now set to the CPU version, which saves Linux users from downloading the CUDA libraries.
- The GPU docker image used for testing has been updated so that it uses the correct version of PyTorch.
- Corrected how to install `pymusas` for users in `./README.md`.
- Changed the default version of Python to version `3.13`.
Removed 🗑
- Removed the old README (`old_readme_information.md`) and the benchmarking code within `benchmarks` that used a pre-`0.3.0` version of `pymusas`; this benchmarking code has now been replaced by the code within `benchmarks/resource_benchmarking`.
Commits
3e39c65 Fixed tomllib error with python version 3.10
55eccd7 Fixed syntax error
6ced8e2 Updated transformers to >4 and fixed #57
2ae88d8 Updated Danish tagger
b5e184c Update installation.md (#58)
53121ff References to the neural and hybrid arxiv paper
fdcf06d Added a note about python 3.14 support wrt spaCy and pydantic v1 #57 in documentation
2220c74 Guide on how to tag long or large text efficiently for the neural tagger
e323a97 MyPy linting fixes for benchmarks directory
39e4772 Fixed and included benchmarks within the linting directories that are checked
552fb18 Added the dev requirements that came from resource requirements benchmarking
718fe9a Resource requirements benchmarking
b7eef5f Added the correct date
9afedc5 Added relevant imports to doc string in documentation for hybrid taggers
25db57c Updated to reflect the new taggers
4a9a57f Added a note about the Z99 and PUNCT tags
3a7852d Hybrid tagger how to guide
413be09 Neural Tagger how to guide
c6c636f Introduced the neural and hybrid tagger
604b9c9 File based linking
d457b0a Changed from models to taggers
fc934f8 Updated to v0.4.0 in the usage guide
98a437b How to use My-Binder