Releases: MinishLab/model2vec
0.4.0
What's Changed
- Add fittable by @stephantul in #140
- fix scores in readme by @stephantul in #179
- docs: Refactored main docs, added separate docs directory, added training docs by @Pringled in #181
- docs: Update README.md by @Pringled in #183
- Update README.md by @Pringled in #184
- feat: replace 8m by 32m for training by @stephantul in #182
- docs: update scores in README by @stephantul in #186
- docs: Moved training results to results directory, updated docs and description by @Pringled in #187
- Bump version by @Pringled in #188
Full Changelog: v0.3.9...0.4.0
v0.3.9
What's Changed
- docs: Added new model results by @Pringled in #167
- docs: Update plot by @Pringled in #169
- feat: add trust-remote-code option by @stephantul in #173
- feat: Add SIF-like coef by @stephantul in #174
- increase version by @stephantul in #176
Full Changelog: v0.3.8...v0.3.9
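The "SIF-like coef" entry in v0.3.9 (#174) most likely refers to smooth inverse frequency (SIF) weighting, where a token with estimated probability p is scaled by a / (a + p) so that frequent tokens contribute less to a mean-pooled embedding. The release notes do not describe the implementation, so the sketch below is only an illustration of the general technique: the function names, the default `a`, and the Zipf-style probability estimate are assumptions, not model2vec's API.

```python
def zipf_probabilities(vocab_size):
    """Assumed frequency estimate: probability proportional to 1 / rank.

    This stands in for real corpus counts, which a static vocabulary
    usually does not carry. Rank 1 is treated as the most frequent token.
    """
    inverse_ranks = [1.0 / rank for rank in range(1, vocab_size + 1)]
    total = sum(inverse_ranks)
    return [value / total for value in inverse_ranks]


def sif_weights(token_probabilities, a=1e-3):
    """Return the SIF weight a / (a + p) for each token probability p."""
    return [a / (a + p) for p in token_probabilities]


# Hypothetical five-token vocabulary, ordered from most to least frequent.
probabilities = zipf_probabilities(5)
weights = sif_weights(probabilities)
```

Under these assumptions the weights grow with rank, so rarer tokens keep more of their magnitude when token vectors are averaged.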
v0.3.8
What's Changed
- docs: fix docstrings in distill by @stephantul in #157
- remove unnecessary import by @stephantul in #161
- remove deduplication tutorial by @stephantul in #159
- fix: issue with modernbert tokenizer, add token pattern to _distill by @stephantul in #158
- fix: fix typing issue by @stephantul in #162
- feat: float pca dims by @stephantul in #163
- feat: Add optional embedding normalization to StaticModel loading by @davidberenstein1957 in #164
- feat: Improve distill for modernBERT by @stephantul in #165
- increase version by @stephantul in #166
New Contributors
- @davidberenstein1957 made their first contribution in #164
Full Changelog: v0.3.7...v0.3.8
v0.3.7
v0.3.6
What's Changed
- Add loading from st by @stephantul in #151
- Bump version by @Pringled in #152
Full Changelog: v0.3.5...v0.3.6
v0.3.5
v0.3.4
What's Changed
- docs: Add txtai integration docs by @Pringled in #130
- docs: Reworked documentation by @Pringled in #131
- feat: Added semantic chunking with chonkie tutorial by @Pringled in #133
- feat: Updated config values by @Pringled in #136
- feat: add support for pattern for unused tokens by @stephantul in #138
- feat: Add multiprocessing by @Pringled in #141 (suggested by davidmezzetti in #139)
- feat: Added multiprocessing threshold parameter by @Pringled in #142
- docs: Add langchain example by @Pringled in #143
- fix: Removed unneeded tokenize call by @Pringled in #144
- docs: update README.md by @eltociear in #145
- Bump version by @Pringled in #146
New Contributors
- @eltociear made their first contribution in #145
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's Changed
- feat: Added onnx and tokenizer files support script by @Pringled in #119
- docs: Update readme by @Pringled in #122
- fix: Fixed CI by @Pringled in #124
- docs: Updated results table by @Pringled in #125
- docs: Updated slogan by @Pringled in #127
- fix: Added jinja2 requirement by @Pringled in #128
- Bumped version by @Pringled in #129
Full Changelog: v0.3.2...v0.3.3
v0.3.2
v0.3.1
What's Changed
- fix: update added tokens to be more agnostic by @stephantul in #107
- fix: don't rely on reported vocab size, log warning if inconsistent by @stephantul in #109
- docs: Fixed broken links by @Pringled in #112
- feat: make encode_batch_fast optional by @stephantul in #113
- fix: normalize would lead to NaN for empty docs by @stephantul in #114
- docs: Add tokenlearn results by @Pringled in #116
- docs: Updated plot by @Pringled in #117
- Bump version by @Pringled in #118
Full Changelog: v0.3.0...v0.3.1
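The v0.3.1 fix "normalize would lead to NaN for empty docs" (#114) names a common pitfall: an empty document tends to produce an all-zero embedding, and dividing that vector by its zero L2 norm is undefined (array libraries such as NumPy return NaN for 0/0). The sketch below shows one usual guard in plain Python. It is not taken from the model2vec source; the function name and the `eps` threshold are hypothetical.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm, leaving (near-)zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm < eps:
        # An all-zero vector has no direction, so skip the division
        # instead of computing 0 / 0.
        return list(vector)
    return [x / norm for x in vector]


normalized = l2_normalize([3.0, 4.0])      # regular case
empty_doc = l2_normalize([0.0, 0.0, 0.0])  # stays all zeros, no NaN
```

Returning the zero vector unchanged keeps downstream similarity computations finite; a caller that needs to tell empty inputs apart can check for an all-zero result.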