CHANGELOG (+14 lines)
@@ -1,3 +1,17 @@
+========
+5.5.3
+========
+----------------
+Bug Fixes & Enhancements
+----------------
+* BGEEmbeddings: the default pretrained model has been changed from "bge_base" to "bge_small_en_v1.5". Users relying on the old default must now request "bge_base" explicitly in the pretrained method.
+* Added a `useCLSToken` parameter that lets users choose between CLS-token pooling and attention-based average pooling for sentence embeddings.
+* BGEEmbeddings: now supports the `useCLSToken` parameter, which defaults to True and changes how sentence embeddings are pooled. Existing users should verify their pipelines and, if needed, set this parameter explicitly (see the usage sketch after this list).
+* Added HasClsTokenProperties in Scala: a new trait that provides the `useCLSToken` parameter for the relevant annotators (a rough sketch follows the list below).
+* Fixed incorrect attention-mask padding in `MPNet`, `BGE`, `E5`, `Mxbai`, `Nomic`, `SnowFlake`, and `UAE`. The bug produced wrong inference results in some cases, with outputs that did not match the ONNX versions in transformers/sentence-transformers.
+* Various performance optimizations: multiple changes across models (Albert, Bart, CLIP, CamemBert, ConvNextClassifier, DeBerta, DistilBert, E5, Instructor, MPNet, Mxbai, Nomic, RoBerta, SnowFlake, UAE, ViTClassifier, VisionEncoderDecoder, Wav2Vec2, XlmRoBertaClassification, XlmRoberta) focus on performance improvements and code cleanup, especially for OpenVINO and ONNX inference, and may result in faster inference times.
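As a quick illustration of the two BGEEmbeddings changes above, here is a minimal Scala sketch. It assumes the standard Spark NLP pipeline setup and a conventional `setUseCLSToken` setter for the new parameter; exact model names and signatures should be checked against the 5.5.3 API.

```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// Assemble raw text into documents (standard Spark NLP entry point).
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Pin the old default explicitly: per this release, calling pretrained()
// without a name now resolves to "bge_small_en_v1.5" instead of "bge_base".
val embeddings = BGEEmbeddings
  .pretrained("bge_base", "en")
  .setInputCols("document")
  .setOutputCol("embeddings")
  // Assumed setter for the new useCLSToken parameter (defaults to true):
  // true  -> CLS-token pooling
  // false -> attention-based average pooling
  .setUseCLSToken(false)

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
```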
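For Scala developers wondering what the new HasClsTokenProperties trait provides, the following is only a rough sketch of its likely shape, built on standard Spark ML Params; the names, wording, and default here are assumptions, not copied from the Spark NLP source.

```scala
import org.apache.spark.ml.param.{BooleanParam, Params}

// Illustrative sketch only: a Params trait exposing a useCLSToken flag,
// as described in the changelog entry above.
trait HasClsTokenProperties extends Params {

  // Whether to pool sentence embeddings from the CLS token (true)
  // or from an attention-masked average over token embeddings (false).
  val useCLSToken: BooleanParam = new BooleanParam(
    this,
    "useCLSToken",
    "Use CLS token pooling instead of attention-based average pooling"
  )

  def setUseCLSToken(value: Boolean): this.type = set(useCLSToken, value)

  def getUseCLSToken: Boolean = $(useCLSToken)

  setDefault(useCLSToken -> true)
}
```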
@@ -250,7 +250,7 @@ In Spark NLP we can define S3 locations to:
Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integration) from our official documentation.
-## Document5.5.2
+## Document5.5.3
### Examples
@@ -283,7 +283,7 @@ the Spark NLP library:
keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}