CHANGELOG
@@ -1,3 +1,34 @@
+========
+4.2.3
+========
+----------------
+New Features & Enhancements
+----------------
+* Implement a new control over the number of accepted columns in Python. This syncs the behavior between Scala and Python when the user sets more columns than allowed inside setInputCols
+* Add a metadata sentence key parameter to select which metadata field to use as the sentence for the CoNLLGenerator annotator
+* Include escaping in the CoNLLGenerator annotator when writing to CSV and preserve special-character tokens
+* Add documentation for the new `IAnnotation` feature for Scala users
+* Add rules and delimiter parameters to the RegexMatcher annotator to support a string as input in addition to a file (see the sketch after this list)
+----------------
+Bug Fixes
+----------------
+* Fix NotSerializableException when WordEmbeddings is used over a K8s cluster while `setEnableInMemoryStorage` is set to `true`
+* Fix a bug in the RegexTokenizer annotator where it outputs the wrong indexes if the pattern includes splits that are not followed by a space
+* Fix the training module failing on EMR due to a bad Apache Spark version detection. The following classes were fixed: `CoNLL()`, `CoNLLU()`, `POS()`, and `PubTator()`
+* Fix a bug in the CoNLLGenerator annotator where a token has non-integer metadata
+* Fix the wrong SentencePiece model name required for DeBertaForQuestionAnswering and DeBertaEmbeddings when importing models
+* Fix `NaNs` result in some ViTForImageClassification models/pipelines
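The RegexMatcher entry above changes how rules can be supplied in Python. The following is a minimal sketch of passing rules as in-code strings rather than an external rules file, assuming the new parameters are exposed as `setRules` and `setDelimiter`:

```python
# A minimal sketch, assuming 4.2.3 exposes setRules/setDelimiter on RegexMatcher
# for in-code rules, in addition to setExternalRules for rule files.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import RegexMatcher

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

regex_matcher = RegexMatcher() \
    .setInputCols(["document"]) \
    .setOutputCol("matches") \
    .setStrategy("MATCH_ALL") \
    .setRules(["\\d{4}~year", "[A-Z][a-z]+~capitalized"]) \
    .setDelimiter("~")  # separates each regex from its identifier

pipeline = Pipeline(stages=[document_assembler, regex_matcher])
```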
[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognition and sentiment analysis by using Spark NLP pretrained pipelines.
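For a flavor of what that notebook runs, here is a minimal sketch of annotating text with a pretrained pipeline; the `explain_document_dl` pipeline name is the one commonly used in the quick-start demos and is assumed here:

```python
# A minimal sketch of the pretrained-pipeline flow from the quick-start demos.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # starts a SparkSession with Spark NLP on the classpath

# Download (or load from local cache) a pretrained pipeline and annotate text.
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
result = pipeline.annotate("Google released a new version of TensorFlow in California.")
print(result["entities"])
```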
@@ -673,7 +673,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live demo on Kaggle Kernel that performs named entity recognition by using a Spark NLP pretrained pipeline.
@@ -691,9 +691,9 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
3. In the `Libraries` tab inside your cluster you need to follow these steps:

-    3.1. Install New -> PyPI -> `spark-nlp==4.2.2` -> Install
+    3.1. Install New -> PyPI -> `spark-nlp==4.2.3` -> Install

-    3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2` -> Install
+    3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.3` -> Install

4. Now you can attach your notebook to the cluster and use Spark NLP! A quick way to confirm the install is the version check sketched below.
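As a hedged illustration of that check, the notebook can simply print the Python package version and compare it against the Maven coordinates attached above:

```python
# A minimal sketch: after attaching the notebook (step 4), verify that the
# PyPI package matches the Maven JAR version installed on the cluster.
import sparknlp

print(sparknlp.version())  # expected to print 4.2.3, matching the coordinates above
```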
@@ -741,7 +741,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
- You can download provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases); please pay attention to pick the one that suits your environment depending on the device (CPU/GPU) and Apache Spark version (3.0.x, 3.1.x, 3.2.x, and 3.3.x)

-- If you are local, you can load the Fat JAR from your local FileSystem; however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (e.g., `hdfs:///tmp/spark-nlp-assembly-4.2.2.jar`)
+- If you are local, you can load the Fat JAR from your local FileSystem; however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (e.g., `hdfs:///tmp/spark-nlp-assembly-4.2.3.jar`)
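As a hedged sketch of the cluster case, the distributed path from the bullet above can be handed to Spark through the standard `spark.jars` setting (the path is illustrative):

```python
# A minimal sketch of starting a session against the Fat JAR on HDFS
# instead of pulling the Maven package; the path mirrors the example above.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Spark NLP with Fat JAR") \
    .config("spark.jars", "hdfs:///tmp/spark-nlp-assembly-4.2.3.jar") \
    .getOrCreate()
```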
Example of using pretrained Models and Pipelines offline:
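The original example is not shown here; as a hedged stand-in, offline usage amounts to loading a model from a path that was downloaded beforehand instead of fetching it at runtime (the model name and path below are illustrative):

```python
# A minimal sketch of offline loading: the model folder was downloaded ahead
# of time (e.g., from the Models Hub) and unpacked to a local or HDFS path.
from sparknlp.annotator import NerDLModel

ner = NerDLModel.load("/tmp/ner_dl_en") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
```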