@@ -152,7 +152,7 @@ To use Spark NLP you need the following requirements:

**GPU (optional):**

- Spark NLP 4.2.1 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+ Spark NLP 4.2.2 is built with TensorFlow 2.7.1, and the following NVIDIA® software is required only for GPU support:

- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
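
A quick way to verify the driver and toolkit versions above (a sketch, assuming `nvidia-smi` and `nvcc` are available on the PATH):

```sh
# driver version should report 450.80.02 or higher
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# the toolkit should report release 11.2
nvcc --version
```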
@@ -168,7 +168,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==4.2.1 pyspark==3.2.1
+ $ pip install spark-nlp==4.2.2 pyspark==3.2.1
```

In Python console or Jupyter `Python3` kernel:
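
For example, a minimal session sketch (assuming the standard `sparknlp.start()` entry point):

```python
import sparknlp

# start a Spark session preconfigured for Spark NLP
spark = sparknlp.start()  # use sparknlp.start(gpu=True) with the GPU package

print("Spark NLP version:", sparknlp.version())
print("Apache Spark version:", spark.version)
```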
@@ -213,7 +213,7 @@ For more examples, you can visit our dedicated [repository](https://github.com/J

## Apache Spark Support

- Spark NLP *4.2.1* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
+ Spark NLP *4.2.2* has been built on top of Apache Spark 3.2 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:

| Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
@@ -247,7 +247,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

## Databricks Support

- Spark NLP 4.2.1 has been tested and is compatible with the following runtimes:
+ Spark NLP 4.2.2 has been tested and is compatible with the following runtimes:

**CPU:**
@@ -288,7 +288,7 @@ NOTE: Spark NLP 4.0.x is based on TensorFlow 2.7.x which is compatible with CUDA

## EMR Support

- Spark NLP 4.2.1 has been tested and is compatible with the following EMR releases:
+ Spark NLP 4.2.2 has been tested and is compatible with the following EMR releases:

- emr-6.2.0
- emr-6.3.0
@@ -326,23 +326,23 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
```sh
# CPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

The `spark-nlp` has been published to the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp).

```sh
# GPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2
```
@@ -351,11 +351,11 @@ The `spark-nlp-gpu` has been published to the [Maven Repository](https://mvnrepo
```sh
# AArch64

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.2
```
@@ -364,11 +364,11 @@ The `spark-nlp-aarch64` has been published to the [Maven Repository](https://mvn
```sh
# M1

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.1
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.1
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2
```
@@ -380,7 +380,7 @@ The `spark-nlp-m1` has been published to the [Maven Repository](https://mvnrepos
spark-shell \
--driver-memory 16g \
--conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

## Scala
@@ -396,7 +396,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
-     <version>4.2.1</version>
+     <version>4.2.2</version>
</dependency>
```

@@ -407,7 +407,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
-     <version>4.2.1</version>
+     <version>4.2.2</version>
</dependency>
```

@@ -418,7 +418,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
-     <version>4.2.1</version>
+     <version>4.2.2</version>
</dependency>
```

@@ -429,7 +429,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-m1_2.12</artifactId>
-     <version>4.2.1</version>
+     <version>4.2.2</version>
</dependency>
```

@@ -439,28 +439,28 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.2"
```

**spark-nlp-gpu:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.2"
```

**spark-nlp-aarch64:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.2"
```

**spark-nlp-m1:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.1"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.2"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -480,7 +480,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:

```bash
- pip install spark-nlp==4.2.1
+ pip install spark-nlp==4.2.2
```

Conda:
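
For example (a sketch, assuming the package is published on the `johnsnowlabs` conda channel):

```bash
conda install -c johnsnowlabs spark-nlp
```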
@@ -508,7 +508,7 @@ spark = SparkSession.builder \
    .config("spark.driver.memory","16G")\
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M")\
-     .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1")\
+     .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2")\
    .getOrCreate()
```

@@ -576,7 +576,7 @@ Use either one of the following options
- Add the following Maven Coordinates to the interpreter's library list

```bash
- com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

- Add a path to a pre-built jar from [here](#compiled-jars) in the interpreter's library list, making sure the jar is available on the driver path
@@ -586,7 +586,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
Apart from the previous step, install the Python module through pip

```bash
- pip install spark-nlp==4.2.1
+ pip install spark-nlp==4.2.2
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -611,7 +611,7 @@ The easiest way to get this done on Linux and macOS is to simply install `spark-
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==4.2.1 pyspark==3.2.1 jupyter
+ $ pip install spark-nlp==4.2.2 pyspark==3.2.1 jupyter
$ jupyter notebook
```

@@ -627,7 +627,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

Alternatively, you can combine the `--jars` option for pyspark with `pip install spark-nlp`, as sketched below:
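
A minimal sketch of that combination (the Fat JAR path below is illustrative):

```sh
pip install spark-nlp==4.2.2
pyspark --jars /tmp/spark-nlp-assembly-4.2.2.jar
```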
@@ -652,7 +652,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
# by default they are set to the latest
- ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.1 -s 4.2.1
+ ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.1 -s 4.2.2
```

[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognition and sentiment analysis using Spark NLP pretrained pipelines.
@@ -673,7 +673,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
# by default they are set to the latest
- ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.1 -s 4.2.1
+ ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.1 -s 4.2.2
```

[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live demo on Kaggle Kernel that performs named entity recognition using a Spark NLP pretrained pipeline.
@@ -691,9 +691,9 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi

3. In the `Libraries` tab inside your cluster, follow these steps:

- 3.1. Install New -> PyPI -> `spark-nlp==4.2.1` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==4.2.2` -> Install

- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2` -> Install

4. Now you can attach your notebook to the cluster and use Spark NLP! A minimal first cell is sketched below.
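
For example, a first cell might look like this (a sketch; the pipeline name is illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

print("Spark NLP version:", sparknlp.version())

# download and run a pretrained pipeline
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
print(pipeline.annotate("Spark NLP runs on Databricks.")["entities"])
```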
@@ -741,7 +741,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "0",
-     "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1"
+     "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2"
  }
}]
```
@@ -750,7 +750,7 @@ A sample of AWS CLI to launch EMR cluster:

```sh
aws emr create-cluster \
- --name "Spark NLP 4.2.1" \
+ --name "Spark NLP 4.2.2" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \
@@ -814,7 +814,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

2. On an existing cluster, install the spark-nlp and spark-nlp-display packages from PyPI, for example:
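
A sketch of that install step (the display package version is left unpinned):

```sh
pip install spark-nlp==4.2.2 spark-nlp-display
```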
@@ -853,7 +853,7 @@ spark = SparkSession.builder \
    .config("spark.kryoserializer.buffer.max", "2000m") \
    .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") \
    .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") \
-     .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1") \
+     .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2") \
    .getOrCreate()
```

@@ -867,7 +867,7 @@ spark-shell \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

**pyspark:**
@@ -880,7 +880,7 @@ pyspark \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.1
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
```

**Databricks:**
@@ -1144,12 +1144,12 @@ spark = SparkSession.builder \
    .config("spark.driver.memory","16G")\
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M")\
-     .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.1.jar")\
+     .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.2.jar")\
    .getOrCreate()
```

- You can download the provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases); please pay attention to pick the one that suits your environment, depending on the device (CPU/GPU) and Apache Spark version (3.0.x, 3.1.x, 3.2.x, and 3.3.x)
- - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-4.2.1.jar`)
+ - If you are running locally, you can load the Fat JAR from your local FileSystem; in a cluster setup, however, you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, or S3 (e.g., `hdfs:///tmp/spark-nlp-assembly-4.2.2.jar`)

Example of using pretrained Models and Pipelines offline:
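
A minimal sketch of offline loading (assuming a pipeline was already downloaded and unpacked to a local path; the path is illustrative):

```python
from pyspark.ml import PipelineModel

# load a pretrained pipeline from local disk instead of fetching it over the network
pipeline = PipelineModel.load("/tmp/explain_document_dl_en_4.2.2")
```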