@@ -152,7 +152,7 @@ To use Spark NLP you need the following requirements:
**GPU (optional):**

- Spark NLP 4.2.7 is built with TensorFlow 2.7.1, and the following NVIDIA® software is required only for GPU support:
+ Spark NLP 4.2.8 is built with TensorFlow 2.7.1, and the following NVIDIA® software is required only for GPU support:

- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
@@ -168,7 +168,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==4.2.7 pyspark==3.2.3
+ $ pip install spark-nlp==4.2.8 pyspark==3.2.3
```

In a Python console or Jupyter `Python3` kernel:
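A rough sketch of that session (assuming the install above succeeded; `sparknlp.start()` is the library's standard entry point):

```python
import sparknlp

# start an Apache Spark session preconfigured for Spark NLP
spark = sparknlp.start()

print("Spark NLP version:", sparknlp.version())
print("Apache Spark version:", spark.version)
```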
@@ -213,7 +213,7 @@ For more examples, you can visit our dedicated [repository](https://github.com/J
## Apache Spark Support

- Spark NLP *4.2.7* has been built on top of Apache Spark 3.2 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
+ Spark NLP *4.2.8* has been built on top of Apache Spark 3.2 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:

| Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x |
| --------- | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ |
@@ -247,7 +247,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
## Databricks Support

- Spark NLP 4.2.7 has been tested and is compatible with the following runtimes:
+ Spark NLP 4.2.8 has been tested and is compatible with the following runtimes:

**CPU:**
@@ -291,7 +291,7 @@ NOTE: Spark NLP 4.0.x is based on TensorFlow 2.7.x which is compatible with CUDA
## EMR Support

- Spark NLP 4.2.7 has been tested and is compatible with the following EMR releases:
+ Spark NLP 4.2.8 has been tested and is compatible with the following EMR releases:

- emr-6.2.0
- emr-6.3.0
@@ -329,23 +329,23 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
```sh
# CPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

The `spark-nlp` package has been published to the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp).

```sh
# GPU

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.7
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.8
```
@@ -354,11 +354,11 @@ The `spark-nlp-gpu` has been published to the [Maven Repository](https://mvnrepo
```sh
# AArch64

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.7
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.2.8
```
@@ -367,11 +367,11 @@ The `spark-nlp-aarch64` has been published to the [Maven Repository](https://mvn
```sh
# M1

- spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7
+ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8

- spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.7
+ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.8
```
@@ -383,7 +383,7 @@ The `spark-nlp-m1` has been published to the [Maven Repository](https://mvnrepos
spark-shell \
--driver-memory 16g \
--conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

## Scala
@@ -399,7 +399,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
-   <version>4.2.7</version>
+   <version>4.2.8</version>
</dependency>
```
@@ -410,7 +410,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
-   <version>4.2.7</version>
+   <version>4.2.8</version>
</dependency>
```
@@ -421,7 +421,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
-   <version>4.2.7</version>
+   <version>4.2.8</version>
</dependency>
```
@@ -432,7 +432,7 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-m1_2.12</artifactId>
-   <version>4.2.7</version>
+   <version>4.2.8</version>
</dependency>
```
@@ -442,28 +442,28 @@ Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.7"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.2.8"
```

**spark-nlp-gpu:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.7"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.2.8"
```

**spark-nlp-aarch64:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.7"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.2.8"
```

**spark-nlp-m1:**

```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
- libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.7"
+ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.2.8"
```

Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -483,7 +483,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:

```bash
- pip install spark-nlp==4.2.7
+ pip install spark-nlp==4.2.8
```

Conda:
@@ -511,7 +511,7 @@ spark = SparkSession.builder \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
-   .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7") \
+   .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8") \
    .getOrCreate()
```
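With a session built manually like this, `sparknlp.start()` is not needed; a sketch of how the rest of a script proceeds (assuming the package coordinate above resolved):

```python
import sparknlp
from sparknlp.base import DocumentAssembler

# Spark NLP picks up the already-created SparkSession automatically
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

print("Spark NLP version:", sparknlp.version())
```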
@@ -579,7 +579,7 @@ Use either one of the following options
- Add the following Maven Coordinates to the interpreter's library list

```bash
- com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

- Add the path to a pre-built jar from [here](#compiled-jars) to the interpreter's library list, making sure the jar is available on the driver path
@@ -589,7 +589,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
Apart from the previous step, install the Python module through pip:

```bash
- pip install spark-nlp==4.2.7
+ pip install spark-nlp==4.2.8
```

Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -614,7 +614,7 @@ The easiest way to get this done on Linux and macOS is to simply install `spark-
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
- $ pip install spark-nlp==4.2.7 pyspark==3.2.3 jupyter
+ $ pip install spark-nlp==4.2.8 pyspark==3.2.3 jupyter
$ jupyter notebook
```
@@ -630,7 +630,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

- pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

Alternatively, you can combine the `--jars` option for pyspark with `pip install spark-nlp`
@@ -655,7 +655,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
# by default they are set to the latest
- ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.7
+ ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.8
```

[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognition and sentiment analysis using Spark NLP pretrained pipelines.
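A sketch of the kind of cell such a demo runs (`analyze_sentiment` is one documented pretrained pipeline; treat the snippet as illustrative):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start the session, then pull a pretrained sentiment pipeline
spark = sparknlp.start()
pipeline = PretrainedPipeline("analyze_sentiment", lang="en")

# annotate() on a single string returns a dict of annotation lists
result = pipeline.annotate("Spark NLP pretrained pipelines make this easy!")
print(result["sentiment"])
```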
@@ -676,7 +676,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
# by default they are set to the latest
- ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.7
+ ! wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.2.8
```

[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live demo on Kaggle Kernel that performs named entity recognition using a Spark NLP pretrained pipeline.
@@ -694,9 +694,9 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
3. In the `Libraries` tab inside your cluster, follow these steps:

- 3.1. Install New -> PyPI -> `spark-nlp==4.2.7` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==4.2.8` -> Install

- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8` -> Install

4. Now you can attach your notebook to the cluster and use Spark NLP! A quick check is sketched below.
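As a sanity check once both libraries are installed (a sketch; the pipeline name is just a documented example):

```python
from sparknlp.pretrained import PretrainedPipeline

# no sparknlp.start() needed: the cluster's SparkSession is reused,
# and the Maven coordinate installed above provides the JVM side
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
print(pipeline.annotate("Spark NLP is installed on this cluster."))
```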
@@ -744,7 +744,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
"spark.kryoserializer.buffer.max": "2000M",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8"
}
}]
```
@@ -753,7 +753,7 @@ A sample of AWS CLI to launch EMR cluster:
```sh
aws emr create-cluster \
- --name "Spark NLP 4.2.7" \
+ --name "Spark NLP 4.2.8" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \
@@ -817,7 +817,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

2. On an existing cluster, you need to install the spark-nlp and spark-nlp-display packages from PyPI.
@@ -856,7 +856,7 @@ spark = SparkSession.builder \
    .config("spark.kryoserializer.buffer.max", "2000m") \
    .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") \
    .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") \
-   .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7") \
+   .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8") \
    .getOrCreate()
```
@@ -870,7 +870,7 @@ spark-shell \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

**pyspark:**
@@ -883,7 +883,7 @@ pyspark \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.7
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.8
```

**Databricks:**
@@ -1147,12 +1147,12 @@ spark = SparkSession.builder \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
-   .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.7.jar") \
+   .config("spark.jars", "/tmp/spark-nlp-assembly-4.2.8.jar") \
    .getOrCreate()
```

- You can download the provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases) page; take care to pick the one that suits your environment in terms of device (CPU/GPU) and Apache Spark version (3.0.x, 3.1.x, 3.2.x, and 3.3.x)
- - If you are running locally, you can load the Fat JAR from your local FileSystem; however, in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (e.g., `hdfs:///tmp/spark-nlp-assembly-4.2.7.jar`)
+ - If you are running locally, you can load the Fat JAR from your local FileSystem; however, in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (e.g., `hdfs:///tmp/spark-nlp-assembly-4.2.8.jar`)

Example of using pretrained Models and Pipelines offline:
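A minimal sketch of that pattern, assuming a pipeline archive was already downloaded and unpacked to a local or distributed path (the folder name below is hypothetical):

```python
import sparknlp
from pyspark.ml import PipelineModel

spark = sparknlp.start()

# load a pretrained pipeline from a pre-downloaded, unpacked folder
# instead of letting Spark NLP fetch it from S3 at runtime
pipeline = PipelineModel.load("/tmp/explain_document_dl_en_4.2.8_3.0_1670000000000/")

# run it like any Spark ML pipeline
df = spark.createDataFrame([("Offline loading needs no internet access.",)], ["text"])
pipeline.transform(df).show()
```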