Spark NLP 4.2.2: Support DBFS, HDFS, and S3 for importing external models, unifying LightPipeline APIs across supported languages for Image Classification, new fullAnnotateImage for Scala, new fullAnnotateImageJava for Java, support LightPipeline for QuestionAnswering pre-trained pipelines, and bug fixes
Overview
Spark NLP 4.2.2 adds support for DBFS, HDFS, and S3, in addition to local file systems, when importing external models from TF Hub and Hugging Face. It also unifies the LightPipeline APIs across Scala, Java, and Python for Image Classification, adds the new fullAnnotateImage for Scala and fullAnnotateImageJava for Java, supports LightPipeline for QuestionAnswering pre-trained pipelines, and includes bug fixes.
Do not forget to visit Models Hub, with over 11,400 free and open-source models and pipelines. As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features & improvements
- Add support for importing TensorFlow SavedModel from remote storage such as DBFS, S3, and HDFS. From this release, you can import models exported from TF Hub and HuggingFace that are stored on remote storage
- Add support for `fullAnnotate` in `LightPipeline` for the path of images in Scala
- Add `fullAnnotate` method in `PretrainedPipeline` for Scala
- Add `fullAnnotateJava` method in `PretrainedPipeline` for Java
- Add `fullAnnotateImage` to `PretrainedPipeline` for Scala
- Add `fullAnnotateImageJava` to `PretrainedPipeline` for Java
- Add support for Question Answering in `fullAnnotate` method in `PretrainedPipeline`
- Add `Predicted Entities` to all Vision Transformers (ViT) models and pipelines
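
As a rough illustration of the first item in the list above (importing a SavedModel from remote storage), the Python sketch below passes a remote URI to `loadSavedModel` where a local folder would previously have been used. The helper names `storage_kind` and `import_bert_saved_model`, the example paths, and the choice of `BertEmbeddings` are assumptions made for illustration only. Any credentials or cluster configuration your storage requires are not shown.

```python
from urllib.parse import urlparse

# Remote schemes named in these release notes; anything else is treated as local.
REMOTE_SCHEMES = {"dbfs", "hdfs", "s3"}


def storage_kind(path: str) -> str:
    """Classify a model path as 'remote' or 'local' by its URI scheme.

    Hypothetical helper, not part of Spark NLP.
    """
    scheme = urlparse(path).scheme.lower()
    return "remote" if scheme in REMOTE_SCHEMES else "local"


def import_bert_saved_model(spark, path: str):
    """Import a TensorFlow SavedModel exported from Hugging Face or TF Hub.

    Hypothetical wrapper: `path` may be a local folder or, per this release,
    a DBFS, HDFS, or S3 location. Requires an active Spark NLP session.
    """
    # Imported lazily so storage_kind() can be used without Spark NLP installed.
    from sparknlp.annotator import BertEmbeddings

    return (
        BertEmbeddings.loadSavedModel(path, spark)
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    for p in ["s3://my-bucket/models/bert", "dbfs:/models/bert", "/tmp/bert"]:
        print(p, "->", storage_kind(p))
```

With a running session (for example `spark = sparknlp.start()`), `import_bert_saved_model(spark, "dbfs:/models/bert")` would return an annotator that can then be saved with `.write().overwrite().save(...)` as with a locally imported model.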
Bug Fixes
- Unify the `annotatorType` name in Python and Scala for Spark schema in Annotation, AnnotationImage, and AnnotationAudio
- Fix missing indexes in the `RecursiveTokenizer` annotator affecting downstream NLP tasks in the pipeline
New Notebooks
Spark NLP | Notebooks | Colab |
---|---|---|
WordSegmenter | Import External SavedModel From Remote |
- You can visit Import Transformers in Spark NLP
- You can visit Spark NLP Workshop for 100+ examples
Documentation
- TF Hub & HuggingFace to Spark NLP
- Models Hub with new models
- Spark NLP documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
- Spark NLP Workshop notebooks
- Spark NLP publications
- Spark NLP in Action
- Spark NLP training certification notebooks for Google Colab and Databricks
- Spark NLP Display for visualization of different types of annotations
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
Installation
Python
#PyPI
pip install spark-nlp==4.2.2
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2
M1
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>4.2.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>4.2.2</version>
</dependency>
spark-nlp-m1:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-m1_2.12</artifactId>
<version>4.2.2</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.2.2.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-4.2.2.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-m1-assembly-4.2.2.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-m1-assembly-4.2.2.jar
What's Changed
Contributors
@galiph @agsfer @pabla @josejuanmartinez @Cabir40 @maziyarpanahi @Meryem1425 @danilojsl @jsl-builder @jsl-models @ahmedlone127 @DevinTDHa @jdobes-cz @Damla-Gurbaz @Mary-Sci
Full Changelog: 4.2.1...4.2.2