
Spark NLP 4.2.2: Support DBFS, HDFS, and S3 for importing external models, unifying LightPipeline APIs across supported languages for Image Classification, new fullAnnotateImage for Scala, new fullAnnotateImageJava for Java, support LightPipeline for QuestionAnswering pre-trained pipelines, and bug fixes

@maziyarpanahi maziyarpanahi released this 27 Oct 18:07

📢 Overview

Spark NLP 4.2.2 🚀 adds support for DBFS, HDFS, and S3, in addition to local file systems, when importing external models from TF Hub and Hugging Face. It also unifies the LightPipeline APIs for Image Classification across Scala, Java, and Python, adds the new fullAnnotateImage for Scala and fullAnnotateImageJava for Java, supports LightPipeline for QuestionAnswering pre-trained pipelines, and includes bug fixes.

Do not forget to visit Models Hub, which hosts more than 11,400 free and open-source models & pipelines. As always, we would like to thank our community for their feedback, questions, and feature requests. 🎉


⭐ New Features & improvements

  • Add support for importing TensorFlow SavedModel from remote storage such as DBFS, S3, and HDFS. As of this release, you can import models exported from TF Hub and Hugging Face that are stored on remote storage
  • Add support for fullAnnotate in LightPipeline for the path of images in Scala
  • Add fullAnnotate method in PretrainedPipeline for Scala
  • Add fullAnnotateJava method in PretrainedPipeline for Java
  • Add fullAnnotateImage to PretrainedPipeline for Scala
  • Add fullAnnotateImageJava to PretrainedPipeline for Java
  • Add support for Question Answering in fullAnnotate method in PretrainedPipeline
  • Add Predicted Entities to all Vision Transformers (ViT) models and pipelines
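
As a rough illustration of the remote-storage import described in the first item above, the sketch below checks which kind of storage a model path points at and then hands it to `loadSavedModel`. The helper names `storage_kind` and `import_bert_saved_model` are hypothetical, as is the assumption that the URI scheme (`dbfs:/`, `hdfs://`, `s3://`) is what distinguishes the stores; `BertEmbeddings` is only one example of an annotator that can import a SavedModel. Any credentials or cluster configuration that S3 or HDFS access may need are not shown.

```python
from urllib.parse import urlparse

# Storage back ends named in the release notes, keyed by URI scheme.
# Assumption: the path prefix (dbfs:/, hdfs://, s3://) selects the store.
REMOTE_SCHEMES = {"dbfs": "DBFS", "hdfs": "HDFS", "s3": "S3"}


def storage_kind(path: str) -> str:
    """Return which storage a model path points at (hypothetical helper)."""
    scheme = urlparse(path).scheme.lower()
    if scheme in ("", "file"):
        return "local"
    if scheme in REMOTE_SCHEMES:
        return REMOTE_SCHEMES[scheme]
    raise ValueError(f"unrecognized storage scheme: {scheme!r}")


def import_bert_saved_model(path: str, spark):
    """Import a TensorFlow SavedModel exported from TF Hub or Hugging Face."""
    # Imported lazily so the path check works without Spark NLP installed.
    from sparknlp.annotator import BertEmbeddings

    storage_kind(path)  # fail early on a scheme this sketch does not recognize
    return (
        BertEmbeddings.loadSavedModel(path, spark)
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )
```

With a running session, a call might look like `import_bert_saved_model("s3://my-bucket/models/bert", spark)`, where the bucket and path are placeholders.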

Bug Fixes

  • Unify the annotatorType name in Python and Scala for Spark schema in Annotation, AnnotationImage, and AnnotationAudio
  • Fix missing indexes in the RecursiveTokenizer annotator affecting downstream NLP tasks in the pipeline

📓 New Notebooks

Spark NLP | Notebooks | Colab
WordSegmenter | Import External SavedModel From Remote | Open In Colab

📖 Documentation


Installation

Python

#PyPI

pip install spark-nlp==4.2.2

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.2.2

M1

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.2
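
The same package coordinates can also be passed from Python through Spark's `spark.jars.packages` setting instead of the `--packages` flag. The sketch below is one possible way to do that; the helper names `spark_nlp_coordinate` and `build_session` and the `cpu`/`gpu`/`m1` variant labels are illustrative assumptions, and a real session may need further settings (memory, serializer) that are not covered here.

```python
# Artifact names taken from the commands above; the variant keys are
# labels chosen for this sketch.
ARTIFACTS = {"cpu": "spark-nlp", "gpu": "spark-nlp-gpu", "m1": "spark-nlp-m1"}


def spark_nlp_coordinate(variant: str = "cpu",
                         version: str = "4.2.2",
                         scala: str = "2.12") -> str:
    """Build the Maven coordinate for a Spark NLP package variant."""
    return f"com.johnsnowlabs.nlp:{ARTIFACTS[variant]}_{scala}:{version}"


def build_session(variant: str = "cpu"):
    """Start a SparkSession that pulls the chosen Spark NLP package."""
    # Imported lazily so the coordinate helper works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("spark-nlp-4.2.2")
        .config("spark.jars.packages", spark_nlp_coordinate(variant))
        .getOrCreate()
    )
```

For example, `build_session("gpu")` would request the `spark-nlp-gpu_2.12` artifact at version 4.2.2.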

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>4.2.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>4.2.2</version>
</dependency>

spark-nlp-m1:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-m1_2.12</artifactId>
    <version>4.2.2</version>
</dependency>

FAT JARs

What's Changed

Contributors

@galiph @agsfer @pabla @josejuanmartinez @Cabir40 @maziyarpanahi @Meryem1425 @danilojsl @jsl-builder @jsl-models @ahmedlone127 @DevinTDHa @jdobes-cz @Damla-Gurbaz @Mary-Sci

New Contributors

Full Changelog: 4.2.1...4.2.2