6.0.5

@DevinTDHa DevinTDHa released this 10 Jul 07:58

📢 Spark NLP 6.0.5: Enhanced Microsoft Fabric Integration & Markdown Processing

We're thrilled to announce the release of Spark NLP 6.0.5! This version introduces a new Markdown Reader, enabling direct processing of Markdown files into structured Spark DataFrames for more diverse NLP workflows. We have also enhanced Microsoft Fabric integration, allowing for seamless model downloads from Lakehouse containers.

🔥 Highlights

  • New Markdown Reader: Introducing the new MarkdownReader for parsing Markdown files into structured Spark DataFrames, paving the way for advanced content analysis and NLP on Markdown content.
  • Enhanced Microsoft Fabric Support: Download models directly from Microsoft Fabric Lakehouse containers, streamlining your NLP deployments in the Fabric environment.

🚀 New Features & Enhancements

  • New MarkdownReader Annotator: The new MarkdownReader reads and parses Markdown files directly into a structured Spark DataFrame, so Markdown content can be processed and analyzed in NLP applications. We recommend using it through our Partition annotator, which applies this reader automatically when the content type is text/markdown. (Link to notebook)

    partitioner = Partition(content_type="text/markdown").partition(md_directory)
  • Microsoft Fabric Integration: Spark NLP now supports downloading models from Microsoft Fabric Lakehouse containers, providing a more integrated and efficient workflow for users leveraging Microsoft Fabric. This enhancement ensures smoother model access and deployment within the Fabric ecosystem. For example, you can define the path to our pretrained models in Spark like so:

    from pyspark import SparkConf
    
    conf = SparkConf()
    conf.set("spark.jsl.settings.pretrained.cache_folder", "abfss://[email protected]/lakehouse_folder.Lakehouse/Files/my_models")
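If the cache folder is configured in more than one place, it can help to keep the setting in a small helper. The following is only a sketch: the config key is the one shown above, while the helper names `fabric_cache_settings` and `apply_settings`, the `abfss://` prefix check, and the placeholder path are assumptions made for illustration and are not part of Spark NLP.

```python
# Hypothetical helpers, not part of Spark NLP or PySpark.
CACHE_FOLDER_KEY = "spark.jsl.settings.pretrained.cache_folder"


def fabric_cache_settings(cache_folder: str) -> dict:
    """Return the Spark setting that points the pretrained cache at cache_folder.

    Assumption: Lakehouse paths use the abfss:// scheme, as in the example above,
    so anything else is rejected early.
    """
    if not cache_folder.startswith("abfss://"):
        raise ValueError(f"expected an abfss:// path, got: {cache_folder!r}")
    return {CACHE_FOLDER_KEY: cache_folder}


def apply_settings(builder, settings: dict):
    """Apply each key/value pair to a builder exposing .config(key, value)."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


# Placeholder path: substitute your own container, host, and Lakehouse folder.
settings = fabric_cache_settings(
    "abfss://<container>@<host>/<lakehouse>.Lakehouse/Files/my_models"
)
print(settings)
```

With PySpark installed, `apply_settings(SparkSession.builder, settings).getOrCreate()` would create a session that carries this setting.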

🐛 Bug Fixes

We updated all of our example notebooks so that they are reproducible and render properly on GitHub.

❤️ Community Support

  • Slack: For live discussion with the Spark NLP community and the team
  • GitHub: Bug reports, feature requests, and contributions
  • Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium: Spark NLP articles
  • JohnSnowLabs: Official Medium
  • YouTube: Spark NLP video tutorials

⚙️ Installation

Python

#PyPI
pip install spark-nlp==6.0.5

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.0.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.0.5

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.5

Apple Silicon

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.0.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.0.5

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.0.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.0.5
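The coordinates above all follow the same `group:artifact_scalaVersion:version` pattern. As a minimal sketch, a launch script could build the right `--packages` value per platform like this. The function `spark_nlp_coordinate` and the variant keys are hypothetical names chosen for this example, not part of Spark NLP.

```python
# Hypothetical helper for launch scripts; not part of Spark NLP.
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"

# Artifact names as listed in the Spark Packages section.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def spark_nlp_coordinate(variant: str = "cpu", version: str = "6.0.5") -> str:
    """Build the Maven coordinate to pass to --packages for a given variant."""
    if variant not in ARTIFACTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(ARTIFACTS)}")
    return f"{GROUP_ID}:{ARTIFACTS[variant]}_{SCALA_VERSION}:{version}"


print(spark_nlp_coordinate("gpu"))
# prints: com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.5
```

The result can be passed to `spark-shell --packages` or `pyspark --packages`, as in the commands above.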

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>6.0.5</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>6.0.5</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>6.0.5</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>6.0.5</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 6.0.4...6.0.5