📢 Spark NLP 6.0.5: Enhanced Microsoft Fabric Integration & Markdown Processing
We're thrilled to announce the release of Spark NLP 6.0.5! This version introduces a new Markdown Reader, enabling direct processing of Markdown files into structured Spark DataFrames for more diverse NLP workflows. We have also enhanced Microsoft Fabric integration, allowing for seamless model downloads from Lakehouse containers.
🔥 Highlights
- New Markdown Reader: Introducing the new MarkdownReader for parsing Markdown files into structured Spark DataFrames, paving the way for content analysis and NLP on Markdown content.
- Enhanced Microsoft Fabric Support: Download models directly from Microsoft Fabric Lakehouse containers, streamlining your NLP deployments in the Fabric environment.
🚀 New Features & Enhancements
- New MarkdownReader Annotator: Introducing the MarkdownReader, which reads and parses Markdown files directly into a structured Spark DataFrame for processing and analysis in NLP applications. We recommend using this reader through our Partition annotator, which applies it automatically for the text/markdown content type (Link to notebook):
partitioner = Partition(content_type = "text/markdown").partition(md_directory)
- Microsoft Fabric Integration: Spark NLP now supports downloading models from Microsoft Fabric Lakehouse containers, providing a more integrated and efficient workflow for users leveraging Microsoft Fabric. This enhancement ensures smoother model access and deployment within the Fabric ecosystem. For example, you can define the path to our pretrained models in Spark like so:
from pyspark import SparkConf
conf = SparkConf()
conf.set("spark.jsl.settings.pretrained.cache_folder", "abfss://[email protected]/lakehouse_folder.Lakehouse/Files/my_models")
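The cache folder setting above can be assembled from its parts, which may help when the same job runs against several workspaces. The following is a minimal sketch, not part of Spark NLP: the helper names `lakehouse_cache_settings` and `build_spark_conf` and the `authority` parameter are hypothetical, and the path layout is assumed to follow the example above. The `authority` value (the part between `abfss://` and the first `/`) is left as a placeholder that you supply yourself.

```python
from typing import Dict

# Configuration key taken from the release-note example above.
CACHE_FOLDER_KEY = "spark.jsl.settings.pretrained.cache_folder"


def lakehouse_cache_settings(authority: str, lakehouse: str, models_dir: str) -> Dict[str, str]:
    """Build the Spark setting that points the pretrained-model cache at a Lakehouse folder.

    authority  -- hypothetical parameter: everything between "abfss://" and the first "/"
    lakehouse  -- Lakehouse name without the ".Lakehouse" suffix
    models_dir -- folder under "Files" that holds the models
    """
    if not authority or not lakehouse:
        raise ValueError("authority and lakehouse must be non-empty")
    relative_dir = models_dir.strip("/")  # tolerate leading/trailing slashes
    path = f"abfss://{authority}/{lakehouse}.Lakehouse/Files/{relative_dir}"
    return {CACHE_FOLDER_KEY: path}


def build_spark_conf(settings: Dict[str, str]):
    """Copy the settings into a SparkConf (requires pyspark to be installed)."""
    from pyspark import SparkConf  # imported lazily so the helper above has no dependencies

    conf = SparkConf()
    for key, value in settings.items():
        conf.set(key, value)
    return conf


settings = lakehouse_cache_settings(
    authority="<workspace>@<onelake-host>",  # placeholder: substitute your own values
    lakehouse="lakehouse_folder",
    models_dir="my_models",
)
print(settings[CACHE_FOLDER_KEY])
```

Keeping the path construction separate from `SparkConf` means the string can be checked or logged before a Spark session is started.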
🐛 Bug Fixes
We updated all of our example notebooks so that they are reproducible and render properly on GitHub.
❤️ Community Support
- Slack: For live discussion with the Spark NLP community and the team
- GitHub: Bug reports, feature requests, and contributions
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- JohnSnowLabs: Official Medium
- YouTube: Spark NLP video tutorials
⚙️ Installation
Python
#PyPI
pip install spark-nlp==6.0.5
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.0.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.0.5
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.5
Apple Silicon
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.0.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.0.5
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.0.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.0.5
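The package coordinates above differ only in the artifact name, so a launcher script can derive the right one from the target hardware. This is a small illustrative sketch: the function `package_coordinate` and the variant keys (`cpu`, `gpu`, `silicon`, `aarch64`) are hypothetical names chosen here, while the group ID, artifact names, Scala suffix, and version are copied from the commands listed above.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"

# Artifact names as they appear in the --packages commands above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def package_coordinate(variant: str = "cpu", version: str = "6.0.5") -> str:
    """Return the group:artifact:version coordinate for the given build variant."""
    try:
        artifact = ARTIFACTS[variant]
    except KeyError:
        known = ", ".join(sorted(ARTIFACTS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}") from None
    return f"{GROUP_ID}:{artifact}_{SCALA_VERSION}:{version}"


for variant in ARTIFACTS:
    print(f"pyspark --packages {package_coordinate(variant)}")
```

The returned string can be passed to `--packages` on the command line or set as the value of Spark's `spark.jars.packages` property.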
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>6.0.5</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>6.0.5</version>
</dependency>
spark-nlp-silicon:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>6.0.5</version>
</dependency>
spark-nlp-aarch64:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-aarch64_2.12</artifactId>
<version>6.0.5</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-6.0.5.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-6.0.5.jar
- Apple Silicon on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-6.0.5.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-6.0.5.jar
What's Changed
Full Changelog: 6.0.4...6.0.5