Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed environment.
Spark NLP comes with **83000+** pretrained **pipelines** and **models** in **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, **Llama**, **Mistral**, **Phi**, **Qwen2**, and many more, not only to **Python** and **R** but also to the **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**), at **scale**, by extending **Apache Spark** natively.
## Model Importing Support
Spark NLP provides easy support for importing models from various popular frameworks:
- **TensorFlow**
- **ONNX**
- **OpenVINO**
- **Llama.cpp (GGUF)**

This wide range of support allows you to seamlessly integrate models from different sources into your Spark NLP workflows, enhancing flexibility and compatibility with existing machine learning ecosystems.
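
Before importing an exported model, it can help to confirm which of the frameworks above produced it. The sketch below is hypothetical: `detect_export_format` and `detect_export_format_of` are not part of Spark NLP. The file-name conventions they check are assumptions based on each toolchain's usual defaults: `saved_model.pb` for TensorFlow SavedModels, the `.onnx` extension for ONNX, an `.xml`/`.bin` pair for OpenVINO IR, and the `.gguf` extension for llama.cpp.

```python
from pathlib import Path
from typing import Iterable


def detect_export_format(file_names: Iterable[str]) -> str:
    """Guess the source framework from the file names of an exported model.

    Hypothetical helper: it relies on common naming conventions only and
    does not open or validate any file.
    """
    names = [Path(name).name for name in file_names]
    suffixes = {Path(name).suffix.lower() for name in names}
    if ".gguf" in suffixes:
        return "llama.cpp (GGUF)"
    if "saved_model.pb" in names:
        return "TensorFlow"
    if ".onnx" in suffixes:
        return "ONNX"
    # OpenVINO IR keeps the topology in an .xml file and the weights in a .bin file
    if ".xml" in suffixes and ".bin" in suffixes:
        return "OpenVINO"
    return "unknown"


def detect_export_format_of(path: str) -> str:
    """Apply detect_export_format to a single file or to every file under a directory."""
    root = Path(path)
    files = [root] if root.is_file() else [p for p in root.rglob("*") if p.is_file()]
    return detect_export_format(p.name for p in files)
```

For example, calling `detect_export_format_of("./exported_model")` on a TensorFlow SavedModel export should return `"TensorFlow"`. The path here is illustrative. The matching Spark NLP annotator then loads the files. For transformer annotators such as `BertEmbeddings`, this is done with `loadSavedModel(path, spark)`.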
## Project's website
Take a look at our official Spark NLP page: [https://sparknlp.org/](https://sparknlp.org/) for user
- [Parsing and Analysis](https://sparknlp.org/docs/en/features#parsing-and-analysis)
- [Sentiment and Classification](https://sparknlp.org/docs/en/features#sentiment-and-classification)


```sh
$ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
$ pip install spark-nlp==5.5.0 pyspark==3.3.1
```

In Python console or Jupyter `Python3` kernel:
architectures; however, they may not work in some environments.
## Pipelines and Models
For a quick example of using pipelines and models, take a look at our official [documentation](https://sparknlp.org/docs/en/install#pipelines-and-models).
#### Please check out our Models Hub for the full list of [pre-trained models](https://sparknlp.org/models) with examples, demo, benchmark, and more
### Apache Spark Support
Spark NLP *5.5.0* has been built on top of Apache Spark 3.4 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
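
The support range above can be turned into a pre-flight check. The helpers below are a hypothetical sketch, not a Spark NLP API. They only encode the 3.0.x through 3.5.x range stated in this section, and they compare just the major and minor version components.

```python
import re
from typing import Optional, Tuple

# Apache Spark (major, minor) lines listed above as fully supported: 3.0.x through 3.5.x
SUPPORTED_SPARK_LINES = {(3, minor) for minor in range(0, 6)}


def spark_line(version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a version string such as '3.3.1' or '3.4.1-SNAPSHOT'."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported_spark(version: str) -> bool:
    """True if the version falls in one of the supported 3.x minor lines."""
    return spark_line(version) in SUPPORTED_SPARK_LINES
```

In a PySpark session, the running version is exposed as `spark.version` and the installed package version as `pyspark.__version__`. A line such as `assert is_supported_spark(spark.version)` can therefore fail fast before any pipeline is built.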
### Databricks Support
Spark NLP 5.5.0 has been tested and is compatible with the following runtimes:

|**CPU**|**GPU**|
|--------------------|--------------------|
| 14.1 / 14.1 ML | 14.1 ML & GPU |
| 14.2 / 14.2 ML | 14.2 ML & GPU |
| 14.3 / 14.3 ML | 14.3 ML & GPU |
| 15.0 / 15.0 ML | 15.0 ML & GPU |
| 15.1 / 15.1 ML | 15.1 ML & GPU |
| 15.2 / 15.2 ML | 15.2 ML & GPU |
| 15.3 / 15.3 ML | 15.3 ML & GPU |
| 15.4 / 15.4 ML | 15.4 ML & GPU |

We are also compatible with older runtimes. For a full list, check Databricks support in our official [documentation](https://sparknlp.org/docs/en/install#databricks-support).
### EMR Support
Spark NLP 5.5.0 has been tested and is compatible with the following EMR releases:

|**EMR Release**|
|--------------------|
| emr-6.13.0 |
| emr-6.14.0 |
| emr-6.15.0 |
| emr-7.0.0 |
| emr-7.1.0 |
| emr-7.2.0 |

We are also compatible with older EMR releases. For a full list, check EMR support in our official [documentation](https://sparknlp.org/docs/en/install#emr-support).
Full list of [Amazon EMR 6.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html)
Full list of [Amazon EMR 7.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-7x.html)
NOTE: EMR 6.1.0 and 6.1.1 are not supported.
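
For cluster bootstrap scripts, the EMR lists above can be encoded as a lookup. This is a hypothetical sketch: `classify_emr_release` is not part of Spark NLP. It only knows the releases named in this section, so any other label is reported as needing a check against the full list in the documentation.

```python
# Release labels listed in this section as tested with Spark NLP 5.5.0
TESTED_EMR_RELEASES = frozenset({
    "emr-6.13.0", "emr-6.14.0", "emr-6.15.0",
    "emr-7.0.0", "emr-7.1.0", "emr-7.2.0",
})

# Release labels this section calls out as not supported
UNSUPPORTED_EMR_RELEASES = frozenset({"emr-6.1.0", "emr-6.1.1"})


def classify_emr_release(release_label: str) -> str:
    """Return 'unsupported', 'tested', or 'check documentation' for an EMR release label."""
    label = release_label.strip().lower()
    if label in UNSUPPORTED_EMR_RELEASES:
        return "unsupported"
    if label in TESTED_EMR_RELEASES:
        return "tested"
    # Older releases may still work; the full list lives in the official documentation
    return "check documentation"
```

The release label of a running cluster appears as `ReleaseLabel` in the output of `aws emr describe-cluster`.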
## Installation
### Command line (requires internet connection)
To install spark-nlp packages through the command line, follow [these instructions](https://sparknlp.org/docs/en/install#command-line) from our official documentation.
### Scala
from our official documentation.
If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your

Spark NLP library and all the pre-trained models/pipelines can be used entirely offline with no access to the Internet.
Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integration) from our official documentation.
## Documentation
### Examples
keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}