- Integrated light pipeline, which yields a 10x speedup for predictions
18
-
- Easy, copy-pastable model configs via pipe.print_info()
19
-
- N new Notebooks
20
-
- Recycling of Pandas indexes for predicting. No more ID columns, just pandas indexes.
21
-
- Up to 10x speed improvement by leveraging the light pipeline from Spark NLP
22
-
23
-
#### Lots of new Demos
24
-
- Named Entity Recognition (NER)
25
-
  - [NER pretrained on ONTO Notes](https://colab.research.google.com/drive/1_sgbJV3dYPZ_Q7acCgKWgqZkWcKAfg79?usp=sharing)
26
-
  - [NER pretrained on CONLL](https://colab.research.google.com/drive/1CYzHfQyFCdvIOVO2Z5aggVI9c0hDEOrw?usp=sharing)
27
-
- Part of speech (POS)
9
+
# NLU release notes 0.1
10
+
11
+
## NLU: The Power of Spark NLP, the Simplicity of Python
12
+
John Snow Labs' NLU is a Python library for applying state-of-the-art text mining, directly on any dataframe, with a single line of code.
13
+
As a facade of the award-winning Spark NLP library, it comes with hundreds of pretrained models in tens of languages - all production-grade, scalable, and trainable.
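The "single line of code" claim can be illustrated with a short sketch. It assumes the `nlu` package (with its Spark and Java dependencies) is installed, that `nlu.load(...)` returns a pipeline exposing `predict()`, and that `"sentiment"` is an available model reference; the wrapper name `analyze_sentiment` is hypothetical and only there for illustration.

```python
def analyze_sentiment(texts, model_ref="sentiment"):
    """Load a pretrained NLU pipeline and run it on the given text data.

    `model_ref` is assumed to be a valid NLU model reference.
    """
    # Imported lazily so this function can be defined without NLU installed.
    import nlu

    # The one-liner itself: load a pretrained model and predict with it.
    return nlu.load(model_ref).predict(texts)
```

Calling `analyze_sentiment(["I love NLU!", "This movie was terrible."])` should return the predictions for both strings; the first call will likely download the pretrained model.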
14
+
15
+
## What kind of models does NLU provide?
16
+
NLU provides everything a data scientist might wish for in one line of code!
17
+
- 100+ of the latest NLP word embeddings (BERT, ELMO, ALBERT, XLNET, GLOVE, BIOBERT, ELECTRA, COVIDBERT) and different variations of them
18
+
- 10+ of the latest NLP sentence embeddings (BERT, ELECTRA, USE) and different variations of them
19
+
- Generation of sentence, chunk, and document embeddings from these embeddings
20
+
- Language Classification of 20 languages
21
+
- 36 pretrained NER models
22
+
- 34 Part of Speech (POS) models
23
+
- 34 Lemmatizer models
24
+
- Emotion models for 5 categories
25
+
- Labeled and Unlabeled Dependency parsing
26
+
- Spell Checking
27
+
- Stopword removers for 41 languages
28
+
- Classifiers for 12 different problems
29
+
- **244 unique** NLU components
30
+
- **176 unique** NLP models and algorithms
31
+
- **68 unique** NLP pipelines consisting of composed NLP models
32
+
33
+
34
+
35
+
## Classifiers trained on many different datasets
36
+
Choose the right tool for the right task! Whether you analyze movie reviews or Twitter data, NLU has the right model for you!
37
+
38
+
- 50 Class Questions Classifier
39
+
- Spam Classifier
40
+
- Fake News Classifier
41
+
- Emotion Classifier
42
+
- Cyberbullying Classifier
43
+
- Sarcasm Classifier
44
+
- Toxic Classifier
45
+
- E2E Classifier
46
+
- Sentiment classifier pretrained on IMDB movie reviews
47
+
- Sentiment classifier pretrained on Twitter
48
+
- NER pretrained on ONTO Notes
49
+
- NER trained on CONLL
50
+
- Language classifier for 20 languages, trained on the wiki 20 lang dataset.
51
+
52
+
## Data Pre-Processing and Text Cleaning
53
+
Working with text data can sometimes be quite a dirty job. NLU helps you keep your hands clean by providing lots of components that take care of data-engineering-intensive tasks.
54
+
55
+
- Datetime Matcher
56
+
- Pattern Matcher
57
+
- Chunk Matcher
58
+
- Phrases Matcher
59
+
- Stopword Cleaners
60
+
- Pattern Cleaners
61
+
- Slang Cleaner
62
+
63
+
## Where can I see NLU's entire offering?
64
+
Check out the [NLU Namespace](https://nlu.johnsnowlabs.com/docs/en/namespace) for everything that NLU has to offer!
65
+
66
+
67
+
68
+
## Supported Data Types
69
+
- Pandas DataFrame and Series
70
+
- Spark DataFrames
71
+
- Modin with Ray backend
72
+
- Modin with Dask backend
73
+
- Numpy arrays
74
+
- Strings and lists of strings
75
+
76
+
77
+
Check out the following notebooks for examples of how to work with NLU.
- Additional feature discovery via nlu.components()
77
-
- Memory optimization
78
-
- Refactoring
79
-
- Docs and Examples updates
80
-
81
-
### 2.5.5
82
-
- Confidence extraction bugfix
83
-
84
-
### 2.5.4
85
-
- Fixed bug with bad conversion of datatypes
86
-
87
-
88
-
### 2.5.3
89
-
- metadata parameter for predict function, prettier outputs
90
-
- Datatype consistency added for predictions
91
-
92
-
### 2.5.2
93
-
- Modin dependency bugfix
94
-
95
-
### 2.5.1
96
-
- Modin Support
97
-
98
-
### 2.5.0
99
-
100
-
- Support for Modin with Ray and Dask Backends
101
-
- Consistent input and output types for predict(). If you input a Spark DataFrame, you get a Spark DataFrame back. If you input a Modin DataFrame, you get a Modin DataFrame back. The same holds for predictions on NumPy and Pandas objects.
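
The type-consistency rule for predict() in the 2.5.0 notes can be checked with a small wrapper. This is a minimal sketch and not part of NLU: `predict_preserving_type` is a hypothetical helper, and `pipe` is assumed to be any object exposing a `predict(data)` method, such as a loaded NLU pipeline.

```python
def predict_preserving_type(pipe, data):
    """Call pipe.predict(data) and verify the result mirrors the input's type.

    Hypothetical helper for illustration; `pipe` is any object with a
    predict(data) method.
    """
    result = pipe.predict(data)
    # Per the release note, a Spark DataFrame in should give a Spark DataFrame
    # out, a Modin DataFrame in should give a Modin DataFrame out, and so on.
    if not isinstance(result, type(data)):
        raise TypeError(
            f"expected {type(data).__name__}, got {type(result).__name__}"
        )
    return result
```

The wrapper is meant for the dataframe-like inputs named in the note; raising early makes a backend mismatch visible at the call site instead of further down the pipeline.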