Commit dd93eda

NLU Release 0.0.1
1 parent 0ed87f4 commit dd93eda

1 file changed

Lines changed: 105 additions & 73 deletions

docs/en/release_notes.md

@@ -6,27 +6,94 @@ key: docs-release-notes
66
modify_date: "2020-06-12"
77
---
88

9-
NLU release notes
10-
11-
### 2.6
12-
- Added 100+ new models from Spark NLP 2.6
13-
- New YAKE model
14-
- New Multi Class Classifier model
15-
- Improved outputs for Chunk level components
16-
- Integrated removal of IOB prefixes of NER tags
17-
- Integrated light pipeline which yields 10x speed up for predictions
18-
- Easy and Copy pastable moel configs via pipe.print_info()
19-
- N new Notebooks
20-
- Recycling of Pandas indexes for predicting. No more ID columns, just pandas indexes.
21-
- Up to 10x Speed improvement with light pipeline leverage from Spark NLP
22-
23-
#### Lots of new Demos
24-
- Named Entity Recognition (NER)
25-
-[NER pretrained on ONTO Notes](https://colab.research.google.com/drive/1_sgbJV3dYPZ_Q7acCgKWgqZkWcKAfg79?usp=sharing)
26-
-[NER pretrained on CONLL](https://colab.research.google.com/drive/1CYzHfQyFCdvIOVO2Z5aggVI9c0hDEOrw?usp=sharing)
27-
- Part of speech (POS)
9+
# NLU release notes 0.1
10+
11+
## NLU: The Power of Spark NLP, the Simplicity of Python
12+
John Snow Labs' NLU is a Python library for applying state-of-the-art text mining directly on any dataframe with a single line of code.
13+
As a facade of the award-winning Spark NLP library, it comes with hundreds of pretrained models in tens of languages - all production-grade, scalable, and trainable.
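
A minimal sketch of the "single line of code" workflow, assuming NLU is installed alongside PySpark and a compatible Java runtime. `nlu.load` and `predict` are the NLU entry points; the reference `"sentiment"` is assumed to be one of the names listed in the NLU Namespace. The helper `predict_sentiment` and its `load` parameter are hypothetical, introduced only so the sketch can be tried with a stand-in when NLU is not available.

```python
def predict_sentiment(texts, load=None):
    """Run a pretrained sentiment pipeline over `texts` and return its output.

    `load` defaults to `nlu.load`. It is a parameter only so this sketch can
    be exercised with a stand-in object when NLU is not installed.
    """
    if load is None:
        # Assumption: NLU, PySpark, and Java are installed; the first call
        # downloads the pretrained model.
        import nlu
        load = nlu.load
    # The one-liner itself: load a pretrained component, then predict.
    return load("sentiment").predict(texts)
```

Calling `predict_sentiment(["I love NLU!"])` is then equivalent to `nlu.load("sentiment").predict(["I love NLU!"])`. Other component references from the NLU Namespace can be substituted for `"sentiment"` in the same way.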
14+
15+
## What kind of models does NLU provide?
16+
NLU provides everything a data scientist might wish for in one line of code!
17+
- 100+ of the latest NLP word embeddings (BERT, ELMO, ALBERT, XLNET, GLOVE, BIOBERT, ELECTRA, COVIDBERT) and variations of them
18+
- 10+ of the latest NLP sentence embeddings (BERT, ELECTRA, USE) and variations of them
19+
- Generation of sentence, chunk, and document embeddings from these embeddings
20+
- Language Classification of 20 languages
21+
- 36 pretrained NER models
22+
- 34 Part of Speech (POS) models
23+
- 34 Lemmatizer models
24+
- Emotion models for 5 categories
25+
- Labeled and Unlabeled Dependency parsing
26+
- Spell Checking
27+
- Stopword removers for 41 languages
28+
- Classifiers for 12 different problems
29+
- **244 unique** NLU components
30+
- **176 unique** NLP models and algorithms
31+
- **68 unique** NLP pipelines consisting of composed NLP models
32+
33+
34+
35+
## Classifiers trained on many different datasets
36+
Choose the right tool for the right task! Whether you analyze movies or Twitter, NLU has the right model for you!
37+
38+
- 50 Class Questions Classifier
39+
- Spam Classifier
40+
- Fake News Classifier
41+
- Emotion Classifier
42+
- Cyberbullying Classifier
43+
- Sarcasm Classifier
44+
- Toxic Classifier
45+
- E2E Classifier
46+
- Sentiment classifier pretrained on IMDB movie reviews
47+
- Sentiment classifier pretrained on Twitter
48+
- NER pretrained on ONTO notes
49+
- NER pretrained on CONLL
50+
- Language classifier for 20 languages on the wiki 20 lang dataset.
51+
52+
## Data Pre-Processing and Text Cleaning
53+
Working with text data can sometimes be quite a dirty job. NLU helps you keep your hands clean by providing lots of components that take over data-engineering-intensive tasks.
54+
55+
- Datetime Matcher
56+
- Pattern Matcher
57+
- Chunk Matcher
58+
- Phrases Matcher
59+
- Stopword Cleaners
60+
- Pattern Cleaners
61+
- Slang Cleaner
62+
63+
## Where can I see NLU's entire offering?
64+
Check out the [NLU Namespace](https://nlu.johnsnowlabs.com/docs/en/namespace) for everything that NLU has to offer!
65+
66+
67+
68+
## Supported Data Types
69+
- Pandas DataFrame and Series
70+
- Spark DataFrames
71+
- Modin with Ray backend
72+
- Modin with Dask backend
73+
- Numpy arrays
74+
- Strings and lists of strings
75+
76+
77+
Check out the following notebooks for examples of how to work with NLU.
78+
79+
80+
# NLU Demos on Datasets
81+
- [Kaggle Twitter Airline Sentiment Analysis NLU demo](https://www.kaggle.com/kasimchristianloan/nlu-sentiment-airline-demo)
82+
- [Kaggle Twitter Airline Emotion Analysis NLU demo](https://www.kaggle.com/kasimchristianloan/nlu-emotion-airline-demo)
83+
- [Kaggle Twitter COVID Sentiment Analysis NLU demo](https://www.kaggle.com/kasimchristianloan/nlu-covid-sentiment-showcase)
84+
- [Kaggle Twitter COVID Emotion Analysis NLU demo](https://www.kaggle.com/kasimchristianloan/nlu-covid-emotion-showcase)
85+
86+
87+
# NLU component examples
88+
89+
The following are Colab examples which showcase each NLU component and some applications.
90+
91+
- ### Named Entity Recognition (NER)
92+
- [NER pretrained on ONTO Notes](https://colab.research.google.com/drive/1_sgbJV3dYPZ_Q7acCgKWgqZkWcKAfg79?usp=sharing)
93+
- [NER pretrained on CONLL](https://colab.research.google.com/drive/1CYzHfQyFCdvIOVO2Z5aggVI9c0hDEOrw?usp=sharing)
94+
- ### Part of speech (POS)
2895
- [POS pretrained on ANC dataset](https://colab.research.google.com/drive/1tW833T3HS8F5Lvn6LgeDd5LW5226syKN?usp=sharing)
29-
- Classifiers
96+
- ### Classifiers
3097
- [Unsupervised Keyword Extraction with YAKE](https://colab.research.google.com/drive/1BdomIc1nhrGxLFOpK5r82Zc4eFgnIgaO?usp=sharing)
3198
- [Toxic Text Classifier](https://colab.research.google.com/drive/1QRG5ZtAvoJAMZ8ytFMfXj_W8ogdeRi9m?usp=sharing)
3299
- [Twitter Sentiment Classifier](https://colab.research.google.com/drive/1H1Gekn2qzXzOf5rrT8LmHmmuoOGsiu8m?usp=sharing)
@@ -38,23 +105,25 @@ NLU release notes
38105
- [E2E Classifier](https://colab.research.google.com/drive/1OSkiXGEpKlm9HWDoVb42uLNQQgb7nqNZ?usp=sharing)
39106
- [Cyberbullying Classifier](https://colab.research.google.com/drive/1OSkiXGEpKlm9HWDoVb42uLNQQgb7nqNZ?usp=sharing)
40107
- [Spam Classifier](https://colab.research.google.com/drive/1u-8Fs3Etz07bFNx0CDV_le3Xz73VbK0z?usp=sharing)
41-
- Word and Sentence Embeddings
108+
- ### Word Embeddings
42109
- [BERT Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1Rg1vdSeq6sURc48RV8lpS47ja0bYwQmt?usp=sharing)
43-
- [BERT Sentence Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1FmREx0O4BDeogldyN74_7Lur5NeiOVye?usp=sharing)
44110
- [ALBERT Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/18yd9pDoPkde79boTbAC8Xd03ROKisPsn?usp=sharing)
45111
- [ELMO Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1TtNYB9z0yH8d1ZjfxkH0TVxQ2O_iOYVV?usp=sharing)
46112
- [XLNET Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1C9T29QA00yjLuJ1yEMTbjUQMpUv35pHb?usp=sharing)
47113
- [ELECTRA Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1FueGEaOj2JkbqHzdmxwKrNMHzgVt4baE?usp=sharing)
48114
- [COVIDBERT Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1Yzc-GuNQyeWewJh5USTN7PbbcJvd-D7s?usp=sharing)
49115
- [BIOBERT Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1llANd-XGD8vkGNMcqTi_8Dr_Ys6cr83W?usp=sharing)
50116
- [GLOVE Word Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1IQxf4pJ_EnrIDyd0fAX-dv6u0YQWae2g?usp=sharing)
117+
- ### Sentence Embeddings
118+
- [BERT Sentence Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1FmREx0O4BDeogldyN74_7Lur5NeiOVye?usp=sharing)
119+
- [ELECTRA Sentence Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1VXHH0ltHF_hXdiRqRlrV_lymAO4ws5PO?usp=sharing)
51120
- [USE Sentence Embeddings and T-SNE plotting](https://colab.research.google.com/drive/1gZzOMiCovmrp7z8FIidzDTLS0nt8kPJT?usp=sharing)
52121

53-
- Depenency Parsing
54-
-[Untyped Dependency Parsing](https://colab.research.google.com/drive/1PC8ga_NFlOcTNeDVJY4x8Pl5oe0jVmue?usp=sharing)
55-
-[Typed Dependency Parsing](https://colab.research.google.com/drive/1KXUqcF8e-LU9cXnHE8ni8z758LuFPvY7?usp=sharing)
122+
- ### Dependency Parsing
123+
- [Untyped Dependency Parsing](https://colab.research.google.com/drive/1PC8ga_NFlOcTNeDVJY4x8Pl5oe0jVmue?usp=sharing)
124+
- [Typed Dependency Parsing](https://colab.research.google.com/drive/1KXUqcF8e-LU9cXnHE8ni8z758LuFPvY7?usp=sharing)
56125

57-
- Text Pre Processing and Cleaning
126+
- ### Text Pre Processing and Cleaning
58127
- [Tokenization](https://colab.research.google.com/drive/13BC6k6gLj1w5RZ0SyHjKsT2EOwJwbYwb?usp=sharing)
59128
- [Stopwords removal](https://colab.research.google.com/drive/1nWob4u93t2EJYupcOIanuPBDfShtYjGT?usp=sharing)
60129
- [Stemming](https://colab.research.google.com/drive/1gKTJJmffR9wz13Ms3pDy64jhUI8ZHZYu?usp=sharing)
@@ -63,53 +132,16 @@ NLU release notes
63132
- [Spellchecking](https://colab.research.google.com/drive/1bnRR8FygiiN3zJz3mRdbjPBUvFsx6IVB?usp=sharing)
64133
- [Sentence Detecting](https://colab.research.google.com/drive/1CAXEdRk_q3U5qbMXsxoVyZRwvonKthhF?usp=sharing)
65134

66-
- Chunkers
67-
-[N Gram](https://colab.research.google.com/drive/1pgqoRJ6yGWbTLWdLnRvwG5DLSU3rxuMq?usp=sharing)
68-
-[Entity Chunking](https://colab.research.google.com/drive/1svpqtC3cY6JnRGeJngIPl2raqxdowpyi?usp=sharing)
69-
- Matchers
70-
-[Date Matcher](https://colab.research.google.com/drive/1JrlfuV2jNGTdOXvaWIoHTSf6BscDMkN7?usp=sharing)
71-
72-
73-
### 2.5.6
74-
- Better Defaults for spell checking
75-
- Lots of bug fixes
76-
- Additional feature discovery via nlu.components()
77-
- Memory optimization
78-
- Refactoring
79-
- Docs and Examples updates
80-
81-
### 2.5.5
82-
- Confidence extraction bugfix
83-
84-
### 2.5.4
85-
- Fixed bug with bad conversion of datatypes
86-
87-
88-
### 2.5.3
89-
- metadata parameter for predict function, prettier outputs
90-
- Datatype consistency added for predictions
91-
92-
### 2.5.2
93-
- Modin dependency bugfix
94-
95-
### 2.5.1
96-
- Modin Support
97-
98-
### 2.5.0
99-
100-
- Support for Modin with Ray and Dask Backends
101-
- Consistent input and outputs for predict() . If you input Spark Dataframe , you get Spark Dataframe Back. If you input Modin dataframe, you get Modin back. Analogous for predictions on Numpy and Pandas objects
102-
103-
104-
105-
### 2.5.0.rc1
106-
107-
The birth of a new Machine Learning library
108-
NLU provides out of the box
135+
- ### Chunkers
136+
- [N Gram](https://colab.research.google.com/drive/1pgqoRJ6yGWbTLWdLnRvwG5DLSU3rxuMq?usp=sharing)
137+
- [Entity Chunking](https://colab.research.google.com/drive/1svpqtC3cY6JnRGeJngIPl2raqxdowpyi?usp=sharing)
138+
- ### Matchers
139+
- [Date Matcher](https://colab.research.google.com/drive/1JrlfuV2jNGTdOXvaWIoHTSf6BscDMkN7?usp=sharing)
109140

110-
- 200+ pretrained models and pipelines for most NLU tasks ( Sentiment, Language Detection, NER, POS, Spell Checking)
111-
- 60 languages
112-
- Latest and greatest embeddings in different flavors (Elmo, Bert, Albert, Xlnert, Glove, Use)
113-
- 13 Different types of NLU components
114141

142+
# Need help?
143+
- [Ping us on Slack](https://spark-nlp.slack.com/archives/C0196BQCDPY)
144+
- [Post an issue on Github](https://github.com/JohnSnowLabs/nlu/issues)
115145

146+
# Simple NLU Demos
147+
- [NLU different output levels Demo](https://colab.research.google.com/drive/1C4N3wpC17YzZf9fXHDNAJ5JvSmfbq7zT?usp=sharing)
