Features
Added Flair support, allowing for more (custom) token and document embeddings
Option to use custom UMAP, HDBSCAN, and CountVectorizer
Added low_memory parameter to reduce memory during computation
Improved verbosity (shows progress bar)
Improved testing
Updated to the newest version of sentence-transformers, which speeds up encoding significantly
visualize_topics() now returns the figure
All parameters are exposed through a single function: get_params()
Option to disable saving the embedding_model, which should significantly reduce the size of a saved BERTopic model
Added an FAQ page
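The get_params() item follows the scikit-learn convention of reading parameter names from the constructor signature. The stdlib-only sketch below illustrates that convention; `TopicModelSketch` and `CustomUMAP` are hypothetical stand-ins (not BERTopic's source), and the parameter names merely mirror those mentioned in these notes. With `deep=True`, the parameters of a custom component, such as a UMAP model, are flattened into `component__param` keys, as scikit-learn does.

```python
import inspect
from typing import Any, Dict, List


class CustomUMAP:
    """Hypothetical stand-in for a user-supplied UMAP model."""

    def __init__(self, n_neighbors: int = 15, n_components: int = 5):
        self.n_neighbors = n_neighbors
        self.n_components = n_components

    def get_params(self) -> Dict[str, Any]:
        return {"n_neighbors": self.n_neighbors, "n_components": self.n_components}


class TopicModelSketch:
    """Hypothetical model; parameter names only mirror those in these notes."""

    def __init__(
        self,
        umap_model=None,
        hdbscan_model=None,
        vectorizer_model=None,
        low_memory: bool = False,
        calculate_probabilities: bool = False,
        verbose: bool = False,
    ):
        # Store every constructor argument under the same name so it can be read back.
        self.umap_model = umap_model
        self.hdbscan_model = hdbscan_model
        self.vectorizer_model = vectorizer_model
        self.low_memory = low_memory
        self.calculate_probabilities = calculate_probabilities
        self.verbose = verbose

    @classmethod
    def _get_param_names(cls) -> List[str]:
        # Read parameter names from the __init__ signature, skipping `self`.
        signature = inspect.signature(cls.__init__)
        return sorted(name for name in signature.parameters if name != "self")

    def get_params(self, deep: bool = False) -> Dict[str, Any]:
        params: Dict[str, Any] = {}
        for name in self._get_param_names():
            value = getattr(self, name)
            # With deep=True, flatten nested component parameters as "component__param".
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params().items():
                    params[f"{name}__{sub_name}"] = sub_value
            params[name] = value
        return params


model = TopicModelSketch(umap_model=CustomUMAP(n_neighbors=10), low_memory=True)
print(model.get_params()["low_memory"])  # True
print(model.get_params(deep=True)["umap_model__n_neighbors"])  # 10
```

Deriving the names from the signature means a new constructor argument shows up in get_params() automatically, without maintaining a separate list.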
Fixes
To simplify the API, the parameters stop_words and n_neighbors were removed. They can still be set by passing a custom CountVectorizer or UMAP model.
Set calculate_probabilities to False by default. Calculating probabilities with HDBSCAN significantly increases computation time and memory usage, so this calculation is now opt-in and runs only when manually turned on.
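Both fixes follow one design: settings that belong to a component live on that component, and expensive work runs only on request. The stdlib-only sketch below illustrates this under stated assumptions; `PipelineSketch`, `ReducerConfig`, `VectorizerConfig`, and `placeholder_score` are hypothetical stand-ins for BERTopic, UMAP, CountVectorizer, and the HDBSCAN probability step, not their real APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class ReducerConfig:
    """Hypothetical stand-in for a UMAP model: n_neighbors lives here now."""
    n_neighbors: int = 15


@dataclass
class VectorizerConfig:
    """Hypothetical stand-in for a CountVectorizer: stop_words lives here now."""
    stop_words: Sequence[str] = ()

    def tokenize(self, doc: str) -> List[str]:
        # Lowercase, split on whitespace, and drop any configured stop words.
        stop = set(self.stop_words)
        return [tok for tok in doc.lower().split() if tok not in stop]


def placeholder_score(tokens: List[str]) -> float:
    """Hypothetical cheap stand-in for the expensive probability computation."""
    return float(len(tokens))


class PipelineSketch:
    """Hypothetical pipeline; not BERTopic's implementation."""

    def __init__(
        self,
        umap_model: Optional[ReducerConfig] = None,
        vectorizer_model: Optional[VectorizerConfig] = None,
        calculate_probabilities: bool = False,
    ):
        # Use the caller's component if one is given; otherwise build a default.
        # umap_model is only stored here, to show where n_neighbors now lives.
        self.umap_model = umap_model if umap_model is not None else ReducerConfig()
        self.vectorizer_model = (
            vectorizer_model if vectorizer_model is not None else VectorizerConfig()
        )
        # Opt-in flag: the expensive step is skipped unless explicitly enabled.
        self.calculate_probabilities = calculate_probabilities

    def transform(
        self,
        docs: List[str],
        probability_fn: Callable[[List[str]], float] = placeholder_score,
    ) -> Tuple[List[List[str]], Optional[List[float]]]:
        tokens = [self.vectorizer_model.tokenize(doc) for doc in docs]
        if not self.calculate_probabilities:
            return tokens, None
        return tokens, [probability_fn(toks) for toks in tokens]


default = PipelineSketch()
print(default.transform(["The cat sat"]))  # ([['the', 'cat', 'sat']], None)

custom = PipelineSketch(
    umap_model=ReducerConfig(n_neighbors=10),
    vectorizer_model=VectorizerConfig(stop_words=("the",)),
    calculate_probabilities=True,
)
print(custom.transform(["The cat sat"]))  # ([['cat', 'sat']], [2.0])
```

With the defaults, no stop words are removed and the probability step is never called (`probs` is `None`); passing custom components and `calculate_probabilities=True` changes both behaviors without any extra top-level parameters.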