Major Release v0.5

@MaartenGr MaartenGr released this 08 Feb 13:35
· 206 commits to master since this release
e84d7d1

Features

  • Add Flair to allow for more (custom) token/document embeddings
  • Option to use custom UMAP, HDBSCAN, and CountVectorizer
  • Added low_memory parameter to reduce memory usage during computation
  • Improved verbosity (shows progress bar)
  • Improved testing
  • Use the newest version of sentence-transformers, as it speeds up encoding significantly
  • Return the figure of visualize_topics()
  • Expose all parameters with a single function: get_params()
  • Option to disable saving the embedding_model, which should significantly reduce the size of a saved BERTopic model
  • Add FAQ page
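
The custom sub-model, low_memory, get_params(), and saving options listed above could be combined roughly as follows. This is a minimal sketch, not the project's reference usage: the keyword names umap_model, hdbscan_model, vectorizer_model, and save_embedding_model are assumed from the BERTopic documentation, the hyperparameter values are illustrative, and build_topic_model and fit_and_save are hypothetical helper names.

```python
# Illustrative settings only; these values are assumptions, not recommendations.
UMAP_SETTINGS = {"n_neighbors": 15, "n_components": 5, "min_dist": 0.0, "metric": "cosine"}
HDBSCAN_SETTINGS = {"min_cluster_size": 10, "metric": "euclidean", "prediction_data": True}
VECTORIZER_SETTINGS = {"ngram_range": (1, 2), "stop_words": "english"}


def build_topic_model(low_memory=True, verbose=True):
    """Assemble a BERTopic model from custom sub-models (hypothetical helper)."""
    # Heavy dependencies are imported lazily, so defining this helper costs nothing.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        umap_model=UMAP(**UMAP_SETTINGS),
        hdbscan_model=HDBSCAN(**HDBSCAN_SETTINGS),
        vectorizer_model=CountVectorizer(**VECTORIZER_SETTINGS),
        low_memory=low_memory,  # trade some speed for lower memory use
        verbose=verbose,        # show progress while fitting
    )


def fit_and_save(docs, path="my_topic_model"):
    """Fit on a list of strings, inspect the parameters, save without the embedding model."""
    topic_model = build_topic_model()
    topics, _ = topic_model.fit_transform(docs)
    print(topic_model.get_params())  # all parameters in one place
    topic_model.save(path, save_embedding_model=False)  # smaller file on disk
    return topic_model, topics
```

Calling fit_and_save(docs) with a list of documents would train the model and write it to disk; a model saved without its embedding model presumably needs one supplied again when it is loaded.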

Fixes

  • To simplify the API, the parameters stop_words and n_neighbors were removed. Both can still be set by passing a custom CountVectorizer (stop_words) or a custom UMAP (n_neighbors).
  • Set calculate_probabilities to False by default. Calculating probabilities with HDBSCAN significantly increases computation time and memory usage, so it now has to be turned on manually.
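
Code written against the older API might be migrated along these lines. This is a sketch under stated assumptions: split_legacy_kwargs and build_from_legacy are hypothetical helpers, not part of BERTopic, and the n_components and metric values given to UMAP are illustrative choices, since a custom UMAP is assumed to replace the built-in one entirely.

```python
def split_legacy_kwargs(legacy):
    """Group old-style keyword arguments by the component that now owns them."""
    umap_kwargs, vectorizer_kwargs, bertopic_kwargs = {}, {}, {}
    for name, value in legacy.items():
        if name == "n_neighbors":
            umap_kwargs[name] = value        # now configured on a custom UMAP
        elif name == "stop_words":
            vectorizer_kwargs[name] = value  # now configured on a custom CountVectorizer
        else:
            bertopic_kwargs[name] = value    # still passed to BERTopic directly
    return umap_kwargs, vectorizer_kwargs, bertopic_kwargs


def build_from_legacy(legacy):
    """Build a BERTopic model from old-style arguments (hypothetical helper)."""
    # Imported lazily so the pure-Python helper above works on its own.
    from bertopic import BERTopic
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    umap_kwargs, vectorizer_kwargs, bertopic_kwargs = split_legacy_kwargs(legacy)
    return BERTopic(
        # n_components and metric are assumed values for this sketch
        umap_model=UMAP(n_components=5, metric="cosine", **umap_kwargs),
        vectorizer_model=CountVectorizer(**vectorizer_kwargs),
        **bertopic_kwargs,
    )
```

For example, build_from_legacy({"n_neighbors": 10, "stop_words": "english", "calculate_probabilities": True}) would route the first two settings to the custom sub-models and explicitly turn probability calculation back on.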