Skip to content

Major Release v0.7

Compare
Choose a tag to compare
@MaartenGr MaartenGr released this 26 Apr 11:26
· 203 commits to master since this release
4369b4e

The two main features are (semi-)supervised topic modeling
and several backends to use instead of Flair and SentenceTransformers!

Highlights:

  • (semi-)supervised topic modeling by leveraging supervised options in UMAP
    • model.fit(docs, y=target_classes)
  • Backends:
    • Added Spacy, Gensim, USE (TFHub)
    • Use a different backend for document embeddings and word embeddings
    • Create your own backends with bertopic.backend.BaseEmbedder
    • Click here for an overview of all new backends
  • Calculate and visualize topics per class
    • Calculate: topics_per_class = topic_model.topics_per_class(docs, topics, classes)
    • Visualize: topic_model.visualize_topics_per_class(topics_per_class)
  • Several tutorials were updated and added:
Name Link
Topic Modeling with BERTopic Open In Colab
(Custom) Embedding Models in BERTopic Open In Colab
Advanced Customization in BERTopic Open In Colab
(semi-)Supervised Topic Modeling with BERTopic Open In Colab
Dynamic Topic Modeling with Trump's Tweets Open In Colab

Fixes:

  • Fixed issues with Torch req
  • Prevent saving term frequency matrix in CTFIDF class
  • Fixed DTM not working when reducing topics (#96)
  • Moved visualization dependencies to base BERTopic
    • pip install bertopic[visualization] becomes pip install bertopic
  • Allow precomputed embeddings in bertopic.find_topics() (#79):
model = BERTopic(embedding_model=my_embedding_model)
model.fit(docs, my_precomputed_embeddings)
model.find_topics(search_term)