Skip to content
7 changes: 6 additions & 1 deletion docs/getting_started/guided/guided.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
Guided Topic Modeling or Seeded Topic Modeling is a collection of techniques that guides the topic modeling approach by setting several seed topics to which the model will converge to. These techniques allow the user to set a predefined number of topic representations that are sure to be in documents. For example, take an IT business that has a ticket system for the software their clients use. Those tickets may typically contain information about a specific bug regarding login issues that the IT business is aware of.
!!! Note
Difference between Zero-shot and Guided BERTopic:
Guided BERTopic is similar - yet not equivalent - to [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html).
Use Guided BERTopic to boost the importance of certain keywords. Use [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before clustering the remaining unclassified documents using the main algorithm of BERTopic.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we have to further specify that Guided BERTopic is not primarily used to boost the importance of certain keywords but to guide clusters more towards predefined seed topics. That would make it a bit more clear that we do not focus on the importance of certain keywords (that's more of an side-effect).


Guided Topic Modeling or Seeded Topic Modeling is a collection of techniques that guides the topic modeling approach by setting several seed topics to which the model will converge. These techniques allow the user to set a predefined number of topic representations that are sure to be in documents. For example, take an IT business that has a ticket system for the software their clients use. Those tickets may typically contain information about a specific bug regarding login issues that the IT business is aware of.

To model that bug, we can create a seed topic representation containing the words `bug`, `login`, `password`,
and `username`. By defining those words, a Guided Topic Modeling approach will try to converge at least one topic to those words.
Expand Down
4 changes: 4 additions & 0 deletions docs/getting_started/zeroshot/zeroshot.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
!!! Note
Difference between Zero-shot and Guided BERTopic:
Zeros-shot Topic Modeling is similar - yet not equivalent - to [Guided BERTopic](https://maartengr.github.io/BERTopic/getting_started/guided/guided.html). Use [Guided BERTopic](https://maartengr.github.io/BERTopic/getting_started/guided/guided.html) to boost certain keyword's importance. Use [Zeros-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) to try to categorize documents into predefined topics ("zero-shot topics") before the clustering the remaining, unclassified documents, using the default unsupervised BERTopic topic exploration algorithm.

Zero-shot Topic Modeling is a technique that allows you to find topics in large amounts of documents that were predefined. When faced with many documents, you often have an idea of which topics will definitely be in there. Whether that is a result of simply knowing your data or if a domain expert is involved in defining those topics.

This method allows you to not only find those specific topics but also create new topics for documents that would not fit with your predefined topics.
Expand Down
Loading