-
Hi, apologies for the delay! It might be possible if you use the c-TF-IDF matrices. Those will be similar (feature-wise) across the different embedding models, which lets you compare topic models trained with different embeddings directly. In practice, that means running similarity searches between topics using their c-TF-IDF representations.
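A minimal sketch of that idea, assuming both models were fit with the same `CountVectorizer` so their c-TF-IDF feature axes align. The matrices here are random stand-ins; in BERTopic they would come from each model's `c_tf_idf_` attribute (a sparse matrix, so call `.toarray()` first):

```python
import numpy as np

# Stand-ins for two c-TF-IDF matrices: rows = topics, columns = a shared
# vocabulary. In BERTopic: model_a.c_tf_idf_.toarray(), etc., provided
# both models used the same CountVectorizer on the same documents.
rng = np.random.default_rng(0)
ctfidf_a = rng.random((15, 200))  # e.g. 15 MiniLM topics
ctfidf_b = rng.random((10, 200))  # e.g. 10 PubMedBERT topics

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

sims = cosine_sim_matrix(ctfidf_a, ctfidf_b)  # shape (15, 10)
best_match = sims.argmax(axis=1)  # closest topic in model B for each topic in A
```

Topics whose best similarity is high in both directions can be treated as "agreed upon"; a topic whose best match scores low likely has no counterpart in the other model.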
-
Hi Maarten,
I'm trying to find the "common ground" among BERTopic models trained using different sentence embeddings on the same input data.
For example, the input data could be a set of clinical notes. I used "MiniLM", "PubMedBERT", and "BioBERT". After training, they produced 15 topics, 10 topics, and 12 topics, respectively.
Is there any way to find the "common ground" (or the degree of "agreement") among the topics produced by the different models? For example, is it possible to say that they all agree on topic 1, but topic 2 only appears in MiniLM and not the others? I'm not sure how to find the corresponding topic from one embedding model and map it to another—or whether that's even possible.
What is the best practice to approach this?
Best Regards,