-
Hi, apologies for the delay! It might be possible if you use the c-TF-IDF matrices. Those will be similar (feature-wise) across the different embedding models, which lets you compare topic models trained with different embeddings directly. In practice, that means running similarity searches between topics using their c-TF-IDF representations.
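A minimal sketch of that idea, assuming both models were fit with the same `CountVectorizer` so their c-TF-IDF feature axes align. The matrices here are random stand-ins; in BERTopic they would come from each model's `c_tf_idf_` attribute (a sparse matrix, so call `.toarray()` first):

```python
import numpy as np

# Stand-ins for two c-TF-IDF matrices: rows = topics, columns = a shared
# vocabulary. In BERTopic: model_a.c_tf_idf_.toarray(), etc., provided
# both models used the same CountVectorizer on the same documents.
rng = np.random.default_rng(0)
ctfidf_a = rng.random((15, 200))  # e.g. 15 MiniLM topics
ctfidf_b = rng.random((10, 200))  # e.g. 10 PubMedBERT topics

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

sims = cosine_sim_matrix(ctfidf_a, ctfidf_b)  # shape (15, 10)
best_match = sims.argmax(axis=1)  # closest topic in model B for each topic in A
```

Topics whose best similarity is high in both directions can be treated as "agreed upon"; a topic whose best match scores low likely has no counterpart in the other model.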
-
Hi Maarten,
I'm trying to find the "common ground" among BERTopic models trained using different sentence embeddings on the same input data.
For example, the input data could be a set of clinical notes. I used "MiniLM", "PubMedBERT", and "BioBERT". After training, they produced 15 topics, 10 topics, and 12 topics, respectively.
Is there any way to find the "common ground" (or the degree of "agreement") among the topics produced by the different models? For example, is it possible to say that they all agree on topic 1, but topic 2 only appears in MiniLM and not the others? I'm not sure how to find the corresponding topic from one embedding model and map it to another—or whether that's even possible.
What is the best practice to approach this?
Best Regards,