One question about the reproducibility #2428
Unanswered
powerhorse1986
asked this question in
Q&A
Replies: 1 comment
HDBSCAN is not deterministic, which is very frustrating. If determinism is what you need, try replacing it with k-means.
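For anyone who wants to try the k-means suggestion, here is a minimal sketch. It assumes that BERTopic accepts a scikit-learn clustering model through its `hdbscan_model` argument. The topic count and seed are hypothetical values chosen for illustration, and `build_kmeans_topic_model` is a hypothetical helper name.

```python
# Hypothetical values: pick the number of topics and the seed for your own data.
N_TOPICS = 4
KMEANS_PARAMS = dict(n_clusters=N_TOPICS, random_state=42, n_init=10)


def build_kmeans_topic_model():
    """Build a BERTopic model that clusters with k-means instead of HDBSCAN."""
    # Imported lazily so the parameters above can be inspected
    # without the heavy dependencies installed.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans

    # BERTopic takes the clustering model through the `hdbscan_model` argument.
    return BERTopic(hdbscan_model=KMeans(**KMEANS_PARAMS))
```

Note that k-means assigns every document to a cluster, so there is no outlier topic, and the UMAP step still varies between runs unless its `random_state` is fixed as well.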
Hi Maarten,
We are doing a new project using BERTopic, which is really an awesome tool!
But I noticed that the reproducibility of BERTopic might be a problem.
For this new project, we ran BERTopic multiple times on more than 3000 abstracts. In the first ten runs, BERTopic generated 4 topics, including one outlier topic. But on the 11th run, 24 topics were generated. All the parameters of UMAP and HDBSCAN were the same. I then adjusted the "min_cluster_size" parameter of HDBSCAN and got 4 topics again.
I have no idea why this happened. Would you mind giving some hints? Thank you :)
Best,
Li
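A common cause of run-to-run variation like this is that UMAP is stochastic unless its `random_state` is fixed. The sketch below shows one way to pin it. It assumes the `umap-learn`, `hdbscan`, and `bertopic` packages. The seed and parameter values are illustrative rather than taken from the question, and `build_topic_model` is a hypothetical helper name.

```python
# Hypothetical seed and parameter values; adjust them to your own setup.
SEED = 42

UMAP_PARAMS = dict(
    n_neighbors=15,
    n_components=5,
    min_dist=0.0,
    metric="cosine",
    random_state=SEED,  # fixing the seed makes the UMAP embedding repeatable
)
HDBSCAN_PARAMS = dict(
    min_cluster_size=10,
    metric="euclidean",
    cluster_selection_method="eom",
    prediction_data=True,
)


def build_topic_model():
    """Build a BERTopic model whose UMAP step uses a fixed seed."""
    # Imported lazily so the parameters above can be inspected
    # without the heavy dependencies installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    return BERTopic(
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
    )
```

With `abstracts` as a list of strings, `topics, probs = build_topic_model().fit_transform(abstracts)` should then give the same topics on repeated runs in the same environment. Fixing `random_state` makes UMAP run single-threaded, so fitting may be slower, and results can still differ across machines or library versions.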