-
Found out there is
results:
But we still have a -1 cluster. So I decided to reproduce the problem using pure HDBSCAN and got perfect results (see below). So my questions are still relevant:
Result:
Also I noticed that you can set
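For reference, a pure-HDBSCAN check of this kind could look roughly like the sketch below; the embedding model, toy documents, and parameter values are assumptions, not the code from this comment:

```python
import numpy as np
import hdbscan
from sentence_transformers import SentenceTransformer

# Toy data: many copies of one phrase plus a second, distinct phrase.
docs = ["please reset my password"] * 50 + ["cancel my subscription"] * 50

# Embed the documents (the model choice here is an assumption).
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

# Cluster the raw embeddings directly with HDBSCAN, bypassing BERTopic.
clusterer = hdbscan.HDBSCAN(min_cluster_size=10, metric="euclidean")
labels = clusterer.fit_predict(embeddings)

# Count how many documents land in each cluster, including outliers (-1).
print(dict(zip(*np.unique(labels, return_counts=True))))
```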
-
Thank you for the link to the dummy dim_model. I tried it and got this error. I then switched to PCA and found that it works well with n_components less than 60 but returns the same error for n_components >= 61 (this threshold depends on the number of examples in the dataset). I'll go forward with PCA, thank you.
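Since BERTopic accepts any estimator with fit/transform as its umap_model, swapping in PCA looks roughly like this sketch (the corpus and the n_components value are illustrative):

```python
from sklearn.decomposition import PCA
from bertopic import BERTopic

docs = [f"example document number {i}" for i in range(500)]  # placeholder corpus

# Use PCA instead of UMAP for dimensionality reduction;
# n_components=50 stays below the ~60 threshold mentioned above.
topic_model = BERTopic(umap_model=PCA(n_components=50))
topics, probs = topic_model.fit_transform(docs)
```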
-
I have two problems, and I'm not sure whether they are bugs or my own lack of skill.
I searched for similar discussions, found some, and learned a lot, but not enough to solve the problem.
There were as many as a dozen topics for the same string.
The code:
Params check, as shown by the topic_model.get_params() output:
Several consecutive results are below. Please notice how many rows are marked as -1. Also notice that the last run has very good results, so it is not a hyperparameter issue:
I have also tried to train BERTopic on a dataset with different phrases and then apply it to a dataset of the same single phrase. Same result.
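That fit-then-apply experiment might look roughly like the following sketch (the document contents are placeholders, not the actual dataset):

```python
from bertopic import BERTopic

# Fit on a corpus of varied phrases...
varied_docs = [f"support ticket about issue number {i}" for i in range(500)]
topic_model = BERTopic()
topic_model.fit(varied_docs)

# ...then apply the fitted model to many copies of a single identical phrase.
same_docs = ["please reset my password"] * 100
topics, probs = topic_model.transform(same_docs)
```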
My questions:
a) Why does the clustering algorithm work so poorly in the case of identical phrases? Is it a clustering algorithm issue, or maybe other parts of BERTopic?
b) How can I choose hyperparameters if I can't reproduce results?
P.S.
a)
bertopic.__version__ # '0.16.4'
b) distance_function = lambda x: 1 - np.clip(cosine_similarity(x), -1, 1) was added to fix a known issue for some particular values of n_components. It does not change the experiment results.
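For context, a distance function like this is typically paired with a precomputed metric in HDBSCAN; a minimal sketch, assuming that is the wiring intended here (the code around the lambda is not shown above):

```python
import numpy as np
import hdbscan
from sklearn.metrics.pairwise import cosine_similarity

# Cosine distance matrix; np.clip guards against floating-point values
# that fall slightly outside [-1, 1].
distance_function = lambda x: 1 - np.clip(cosine_similarity(x), -1, 1)

# Assumed usage: precompute the distances and pass them to HDBSCAN as-is.
embeddings = np.random.default_rng(42).normal(size=(200, 60))
distances = distance_function(embeddings).astype(np.float64)
clusterer = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=10)
labels = clusterer.fit_predict(distances)
```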