Labels: bug (Something isn't working)
Description
Have you searched existing issues? 🔎
- I have searched and found no existing issues
Describe the bug
I have two BERTopic models, each with a similarly configured HDBSCAN model, trained on two different subsets of data. However, when these models are merged with BERTopic.merge_models, the hdbscan_model of the resulting merged model defaults to BaseCluster. As a result, clustering is bypassed when calling .transform() on the merged model.
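For context, here is a minimal sketch of what "bypassing the clustering" appears to mean at transform time. It assumes that a model whose hdbscan_model is BaseCluster assigns each document to the topic whose embedding is most cosine-similar to the document embedding, with the outlier topic (-1) stored first. This is our reading of BERTopic's fallback path and is not confirmed in this report. The helper names and the toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_without_clustering(doc_embeddings, topic_embeddings, has_outlier_topic=True):
    """Hypothetical helper: assign each document to its most similar topic.

    Assumption: topic_embeddings is ordered by topic id, with the outlier
    topic (-1) in row 0 when has_outlier_topic is True. No clustering model
    is consulted; only embedding similarity is used.
    """
    offset = 1 if has_outlier_topic else 0
    predictions, similarities = [], []
    for doc in doc_embeddings:
        sims = [cosine(doc, topic) for topic in topic_embeddings]
        # Row index of the most similar topic, shifted back to a topic id.
        best = max(range(len(sims)), key=sims.__getitem__)
        predictions.append(best - offset)
        similarities.append(sims)
    return predictions, similarities


if __name__ == "__main__":
    # Toy topic embeddings for topics -1, 0 and 1 (made-up data).
    topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    docs = [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8]]
    preds, _ = predict_without_clustering(docs, topics)
    print(preds)  # [0, 1]
```

Under this assumption, predictions for the merged model depend only on the merged topic embeddings. Any outlier handling that HDBSCAN would otherwise provide is not applied.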

Reproduction
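```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer
from umap import UMAP

# data1/data2 are lists of documents and embeddings1/embeddings2 are their
# precomputed document embeddings (not included in this report).

# First model, trained on the first subset of the data
umap_model1 = UMAP(n_components=25, metric='cosine', random_state=42)
vectorizer_model1 = CountVectorizer(stop_words="english")
model1 = BERTopic(umap_model=umap_model1,
                  vectorizer_model=vectorizer_model1,
                  calculate_probabilities=True,
                  verbose=True)
model1.fit(data1, embeddings=embeddings1)

# Second model, same configuration, trained on the second subset
umap_model2 = UMAP(n_components=25, metric='cosine', random_state=42)
vectorizer_model2 = CountVectorizer(stop_words="english")
model2 = BERTopic(umap_model=umap_model2,
                  vectorizer_model=vectorizer_model2,
                  calculate_probabilities=True,
                  verbose=True)
model2.fit(data2, embeddings=embeddings2)

merged_model = BERTopic.merge_models([model1, model2],
                                     min_similarity=0.7)

# Expected an HDBSCAN model here; the merged model reports BaseCluster instead.
merged_model.hdbscan_model
```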
BERTopic Version
0.17.3