The number of components (n_components) in your UMAP model is rather low. I would definitely increase it to at least 5. Reducing the embeddings to (almost) the bare minimum number of dimensions loses a lot of information.

With respect to HDBSCAN, your min_cluster_size is quite high, which tends to make HDBSCAN create rather abstract and broad clusters. I would advise lowering that value and, if needed, merging topics afterwards. Merging afterwards would also show you what ending up with only a few topics actually means.
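As a rough sketch of the two suggestions above, assuming the pipeline is a BERTopic model with custom UMAP and HDBSCAN sub-models: every value other than n_components=5 is an assumed starting point rather than something taken from your config, and build_topic_model is a hypothetical helper name.

```python
# Assumed starting values; tune them for your own corpus.
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 5,   # "at least 5", per the advice above
    "min_dist": 0.0,
    "metric": "cosine",
    "random_state": 42,  # fixed only to make runs repeatable
}

HDBSCAN_PARAMS = {
    "min_cluster_size": 15,  # assumed value; lower it if clusters stay too broad
    "metric": "euclidean",
    "cluster_selection_method": "eom",
    "prediction_data": True,
}


def build_topic_model(embedding_model=None):
    """Hypothetical helper: build a BERTopic model from the dicts above."""
    # Imported here so the parameter dicts can be inspected
    # without the heavy dependencies installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    return BERTopic(
        embedding_model=embedding_model,
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
    )
```

After fitting, if the smaller min_cluster_size produces more topics than you want, something like topic_model.reduce_topics(docs, nr_topics=20) or topic_model.merge_topics(docs, topics_to_merge) can combine them again. The value 20 is only an illustration.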

Lastly, the embedding model set in cfg.EMBEDDING_MODEL might also play a role, but that depends on which model it contains.

Answer selected by carobs9