@jonaslandsgesell jonaslandsgesell commented Jan 3, 2024

As discussed in #1696, I have updated the docstring to reflect that topic_model.transform(docs)[0][i] is sometimes different from topic_model.transform(docs[i])[0][0].
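
The discrepancy described above can be checked with a small helper. This is only a minimal sketch: `find_batch_single_mismatches` is a hypothetical name and not part of BERTopic. It assumes nothing beyond what the two expressions above imply, namely that `transform` accepts either a single document or a list of documents and returns a tuple whose first element holds the predicted topics.

```python
from typing import List, Sequence


def find_batch_single_mismatches(topic_model, docs: Sequence[str]) -> List[int]:
    """Return every index i where the batch prediction for docs[i] differs
    from the prediction obtained by transforming docs[i] on its own.

    `topic_model` is any fitted object exposing `transform(documents)`;
    this helper is illustrative and not part of the library.
    """
    # Predict all documents in one call: transform(docs)[0][i]
    batch_topics = topic_model.transform(list(docs))[0]

    mismatches = []
    for i, doc in enumerate(docs):
        # Predict the same document in isolation: transform(docs[i])[0][0]
        single_topic = topic_model.transform(doc)[0][0]
        if batch_topics[i] != single_topic:
            mismatches.append(i)
    return mismatches
```

With a fitted model, `find_batch_single_mismatches(topic_model, docs)` would list the documents whose topic changes depending on how they are passed. An empty list means both call styles agreed for that input.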

@MaartenGr
Owner

Thanks for this PR! Could you rephrase the following a bit:

(especially when using the HDBSCAN algorithm)

This makes it seem that this behavior occurs across many different algorithms, when in reality it is HDBSCAN-specific.

@jonaslandsgesell
Author

jonaslandsgesell commented Feb 8, 2024

Sure! Do you have a suggestion for a specific wording?

I am currently lacking the imagination for other ways to express that HDBSCAN is responsible here, given that we could also have a pipeline without HDBSCAN (using another component that may or may not behave similarly).

@MaartenGr
Owner

You could do something like this: "A single document or a list of documents to predict the topic(s) for. NOTE: When using HDBSCAN, the prediction might differ depending on whether a single document or a list of documents is passed since it leverages the data points of other documents."

I think it's best to stay close to the original documentation and inner workings of HDBSCAN. Off the top of my head, I believe this and this resource are relevant.

Also, a small tip. ChatGPT works wonders for helping with these kinds of issues ;)
