[question] Topic number selection using Cross Validation #332

Open
@j-aryan

Hi,

I'm quite new to topic modelling and I've been working on a project with a very large corpus. Fitting LDA with a Gibbs sampler is out of the question (at least for cross-validation, because of computational constraints), so WarpLDA is the only viable option.

I've been trying to select the number of topics (k) using various measures. I tried perplexity, but it just keeps decreasing as k increases, and I couldn't identify a clear cut-off or elbow. Then I tried coherence measures: I scaled them and plotted them against each other. Can anyone help me understand what exactly these measures are telling us? Is there a particular k that seems of interest?

[Image: Screen Shot 2021-12-10 at 10 49 24 pm]

Also, any help on how I should approach this would be fantastic. Below are the values I used for the other model parameters:
doc_topic_prior = 0.1
topic_word_prior = 0.01
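
To make the comparison described above concrete, here is a minimal sketch of how several measures might be put on a common scale and how the flattening of perplexity could be quantified. It is not the code used to produce the plot: the function names `min_max_scale` and `relative_change`, the measure name `measure_a`, and all numbers are hypothetical placeholders, and min-max scaling is only one possible choice of scaling.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def relative_change(values: List[float]) -> List[float]:
    """Relative change between consecutive values: (v[i] - v[i-1]) / |v[i-1]|.

    Assumes no value is zero (perplexity is strictly positive).
    """
    return [(b - a) / abs(a) for a, b in zip(values, values[1:])]


# Hypothetical placeholder numbers, NOT results from the corpus in this issue.
ks = [10, 20, 30, 40]
perplexity = [1000.0, 800.0, 760.0, 750.0]
coherence: Dict[str, List[float]] = {"measure_a": [-2.0, -1.0, -1.5, -3.0]}

# Each coherence measure is scaled separately so the curves share one axis.
scaled = {name: min_max_scale(vals) for name, vals in coherence.items()}

# Relative change in perplexity between consecutive k values; small values
# indicate that increasing k further improves perplexity very little.
drops = relative_change(perplexity)

for name, vals in scaled.items():
    best_k = ks[vals.index(max(vals))]
    print(f"{name}: scaled={vals}, highest at k={best_k}")
print("relative perplexity change:", [round(d, 4) for d in drops])
```

Since each measure is scaled on its own, the scaled curves only show where each measure peaks relative to its own range; they do not say that one measure is larger than another in absolute terms.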
