[question] Topic number selection using Cross Validation #332

Hi,

I'm quite new to topic modelling and I'm working on a project with a very large corpus. Running LDA with a Gibbs sampler is out of the question (at least for cross-validation, due to computational constraints), so WarpLDA is the only viable option.

I've been trying to select the number of topics (k) using various measures. I tried perplexity, but it just keeps decreasing as k increases, and I couldn't identify a clear cut-off or elbow. Then I tried coherence measures; I scaled them and plotted them against each other. Can anyone help me understand what exactly these measures are telling us? Is there any particular k that seems of interest?

<img width="844" alt="Screen Shot 2021-12-10 at 10 49 24 pm" src="https://user-images.githubusercontent.com/63903169/145571879-f9ad99db-7785-42b3-bc00-5051194d5ee8.png">
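For reference, here is a minimal sketch of the kind of scaling and elbow search described above, in plain Python. It is not text2vec code, and it rests on assumptions: min-max scaling is only one possible choice of scaling, the "farthest point from the chord" rule is a simple heuristic rather than a principled criterion, and the perplexity values are made up for illustration (they are not the values from my runs).

```python
import math


def min_max_scale(values):
    """Rescale a sequence to the [0, 1] range (all zeros if it is constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def elbow_index(xs, ys):
    """Index of the point farthest from the straight line joining the
    first and last points of the (scaled) curve. Assumes xs are distinct
    and sorted, so the chord has non-zero length."""
    x = min_max_scale(xs)
    y = min_max_scale(ys)
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    dists = [
        abs((x1 - x0) * (y0 - yi) - (x0 - xi) * (y1 - y0)) / chord
        for xi, yi in zip(x, y)
    ]
    return max(range(len(dists)), key=dists.__getitem__)


# Hypothetical values, for illustration only.
ks = [10, 20, 30, 40, 50, 60]
perplexity = [1000, 700, 550, 520, 505, 500]

best = elbow_index(ks, perplexity)
print("candidate k:", ks[best])
```

The same `min_max_scale` helper can be applied to each coherence measure before plotting, so that measures on different scales can be compared on one axis. A heuristic like this only suggests candidates; the topics at those values of k still need to be inspected by hand.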

Also, any advice on how I should approach this would be fantastic. Below are the values I used for the other model parameters:

- `doc_topic_prior = 0.1`
- `topic_word_prior = 0.01`
