Study stability / coherence of topics versus training set size #17

@jeroarenas

Description

We have a few huge corpora, on the order of tens of millions of documents. Training is costly. The question here is:

Do we really need to train with the whole corpus? Are the topics significantly better than those obtained by training with, say, at most 2M documents? This should be studied: if no improvement is gained by training with very large corpora, we could sample the training set and then carry out inference on the whole collection when computing the indicators.
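A minimal sketch of how such a study could be set up (purely illustrative; it assumes gensim as the topic modeling backend and a tokenized document list `docs`, neither of which is confirmed here; the helper `coherence_vs_sample_size` and the sample sizes are hypothetical): train on random subsamples of increasing size, score each model's coherence on the full corpus so the numbers are comparable, and run inference over all documents only with the chosen model.

```python
# Hypothetical sketch, not the project's actual pipeline.
# Assumes `docs` is a list of tokenized documents (list of lists of tokens).
import random

from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel


def coherence_vs_sample_size(docs, sample_sizes, num_topics=50, seed=0):
    """Train LDA on random subsamples and report c_v coherence per size."""
    rng = random.Random(seed)
    dictionary = Dictionary(docs)
    results = {}
    for n in sample_sizes:
        sample = rng.sample(docs, min(n, len(docs)))
        sample_bow = [dictionary.doc2bow(d) for d in sample]
        lda = LdaModel(corpus=sample_bow, id2word=dictionary,
                       num_topics=num_topics, random_state=seed)
        # Coherence is always measured against the full corpus,
        # so scores for different training sizes are comparable.
        cm = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                            coherence='c_v')
        results[n] = cm.get_coherence()
    return results


# Example: curves for 0.5M, 1M, 2M and the full corpus would show whether
# coherence saturates well below the full training size.
# scores = coherence_vs_sample_size(docs, [500_000, 1_000_000, 2_000_000, len(docs)])

# Inference on the whole collection (for the indicators) is cheap relative
# to training and can be done once with the selected model:
# doc_topics = [lda.get_document_topics(dictionary.doc2bow(d)) for d in docs]
```

If coherence (or whatever stability metric we choose) flattens out before the full corpus size, sampling the training set and inferring on everything else would be justified.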
