Auto-correlation between samples (Binkley et al.) #201

Open
@jonaschn

Description

I recently found this paper by Binkley et al.

A short extract from this paper follows:

  • b – the number of burn-in iterations
  • n – the number of samples (random variates)
  • si – the sampling interval

If si is large enough, the observations are practically independent. However, too small a value risks unwanted correlation. To summarize the effect of b, n, and si: if any of these settings are too low, then the Gibbs sampler will produce inaccurate or inadequate information; if any of these settings are too high, then the only penalty is wasted computational effort.
Unfortunately, as described in Section 6, support for extracting interval-separated observations is limited in existing LDA tools. For example, Mallet provides this capability but appears to suffer from a local maxima problem

with a footnote linking to http://www.cs.loyola.edu/~binkley/topic_models/additional-images/mallet-fixation/

Does this local maxima problem still exist in Mallet?
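
To make the three settings from the extract concrete, here is a minimal Python sketch of how b, n, and si select observations from a chain, and how the sampling interval affects correlation between consecutive observations. It does not use Mallet or LDA at all: the chain is a toy AR(1) process standing in for a Gibbs sampler's state, and the names `ar1_chain`, `collect`, `lag1_autocorrelation`, and the parameter `rho` are my own assumptions for illustration, not anything from the paper or from Mallet.

```python
import random


def ar1_chain(length, rho=0.9, seed=0):
    """Toy correlated chain x[t] = rho * x[t-1] + noise.

    This is only a stand-in for successive states of a Gibbs sampler;
    it is not LDA and not Mallet.
    """
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(length):
        x = rho * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def collect(chain, b, n, si):
    """Discard b burn-in iterations, then keep n observations si iterations apart."""
    needed = b + (n - 1) * si + 1
    if len(chain) < needed:
        raise ValueError(f"chain too short: need {needed} iterations, have {len(chain)}")
    return [chain[b + k * si] for k in range(n)]


def lag1_autocorrelation(xs):
    """Sample correlation between each observation and the next one."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
    return cov / var


chain = ar1_chain(200_000)

# Same burn-in and sample count, different sampling intervals.
dense = collect(chain, b=1000, n=2000, si=1)
sparse = collect(chain, b=1000, n=2000, si=50)

print("si=1  lag-1 autocorrelation:", round(lag1_autocorrelation(dense), 3))
print("si=50 lag-1 autocorrelation:", round(lag1_autocorrelation(sparse), 3))
```

In this toy setting the observations taken with si=1 stay strongly correlated, while those taken with si=50 are close to uncorrelated, at the cost of running roughly 50 times as many iterations for the same n. A chain that is stuck in one mode would not be helped by a larger si, which is why the fixation behaviour mentioned in the footnote is a separate concern.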

Reference:
Binkley, D., Heinz, D., Lawrie, D., & Overfelt, J. (2014). Understanding LDA in source code analysis. 22nd International Conference on Program Comprehension, ICPC 2014 - Proceedings, 26–36. https://doi.org/10.1145/2597008.2597150
