Commit 7e094f2

Use scikit-learn for LDAModel (#607)
* Drop LDA.
* Delete 03_lda.py
* Use resources instead of test data.
* Bundle sklearn model in new class.
* More updates.
* Fix.
* Add test.
* Update 03_plot_lda.py
* Improve things.
* Link to CBMA documentation.
* Update 03_plot_lda.py
* Update api.rst
* More cleanup.
* Remove Annotator class. The Annotator and Annotation classes will be developed in #618.
* Update 03_plot_lda.py
* Remove undefined base class.
1 parent 49d68fa commit 7e094f2

9 files changed

Lines changed: 181 additions & 324 deletions

docs/api.rst

Lines changed: 0 additions & 1 deletion
@@ -218,7 +218,6 @@ For more information about fetching data from the internet, see :ref:`fetching t
  extract.fetch_neuroquery
  extract.fetch_neurosynth
  extract.download_nidm_pain
- extract.download_mallet
  extract.download_cognitive_atlas
  extract.download_abstracts
  extract.download_peaks2maps_model

examples/03_annotation/03_lda.py

Lines changed: 0 additions & 43 deletions
This file was deleted.
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
+# ex: set sts=4 ts=4 sw=4 et:
+"""
+
+.. _annotations_lda:
+
+==================
+LDA topic modeling
+==================
+
+This example trains a latent Dirichlet allocation model with scikit-learn
+using abstracts from Neurosynth.
+"""
+import os
+
+import pandas as pd
+
+from nimare import annotate
+from nimare.dataset import Dataset
+from nimare.utils import get_resource_path
+
+###############################################################################
+# Load dataset with abstracts
+# ---------------------------
+dset = Dataset(os.path.join(get_resource_path(), "neurosynth_laird_studies.json"))
+
+###############################################################################
+# Initialize LDA model
+# --------------------
+model = annotate.lda.LDAModel(n_topics=5, max_iter=1000, text_column="abstract")
+
+###############################################################################
+# Run model
+# ---------
+new_dset = model.fit(dset)
+
+###############################################################################
+# View results
+# ------------
+# This DataFrame is very large, so we will only show a slice of it.
+new_dset.annotations[new_dset.annotations.columns[:10]].head(10)
+
+###############################################################################
+# Given that this DataFrame is very wide (many terms), we will transpose it before presenting it.
+model.distributions_["p_topic_g_word_df"].T.head(10)
+
+###############################################################################
+n_top_terms = 10
+top_term_df = model.distributions_["p_topic_g_word_df"].T
+temp_df = top_term_df.copy()
+top_term_df = pd.DataFrame(columns=top_term_df.columns, index=range(n_top_terms))
+top_term_df.index.name = "Token"
+for col in top_term_df.columns:
+    top_tokens = temp_df.sort_values(by=col, ascending=False).index.tolist()[:n_top_terms]
+    top_term_df.loc[:, col] = top_tokens
+
+top_term_df
