Commit 6364a53

Scripts for ASReview paper
1 parent 1bb221f commit 6364a53

24 files changed

+43157
-2
lines changed

.gitignore

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@

# Created by https://www.toptal.com/developers/gitignore/api/python,jupyternotebooks
# Edit at https://www.toptal.com/developers/gitignore?templates=python,jupyternotebooks

### JupyterNotebooks ###
# gitignore template for Jupyter Notebooks
# website: http://jupyter.org/

.ipynb_checkpoints
*/.ipynb_checkpoints/*

# IPython
profile_default/
ipython_config.py

# Remove previous ipynb_checkpoints
# git rm -r .ipynb_checkpoints/

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
pytestdebug.log

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
doc/_build/

# PyBuilder
target/

# Jupyter Notebook

# IPython

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# End of https://www.toptal.com/developers/gitignore/api/python,jupyternotebooks

.zenodo.json

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
{
    "description": "",
    "title": "Scripts for 'ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews'",
    "creators": [
        {
            "name": "Ferdinands, Gerbrich",
            "affiliation": "Utrecht University",
            "orcid": "0000-0002-4998-3293"
        },
        {
            "name": "Schram, Raoul",
            "affiliation": "Utrecht University",
            "orcid": "0000-0001-6616-230X"
        },
        {
            "name": "Van de Schoot, Rens",
            "affiliation": "Department of Psychology, Utrecht University",
            "orcid": "0000-0001-7736-2091"
        },
        {
            "name": "De Bruin, Jonathan",
            "affiliation": "Utrecht University",
            "orcid": "0000-0002-4297-0502"
        }

    ],
    "keywords": [
        "systematic review",
        "prisma",
        "active learning",
        "statistics",
        "machine learning",
        "text data",
        "natural language processing",
        "human-in-the-loop"
    ],
    "related_identifiers": [
        {
            "relation": "isSupplementTo",
            "identifier": "http://arxiv.org/abs/2006.12166"
        }
    ],
    "license": "Apache-2.0",
    "upload_type": "software"
}
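
As a quick sanity check on the creator metadata above, the ORCID iDs can be verified against their built-in check character (ISO 7064 MOD 11-2, the scheme ORCID documents for its identifiers). The sketch below is illustrative and not part of the commit; the helper names `orcid_check_digit` and `is_valid_orcid` are hypothetical, and only the names and ORCID iDs are copied from the file.

```python
import json


def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, the digits, and the final check character of an ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# Names and ORCID iDs copied from the creators list in .zenodo.json.
metadata = json.loads("""
{
    "creators": [
        {"name": "Ferdinands, Gerbrich", "orcid": "0000-0002-4998-3293"},
        {"name": "Schram, Raoul", "orcid": "0000-0001-6616-230X"},
        {"name": "Van de Schoot, Rens", "orcid": "0000-0001-7736-2091"},
        {"name": "De Bruin, Jonathan", "orcid": "0000-0002-4297-0502"}
    ]
}
""")

for creator in metadata["creators"]:
    print(creator["name"], is_valid_orcid(creator["orcid"]))
```

A check like this only catches typos in the identifier itself; it does not confirm that an iD belongs to the named person.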

Hyperparameter_optimization/Data/ace.csv

Lines changed: 2236 additions & 0 deletions
Large diffs are not rendered by default.

Hyperparameter_optimization/Data/nudging.csv

Lines changed: 1866 additions & 0 deletions
Large diffs are not rendered by default.

Hyperparameter_optimization/Data/ptsd.csv

Lines changed: 5032 additions & 0 deletions
Large diffs are not rendered by default.

Hyperparameter_optimization/Data/software.csv

Lines changed: 8897 additions & 0 deletions
Large diffs are not rendered by default.

Hyperparameter_optimization/Data/virus.csv

Lines changed: 2305 additions & 0 deletions
Large diffs are not rendered by default.

Hyperparameter_optimization/Data/wilson.csv

Lines changed: 2334 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# Hyperparameter optimization
Optimize hyperparameters for 12 hours over the 6 datasets in the `data` directory, for a model with a Naive Bayes classifier (B), TF-IDF feature extraction (T), a certainty sampling (max) query strategy (C), and a double balance strategy (D):

`mpirun -n 2 asreview hyper-active -m nb -b double -e tfidf -q max -t 12:00:00 --mpi`

See the [asreview-hyperopt repository](https://github.com/asreview/asreview-hyperopt) for more information on the optimization of hyperparameters.

After 12 hours of optimization, create a configuration file with the optimized hyperparameters:

`asreview create-config output/active/nb_max_double_tfidf/nudging_software_wilson_ace_virus/trials.pkl -o config/BCTD.ini`
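
To make the mapping between the model components and the command-line flags explicit, the optimization command above can be assembled programmatically. This is a minimal sketch, not part of the commit: the function `build_hyper_active_command` is hypothetical, and it uses only the flags that appear in the README command (`-m`, `-b`, `-e`, `-q`, `-t`, `--mpi`).

```python
import shlex
from typing import List


def build_hyper_active_command(
    model: str,
    balance: str,
    feature_extraction: str,
    query: str,
    time_limit: str,
    n_procs: int,
) -> List[str]:
    """Assemble the argument list for the README's hyper-active call."""
    return [
        "mpirun", "-n", str(n_procs),
        "asreview", "hyper-active",
        "-m", model,               # classifier, e.g. nb (B)
        "-b", balance,             # balance strategy, e.g. double (D)
        "-e", feature_extraction,  # feature extraction, e.g. tfidf (T)
        "-q", query,               # query strategy, e.g. max (C)
        "-t", time_limit,          # time budget, written as in the README
        "--mpi",
    ]


command = build_hyper_active_command(
    model="nb",
    balance="double",
    feature_extraction="tfidf",
    query="max",
    time_limit="12:00:00",
    n_procs=2,
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where MPI, ASReview, and the asreview-hyperopt extension are installed.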
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
[global_settings]
n_instances = 20
n_prior_included = 1
n_prior_excluded = 1
model = nb
balance_strategy = double
query_strategy = max
feature_extraction = tfidf

[balance_param]
a = 0.7782141491367417
alpha = 1.39537927110139
b = 0.9147018292161516

[feature_param]
ngram_max = 2
split_ta = 0

[model_param]
alpha = 2.1292734234216257
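
The configuration above is a plain INI file, so it can be inspected with Python's standard `configparser` module. The sketch below is illustrative only: it embeds the file contents as a string rather than reading `config/BCTD.ini`, and it does not show how ASReview itself consumes the file. Note that `alpha` appears in two sections with different meanings, so it should always be read through its section.

```python
import configparser

# Contents of the generated configuration file, copied from the diff above.
CONFIG_TEXT = """
[global_settings]
n_instances = 20
n_prior_included = 1
n_prior_excluded = 1
model = nb
balance_strategy = double
query_strategy = max
feature_extraction = tfidf

[balance_param]
a = 0.7782141491367417
alpha = 1.39537927110139
b = 0.9147018292161516

[feature_param]
ngram_max = 2
split_ta = 0

[model_param]
alpha = 2.1292734234216257
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

settings = parser["global_settings"]
n_instances = settings.getint("n_instances")
model = settings["model"]

# Section-scoped lookups keep the two different alpha values apart.
balance_param = {key: float(value) for key, value in parser["balance_param"].items()}
model_alpha = parser["model_param"].getfloat("alpha")
ngram_max = parser["feature_param"].getint("ngram_max")

print(f"model={model}, n_instances={n_instances}, ngram_max={ngram_max}")
print(f"balance alpha={balance_param['alpha']}, model alpha={model_alpha}")
```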
