
Commit bda664a

Author: Daniel Dale (committed)
Commit message: added current predictions explorer among other improvements for 0.1.1 release
1 parent fa6dada commit bda664a

36 files changed (+252,685 / -166 lines changed)

README.md

Lines changed: 26 additions & 33 deletions
@@ -80,10 +80,11 @@ The best way to start understanding/exploring the current model is to use the ex
 <details><summary markdown="span"><strong>[Current Predictions Explorer](current_explorer.html)</strong>
 </summary>
 
-Explore the current (unlabeled) predictions generated by the latest model incarnation. All statements yet to be labeled by current fact-checking sources (currently, only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database)) are available.
-Live predictions continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 10 minutes.
-This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.
+Explore current predictions of the latest model. All statements that have yet to be labeled by the currently used fact-checking sources (only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database) at present) are available.
+
+Live predictions are continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 15 minutes.
 
+This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.
 
 <img src="docs/assets/current_explorer.gif" alt="current predictions explorer" />
 </details>
@@ -116,7 +117,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
 </summary>
 
 - Fine-tune a base model (currently HuggingFace's [ALBERT implementation](https://huggingface.co/transformers/model_doc/albert.html) with some minor customizations) in tandem with a simple embedding reflecting the semantic shift associated with the medium via which the statement was conveyed (i.e., for the POC, just learn the tweet vs non-tweet transformation) (using [Pytorch](https://pytorch.org/))
-- Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/rGNQpYnYSOaHb2A84xRAzw).
+- Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/Ys0KLo5nRnq0soINjyEv4A).
 - N.B. neuro-symbolic methods<sup id="a6">[6](#f6)</sup> that leverage knowledge bases and integrate symbolic reasoning with connectionist methods are not used in this model. Use of these approaches may be explored in [future research](#further-research) using this framework.
 </details>
 <details><summary markdown="span"><strong>Analysis & Reporting</strong>
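The fine-tuning bullet above describes combining ALBERT with a small medium-type (tweet vs. non-tweet) embedding. As a hedged illustration only (not the repository's actual model code; the class name `ClassiflieSketch` and the `ctxt_dim` parameter are hypothetical), one way such a combination might look in PyTorch:

```python
# Illustrative sketch only: fine-tuning ALBERT alongside a small embedding that
# encodes the statement's medium (tweet vs. non-tweet). Names are hypothetical.
import torch
import torch.nn as nn
from transformers import AlbertModel, AlbertTokenizer

class ClassiflieSketch(nn.Module):
    def __init__(self, ctxt_types: int = 2, ctxt_dim: int = 32):
        super().__init__()
        self.albert = AlbertModel.from_pretrained("albert-base-v2")
        self.ctxt_embed = nn.Embedding(ctxt_types, ctxt_dim)  # 0 = non-tweet, 1 = tweet
        self.classifier = nn.Linear(self.albert.config.hidden_size + ctxt_dim, 1)

    def forward(self, input_ids, attention_mask, ctxt_type):
        pooled = self.albert(input_ids=input_ids, attention_mask=attention_mask)[1]  # pooled output
        ctxt = self.ctxt_embed(ctxt_type)                       # (batch, ctxt_dim)
        logits = self.classifier(torch.cat([pooled, ctxt], dim=-1))
        return logits.squeeze(-1)                               # feed to BCEWithLogitsLoss

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
batch = tokenizer(["example statement"], return_tensors="pt", padding=True)
model = ClassiflieSketch()
scores = model(batch["input_ids"], batch["attention_mask"], torch.tensor([1]))
```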
@@ -140,7 +141,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
 <details><summary markdown="span"><strong>Global</strong>
 </summary>
 
-Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~12K statements made between 2020-04-03 and 2020-07-08:<br/>
+Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~13K statements made between 2020-04-03 and 2020-07-08:<br/>
 <img src="docs/assets/global_metrics_summ.png" alt="Global Stat Summary" />
 
 </details>
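As a purely illustrative aside on how global test-set metrics of this kind (e.g. accuracy and the mcc monitor metric mentioned later in this README) are commonly computed, assuming scikit-learn and hypothetical labels/outputs:

```python
# Illustrative only: computing global test-set metrics (accuracy, MCC) from
# labels and thresholded sigmoid outputs with scikit-learn.
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                              # hypothetical labels
y_prob = [0.91, 0.12, 0.65, 0.40, 0.08, 0.55, 0.77, 0.30]      # hypothetical sigmoid outputs
y_pred = [int(p >= 0.5) for p in y_prob]

print(f"accuracy: {accuracy_score(y_true, y_pred):.3f}")       # 0.750
print(f"mcc:      {matthews_corrcoef(y_true, y_pred):.3f}")    # 0.500
```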
@@ -180,7 +181,7 @@ To minimize false positives and maximize the model's utility, the following appr
 - Generate and configure thawing schedules for models.
 - EarlyStopping easily configurable with multiple non-standard monitor metrics (e.g. mcc)
 - Both automatic and manually-specified [stochastic weight averaging](https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/) of model checkpoints<sup id="af">[f](#cf)</sup>
-- mixed-precision training via [apex](https://github.com/NVIDIA/apex)<sup id="ag">[g](#cg)</sup>
+- Mixed-precision training<sup id="ag">[g](#cg)</sup>
 </details>
 <details><summary markdown="span"><strong>Analysis & reporting</strong>
 </summary>
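The mixed-precision bullet above now refers to PyTorch's integrated AMP support rather than NVIDIA apex. A minimal sketch of the native `torch.cuda.amp` pattern (autocast plus gradient scaling), shown purely for illustration and not taken from this repository:

```python
# Minimal sketch of PyTorch-native mixed-precision training (torch.cuda.amp),
# the integrated alternative to NVIDIA apex. Illustrative only.
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, batch, labels, optimizer, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # ops run in fp16/fp32 as appropriate
        logits = model(**batch)
        loss = loss_fn(logits, labels)
    scaler.scale(loss).backward()        # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)               # unscales gradients, then calls optimizer.step()
    scaler.update()                      # adjusts the loss-scale factor for the next step
    return loss.item()
```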
@@ -274,15 +275,9 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
 cd transformers
 pip install .
 ```
-4. (temporarily required) Testing of this alpha release occurred before native AMP was integrated into Pytorch with release 1.6. As such, native apex installation is temporarily (as of 2020.08.18) required to replicate the model. Switching from the native AMP api to the pytorch integrated one is planned as part of issue #999 which should obviate the need to install native apex.
-```shell
-git clone https://github.com/NVIDIA/apex
-cd apex
-pip uninstall apex
-pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
-```
-5. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
-6. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.
+4. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
+
+5. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.
 
 ```mysql
 collation-server = utf8mb4_unicode_ci
@@ -291,7 +286,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
 sql_mode = 'STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,ANSI_QUOTES'
 transaction-isolation = READ-COMMITTED
 ```
-7. copy/update relevant Deep Classiflie config file to $HOME dir
+6. copy/update relevant Deep Classiflie config file to $HOME dir
 ```shell
 cp ./deep_classiflie_db/db_setup/.dc_config.example ~
 mv .dc_config.example .dc_config
@@ -317,33 +312,31 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
 export DCDB_NAME="deep_classiflie"
 ```
 
-8. execute Deep Classiflie DB backend initialization script:
-
-<img src="docs/assets/dc_schema_build.gif" alt="Deep Classiflie logo" />
-
-Ensure you have access to a DB user with administrator privs. "admin" in the case above.
-
+7. execute Deep Classiflie DB backend initialization script:
 ```shell
 cd deep_classiflie_db/db_setup
 ./deep_classiflie_db_setup.sh deep_classiflie
 ```
+Ensure you have access to a DB user with administrator privs. "admin" in the case above.
+<img src="docs/assets/dc_schema_build.gif" alt="Deep Classiflie logo" />
 
-9. login to the backend db and seed historical tweets (necessary as only most recent 3200 can currently be retrieved directly from twitter)
+8. login to the backend db and seed historical tweets (necessary as only most recent 3200 can currently be retrieved directly from twitter)
 ```mysql
 mysql -u dcbot -p
 use deep_classiflie
-source dcbot_tweets_init_20200814.sql
+source dcbot_tweets_init_20200910.sql
+exit
 ```
 
-10. copy over relevant base model weights to specified model_cache_dir:
+9. copy over relevant base model weights to specified model_cache_dir:
 ```shell
 # model_cache_dir default found in configs/config_defaults.yaml
 # it defaults to $HOME/datasets/model_cache/deep_classiflie/
 cd {PATH_TO_DEEP_CLASSIFLIE_BASE}/deep_classiflie/assets/
 cp albert-base-v2-pytorch_model.bin albert-base-v2-spiece.model {MODEL_CACHE_DIR}/
 ```
 
-11. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
+10. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
 ```shell
 cd deep_classiflie
 ./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/dataprep_only.yaml"
@@ -369,33 +362,33 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
 2020-08-14 16:58:14,331:deep_classiflie:DEBUG: Metadata update complete, 1 record(s) affected.
 ...
 ```
-12. Recursively train the deep classiflie POC model:
+11. Recursively train the deep classiflie POC model:
 ```shell
 cd deep_classiflie
 ./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/train_albertbase.yaml"
 ```
 
-13. Generate an swa checkpoint (current release was built using swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
+12. Generate an swa checkpoint (current release was built using swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
 ```shell
 cd deep_classiflie
 ./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_swa_ckpt.yaml"
 ```
 
-14. Generate model analysis report(s) using the generated swa checkpoint:
+13. Generate model analysis report(s) using the generated swa checkpoint:
 ```shell
 # NOTE, swa checkpoint generated in previous step must be added to gen_report.yaml
 cd deep_classiflie
 ./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_report.yaml"
 ```
 
-15. Generate model analysis dashboards:
+14. Generate model analysis dashboards:
 ```shell
 # NOTE, swa checkpoint generated in previous step must be added to gen_dashboards.yaml
 cd deep_classiflie
 ./deep_classiflie.py --config "{PATH_TO_DEEP_CLASSIFLIE_BASE}/configs/gen_dashboards.yaml"
 ```
 
-16. configure jekyll static site generator to use bokeh dashboards locally:
+15. configure jekyll static site generator to use bokeh dashboards locally:
 
 ```shell
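Step 12 above generates an SWA checkpoint. Conceptually, this amounts to averaging the parameters of several saved checkpoints; the sketch below illustrates that idea only and is not the project's implementation (which uses the torchcontrib swa module, with a planned move to `torch.optim.swa_utils`). The helper name `average_checkpoints` and the assumption of plain state-dict checkpoints are hypothetical:

```python
# Illustrative sketch of what producing an SWA checkpoint amounts to:
# averaging the parameters of several saved checkpoints. Not the project's code.
import torch

def average_checkpoints(ckpt_paths, out_path):
    avg_state = None
    for path in ckpt_paths:
        state = torch.load(path, map_location="cpu")  # assumes each file is a plain state_dict
        if avg_state is None:
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    avg_state = {k: v / len(ckpt_paths) for k, v in avg_state.items()}
    # Note: models with BatchNorm would also need their running stats refreshed
    # (e.g. via torch.optim.swa_utils.update_bn); ALBERT uses LayerNorm.
    torch.save(avg_state, out_path)

# e.g. average_checkpoints(["ckpt_epoch3.pt", "ckpt_epoch4.pt"], "swa_ckpt.pt")
```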
@@ -447,8 +440,8 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
 <li><span class="fnum" id="cc">[c]</span> Deep Classiflie depends upon deep_classiflie_db (initially released as a separate repository) for much of its analytical and dataset generation functionality. Depending on how Deep Classiflie evolves (e.g. as it supports distributed data stores etc.), it may make more sense to integrate deep_classiflie_db back into deep_classiflie. <a href="#ac"></a></li>
 <li><span class="fnum" id="cd">[d]</span> It's notable that the model suffers a much higher FP ratio on tweets relative to non-tweets. Exploring tweet FPs, there are a number of plausible explanations for this discrepancy which could be explored in future research. <a href="#ad">↩</a></li>
 <li><span class="fnum" id="ce">[e]</span> Still in early development, there are significant outstanding issues (e.g. no tests yet!) and code quality shortcomings galore, but any constructive thoughts or contributions are welcome. I'm interested in using ML to curtail disinformation, not promulgate it, so I want to be clear -- this is essentially a fancy sentence similarity system with a lot of work put into building the dataset generation and model analysis data pipelines (I have a data engineering background, not a software engineering one).<a href="#ae"></a></li>
-<li><span class="fnum" id="cf">[f]</span> Current model release built/tested before swa graduated from torchcontrib to core pytorch. Next release of Deep Classiflie will use the integrated swa api.<a href="#af"></a></li>
-<li><span class="fnum" id="cg">[g]</span> Current model release built/tested before AMP was integrated into core pytorch. Next release of Deep Classiflie will use the integrated AMP api.<a href="#ag"></a></li>
+<li><span class="fnum" id="cf">[f]</span> Previous versions used the swa module from torchcontrib before it graduated to core pytorch.<a href="#af"></a></li>
+<li><span class="fnum" id="cg">[g]</span> Previous versions used NVIDIA's native <a href="https://github.com/NVIDIA/apex">apex</a> before AMP was integrated into pytorch<a href="#ag">↩</a></li>
 <li><span class="fnum" id="ch">[h]</span> N.B. This daemon may violate Twitter's <a href="https://help.twitter.com/en/rules-and-policies/twitter-automation">policy</a> w.r.t. tweeting sensitive content if the subject's statements contain such content (no content-based filtering is included in the daemon). @DeepClassflie initially tested the Deep Classiflie twitter daemon but will post only framework-related announcements moving forward.<a href="#ah">↩</a></li>
 </ul>

analysis/captum_cust_viz.py

Lines changed: 3 additions & 3 deletions
@@ -196,7 +196,7 @@ def fmt_notes_box(ext_rec: Tuple) -> str:
 <li>prediction of whether Washington Post's Fact Checker will add this claim to its "Trump False Claims" DB</li>
 <li>if claim was included in WP's Fact Checker false claims DB at time of original model training</li>
 <li>accuracy estimated by sorting & bucketing the test set sigmoid outputs, averaging performance in each bucket
-<li>global metrics relate to the current model's performance on a test set comprised of ~12K statements made between
+<li>global metrics relate to the current model's performance on a test set comprised of ~13K statements made between
 {ext_rec[5][13].strftime('%Y-%m-%d')} and {ext_rec[5][14].strftime('%Y-%m-%d')}. Training, validation and test sets
 are chronologically disjoint. </li>
 <li>subject to interpretability filter, some subword tokens have been omitted to facilitate interpretability</li>
@@ -259,10 +259,10 @@ def gen_pred_exp_attr_tup(datarecord: VisualizationDataRecord, ext_rec: Tuple, t
 
 
 def pred_exp_attr(datarecords: List[VisualizationDataRecord], ext_recs: List[Tuple] = None, token_mask: List = None,
-                  invert_colors: bool = False) -> Tuple[List, Tuple]:
+                  invert_colors: bool = False, **_) -> Tuple[List, Tuple]:
     global_metrics_summ = ext_recs[0][8]
     pred_exp_tups = []
     for i, (datarecord, ext_rec) in enumerate(zip(datarecords, ext_recs)):
         pred_exp_tup = gen_pred_exp_attr_tup(datarecord, ext_rec, token_mask, invert_colors)
         pred_exp_tups.append(pred_exp_tup)
-    return pred_exp_tups, global_metrics_summ
+    return pred_exp_tups, global_metrics_summ
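For context on the `**_` added to `pred_exp_attr`'s signature above: a trailing `**_` lets a function accept and silently discard extra keyword arguments passed by its callers. A tiny standalone illustration (hypothetical function name, not this repository's code):

```python
# Tiny illustration of the `**_` idiom: extra keyword arguments are accepted
# and silently discarded rather than raising TypeError.
def pred_exp_attr_sketch(datarecords, ext_recs=None, token_mask=None,
                         invert_colors=False, **_):
    return len(datarecords), invert_colors

# A caller may now pass additional, unused options:
print(pred_exp_attr_sketch([1, 2, 3], invert_colors=True, unused_option=42))  # (3, True)
```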

analysis/gen_pred_explorer.py

Lines changed: 0 additions & 1 deletion
@@ -38,7 +38,6 @@ def init_radio_groups() -> Tuple[RadioButtonGroup, ...]:
 
 
 def init_explorer_divs(pred_stmt_dict: Dict) -> Tuple[Div, ...]:
-    # stmt_attr, word_import_html, max_word_html
     default_idx = min([i for i, (b, c) in enumerate(zip(pred_stmt_dict['bucket_type'], pred_stmt_dict['tp']))
                        if b == 'max_acc_nontweets' and c == 1])
     word_import_div = Div(text=pred_stmt_dict['pred_exp_attr_tups'][default_idx][1], height_policy='max',

analysis/model_analysis_rpt.py

Lines changed: 4 additions & 4 deletions
@@ -74,15 +74,15 @@ def gen_pred_exp_ds(self) -> Tuple[Dict, Tuple]:
         pred_exp_tups = fetchallwrapper(self.cnxp.get_connection(), self.config.inference.sql.pred_exp_sql)
         pred_exp_set = []
         pred_exp_ds = OrderedDict({'bucket_type': [], 'bucket_acc': [], 'conf_percentile': [], 'pos_pred_acc': [],
-                                   'neg_pred_acc': [], 'pos_pred_ratio': [], 'neg_pred_ratio': [], 'statement_id': [],
-                                   'statement_text': [], 'tp': [], 'tn': [], 'fp': [], 'fn': []})
+                                   'neg_pred_acc': [], 'pos_pred_ratio': [], 'neg_pred_ratio': [], 'statement_id': [],
+                                   'statement_text': [], 'tp': [], 'tn': [], 'fp': [], 'fn': []})
         for (bucket_type, bucket_acc, conf_percentile, pos_pred_acc, neg_pred_acc, pos_pred_ratio, neg_pred_ratio,
              statement_id, statement_text, ctxt_type, tp, tn, fp, fn) in pred_exp_tups:
             label = 'False' if tp == 1 or fn == 1 else 'True'
             pred_exp_set.append((statement_text, ctxt_type, label))
             for k, v in zip(list(pred_exp_ds.keys()), [bucket_type, bucket_acc, conf_percentile, pos_pred_acc,
-                                                       neg_pred_acc, pos_pred_ratio, neg_pred_ratio, statement_id,
-                                                       statement_text, tp, tn, fp, fn]):
+                                                       neg_pred_acc, pos_pred_ratio, neg_pred_ratio, statement_id,
+                                                       statement_text, tp, tn, fp, fn]):
                 pred_exp_ds[k].append(v)
         pred_exp_attr_tups, global_metric_summ = Inference(self.config, pred_exp_set=pred_exp_set).init_predict()
         pred_exp_ds['pred_exp_attr_tups'] = pred_exp_attr_tups

assets/dc_ds.zip

10.3 KB
Binary file not shown.

assets/dc_model_alpha.zip

-43.3 MB
Binary file not shown.

configs/config_defaults_sql.yaml

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ inference:
 global_model_perf_cache_sql: "select * from global_model_accuracy_lookup_cache"
 pred_exp_sql: "select * from pred_explr_stmts"
 save_model_sql: "insert into model_metadata select * from latest_global_model_perf_summary"
-save_perf_sql: "insert into local_model_perf_summary_hist select * from latest_local_model_perf_summary"
+save_perf_sql: "insert ignore into local_model_perf_summary_hist select * from latest_local_model_perf_summary"
 ds_md_sql: >-
 select dsid, train_start_date, train_end_date from ds_metadata where ds_type='converged_filtered' order by dsid desc limit 1
 save_model_rpt_sql: >-

configs/dataprep_only.yaml

Lines changed: 2 additions & 2 deletions
@@ -5,10 +5,10 @@ experiment:
 dataprep_only: True
 debug:
 debug_enabled: True
-use_debug_dataset: True
+use_debug_dataset: False
 data_source:
 # db_conf must be explictly specified only in dev mode or if db_conf is in a non-default location
-db_conf: "/home/speediedan/repos/edification/deep_classiflie_db_feat/deep_classiflie_db.yaml"
+# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db_feat/deep_classiflie_db.yaml"
 model_filter_topk: 20
 filter_w_embed_cache: False
 # safest way to build a new dataset is to verify backup of the previous one and remove the relevant cache softlink

configs/gen_dashboards.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 experiment:
 db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
 # provide the generated swa checkpoint below
-inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
+inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
 debug:
 debug_enabled: False
 data_source:

configs/gen_report.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,13 +1,13 @@
 experiment:
 db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
 # provide the generated swa checkpoint below
-inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
+inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
 debug:
 debug_enabled: False
 data_source:
 skip_db_refresh: True
 # db_conf must be explictly specified only in dev mode or if db_conf is in a non-default location
-db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
+# db_conf: "/home/speediedan/repos/edification/deep_classiflie_db/deep_classiflie_db.yaml"
 inference:
 report_mode: True # set to true to enable report generation
 rebuild_perf_cache: True # set True to (re)build perf cache (report_mode must also be True)

0 commit comments
