- Explore the current (unlabeled) predictions generated by the latest model incarnation. All statements yet to be labeled by current fact-checking sources (currently, only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database)) are available.
- Live predictions continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 10 minutes.
- This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.
+ Explore current predictions of the latest model. All statements that have yet to be labeled by the currently used fact-checking sources (only [Washington Post Factchecker](https://www.washingtonpost.com/graphics/politics/trump-claims-database) at present) are available.
+ Live predictions are continuously added via [ipfs](https://ipfs.io). Twitter statements will be delayed by ~15 minutes to allow thread-based scoring. [Factba.se](https://factba.se) is polled for new statements every 15 minutes.
+ This explorer provides fact-checkers a means (one of many possible) of using current model predictions and may also help those building fact-checking systems evaluate the potential utility of integrating similar models into their systems.
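Interpreted concretely, the poll-and-delay scheme in the added lines might look like the following minimal Python sketch. This is hypothetical, not the actual Deep Classiflie daemon; `fetch_new_statements` and the statement fields are assumptions for illustration.

```python
import time
from datetime import datetime, timedelta

POLL_INTERVAL_SECS = 15 * 60         # Factba.se is polled every 15 minutes
TWEET_DELAY = timedelta(minutes=15)  # hold tweets so full threads can be scored together

def fetch_new_statements():
    """Placeholder for the real Factba.se polling call (assumption)."""
    return []  # each item: {"text": ..., "type": ..., "created_at": datetime}

def run_poller(score_and_publish):
    pending = []
    while True:
        pending.extend(fetch_new_statements())
        now = datetime.utcnow()
        # non-tweets are scored immediately; tweets wait out the delay window
        ready = [s for s in pending
                 if s["type"] != "tweet" or now - s["created_at"] >= TWEET_DELAY]
        for statement in ready:
            score_and_publish(statement)  # e.g. add the live prediction via ipfs
            pending.remove(statement)
        time.sleep(POLL_INTERVAL_SECS)
```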
@@ -116,7 +117,7 @@ The entire initial Deep Classiflie system (raw dataset, model, analytics modules
</summary>
- Fine-tune a base model (currently HuggingFace's [ALBERT implementation](https://huggingface.co/transformers/model_doc/albert.html) with some minor customizations) in tandem with a simple embedding reflecting the semantic shift associated with the medium via which the statement was conveyed (i.e., for the POC, just learn the tweet vs non-tweet transformation) (using [Pytorch](https://pytorch.org/)). A minimal sketch of this setup follows this list.
- - Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/rGNQpYnYSOaHb2A84xRAzw).
+ - Explore the latest model's training session on [tensorboard.dev](https://tensorboard.dev/experiment/Ys0KLo5nRnq0soINjyEv4A).
- N.B. neuro-symbolic methods<sup id="a6">[6](#f6)</sup> that leverage knowledge bases and integrate symbolic reasoning with connectionist methods are not used in this model. Use of these approaches may be explored in [future research](#further-research) using this framework.
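As a concrete illustration of the fine-tuning bullet above, here is a minimal, hypothetical sketch (not the actual Deep Classiflie code) of an ALBERT encoder combined with a learned medium embedding; the class and parameter names are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AlbertModel

class StatementClassifier(nn.Module):
    """ALBERT encoder plus a learned per-medium embedding (hypothetical names)."""
    def __init__(self, num_mediums: int = 2, base_model: str = "albert-base-v2"):
        super().__init__()
        self.encoder = AlbertModel.from_pretrained(base_model)
        hidden = self.encoder.config.hidden_size
        # one learned vector per statement medium (tweet vs. non-tweet for the POC)
        self.medium_embedding = nn.Embedding(num_mediums, hidden)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, medium_ids):
        # index 1 of the encoder output is the pooled [CLS] representation
        pooled = self.encoder(input_ids=input_ids, attention_mask=attention_mask)[1]
        # shift the pooled representation by the medium-specific embedding
        shifted = pooled + self.medium_embedding(medium_ids)
        return self.classifier(shifted)  # one logit per statement
```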
- Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~12K statements made between 2020-04-03 and 2020-07-08:<br/>
+ Global metrics<sup id="a9">[9](#f9)</sup> summarized in the table below relate to the current model's performance on a test set comprised of ~13K statements made between 2020-04-03 and 2020-07-08:<br/>

<img src="docs/assets/global_metrics_summ.png" alt="Global Stat Summary" />
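For readers who want to reproduce numbers of this kind, a minimal, hypothetical sketch follows, assuming numpy arrays of binary labels and predicted probabilities; the exact metric set in the summary image above may differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score

def global_metrics(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """Summary metrics over a held-out test set (illustrative metric set)."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
    }

# e.g., on a ~13K-statement test set: global_metrics(labels, model_probs)
```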
</details>
@@ -180,7 +181,7 @@ To minimize false positives and maximize the model's utility, the following appr
- Generate and configure thawing schedules for models.
- Both automatic and manually-specified [stochastic weight averaging](https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/) of model checkpoints<sup id="af">[f](#cf)</sup>
- Mixed-precision training via [apex](https://github.com/NVIDIA/apex)<sup id="ag">[g](#cg)</sup>
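The apex usage pattern referenced in the last bullet looks roughly like the following hypothetical sketch with a stand-in model, not Deep Classiflie's actual training loop; it assumes apex is installed and a CUDA GPU is available.

```python
import torch
from apex import amp

model = torch.nn.Linear(128, 2).cuda()  # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# patch the model/optimizer for mixed precision ("O1" = conservative mixed precision)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 128).cuda()
labels = torch.randint(0, 2, (8,)).cuda()
loss = torch.nn.functional.cross_entropy(model(inputs), labels)

# scale the loss to avoid fp16 gradient underflow, then step as usual
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```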
@@ -274,15 +275,9 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
cd transformers
pip install .
```
- 4. (temporarily required) Testing of this alpha release occurred before native AMP was integrated into Pytorch with release 1.6. As such, native apex installation is temporarily (as of 2020.08.18) required to replicate the model. Switching from the native AMP api to the pytorch-integrated one is planned as part of issue #999, which should obviate the need to install native apex.
- 5. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
- 6. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.
+ 4. [Install mariadb](https://mariadb.com/kb/en/getting-installing-and-upgrading-mariadb/) or mysql DB if necessary.
+ 5. These are the relevant DB configuration settings used for the current release of Deep Classiflie's backend. Divergence from this configuration has not been tested and may result in unexpected behavior.

```mysql
collation-server = utf8mb4_unicode_ci
@@ -291,7 +286,7 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
- 11. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
+ 10. Run deep_classiflie.py with the provided config necessary to download the raw data from the relevant data sources (factba.se, twitter, washington post), execute the data processing pipeline and generate the dataset collection.
- 13. Generate an swa checkpoint (current release was built using the swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
+ 12. Generate an swa checkpoint (current release was built using the swa torchcontrib module but will switch to the now-integrated pytorch swa api in the next release):
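Conceptually, generating an swa checkpoint in this step amounts to weight averaging across saved checkpoints, as the "swa_best_2_ckpts" checkpoint naming used elsewhere in this commit suggests. A hypothetical sketch follows (not the project's actual implementation; it assumes each file stores a flat parameter state_dict).

```python
import torch

def average_checkpoints(ckpt_paths, out_path):
    """Equal-weight average of parameter tensors across several checkpoints."""
    avg_state = None
    for path in ckpt_paths:
        state = torch.load(path, map_location="cpu")
        if avg_state is None:
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    avg_state = {k: v / len(ckpt_paths) for k, v in avg_state.items()}
    torch.save(avg_state, out_path)

# e.g., averaging the two best checkpoints of a training session:
# average_checkpoints(["best_1.pt", "best_2.pt"], "checkpoint-swa_best_2_ckpts.pt")
```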
- 16. Configure the jekyll static site generator to use bokeh dashboards locally:
+ 15. Configure the jekyll static site generator to use bokeh dashboards locally:
@@ -447,8 +440,8 @@ N.B. before you begin, the core external dependency is admin access to a mariadb
<li><span class="fnum" id="cc">[c]</span> Deep Classiflie depends upon deep_classiflie_db (initially released as a separate repository) for much of its analytical and dataset generation functionality. Depending on how Deep Classiflie evolves (e.g. as it supports distributed data stores etc.), it may make more sense to integrate deep_classiflie_db back into deep_classiflie. <a href="#ac">↩</a></li>
<li><span class="fnum" id="cd">[d]</span> It's notable that the model suffers a much higher FP ratio on tweets relative to non-tweets. Exploring tweet FPs, there are a number of plausible explanations for this discrepancy which could be explored in future research. <a href="#ad">↩</a></li>
<li><span class="fnum" id="ce">[e]</span> Still in early development, there are significant outstanding issues (e.g. no tests yet!) and code quality shortcomings galore, but any constructive thoughts or contributions are welcome. I'm interested in using ML to curtail disinformation, not promulgate it, so I want to be clear -- this is essentially a fancy sentence similarity system with a lot of work put into building the dataset generation and model analysis data pipelines (I have a data engineering background, not a software engineering one). <a href="#ae">↩</a></li>
- <li><span class="fnum" id="cf">[f]</span> Current model release built/tested before swa graduated from torchcontrib to core pytorch. Next release of Deep Classiflie will use the integrated swa api. <a href="#af">↩</a></li>
- <li><span class="fnum" id="cg">[g]</span> Current model release built/tested before AMP was integrated into core pytorch. Next release of Deep Classiflie will use the integrated AMP api. <a href="#ag">↩</a></li>
+ <li><span class="fnum" id="cf">[f]</span> Previous versions used the swa module from torchcontrib before it graduated to core pytorch. <a href="#af">↩</a></li>
+ <li><span class="fnum" id="cg">[g]</span> Previous versions used NVIDIA's native <a href="https://github.com/NVIDIA/apex">apex</a> before AMP was integrated into pytorch. <a href="#ag">↩</a></li>
<li><span class="fnum" id="ch">[h]</span> N.B. This daemon may violate Twitter's <a href="https://help.twitter.com/en/rules-and-policies/twitter-automation">policy</a> w.r.t. tweeting sensitive content if the subject's statements contain such content (no content-based filtering is included in the daemon). @DeepClassflie initially tested the Deep Classiflie twitter daemon but will post only framework-related announcements moving forward.<a href="#ah">↩</a></li>
configs/gen_dashboards.yaml (1 addition & 1 deletion):

@@ -1,7 +1,7 @@
experiment:
  db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
  # provide the generated swa checkpoint below
-  inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200816114940/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
+  inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
configs/gen_report.yaml (2 additions & 2 deletions):

@@ -1,13 +1,13 @@
experiment:
  db_functionality_enabled: True # must set to True to generate reports, run dctweetbot, among other functions
  # provide the generated swa checkpoint below
-  inference_ckpt: "/home/speediedan/experiments/deep_classiflie_feat/checkpoints/20200901084410/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
+  inference_ckpt: "/home/speediedan/experiments/deep_classiflie/checkpoints/20200911144157/checkpoint-0.0000-swa_best_2_ckpts--1-0.pt" # note build_swa_from_ckpts will be ignored if inference_ckpt is present
debug:
  debug_enabled: False
data_source:
  skip_db_refresh: True
  # db_conf must be explicitly specified only in dev mode or if db_conf is in a non-default location