PyTorch implementation of the Reinforcement Learning for Distant Supervision RE model described in our ACL 2018 paper Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning. In this work, we use reinforcement learning to detect and remove noisy instances for each relation type; moreover, this process is independent of the training of the relation extraction system.
- Python 2.7.12
- PyTorch 0.4.1
- pandas 0.19.1
- The dataset and pretrained word embeddings are from OpenNRE. Please download them (Baidu Yun or Google Drive) and put them into this directory.
- We include two versions of the training dataset, which differ in size: 522611 sentences and 570088 sentences, respectively. Both options are included in args.py. Compared with the 570088 version, the 522611 version removes entity pairs that also appear in the test dataset. 522611 is the default option in args.py.
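The difference between the two dataset versions can be illustrated with a small filter. This is only a sketch of one way such a filter could look, not the script used to build the released data: the tuple layout `(head_entity, tail_entity, relation, sentence)` and the function name `remove_test_overlap` are hypothetical.

```python
# Hypothetical record layout: (head_entity, tail_entity, relation, sentence).
def remove_test_overlap(train, test):
    """Drop training instances whose (head, tail) entity pair occurs in the test set."""
    test_pairs = {(head, tail) for head, tail, _, _ in test}
    return [inst for inst in train if (inst[0], inst[1]) not in test_pairs]


# Toy data, made up for illustration only.
train = [
    ("Barack Obama", "Hawaii", "place_of_birth", "Barack Obama was born in Hawaii."),
    ("Steve Jobs", "Apple", "founder", "Steve Jobs co-founded Apple."),
]
test = [
    ("Steve Jobs", "Apple", "founder", "Apple was started by Steve Jobs."),
]

filtered = remove_test_overlap(train, test)
print(len(filtered))
```

Only the entity pair is compared, so a training sentence is dropped even when its wording differs from the test sentence.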
- python train.py
- The cleaned dataset is written to the directory
./cleaned_data.
- In order to validate the performance, we run thunlp/NRE on the cleaned dataset. For convenience, we have put their code into the directory
./NRE-master.
- Taking the CNN-ONE model as an example, run the code by
- make
- ./train
- The Precision-Recall file is written to
./NRE-master/CNN-ONE/out. Good Precision-Recall curves can be obtained from pr11.txt to pr14.txt.
- plot_PR_curve.ipynb
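For readers who prefer a script over the notebook, a Precision-Recall file could be plotted roughly as follows. This is a minimal sketch, not the contents of plot_PR_curve.ipynb. It assumes that each non-empty line of a PR file starts with a precision value followed by a recall value, separated by whitespace; check this against your own output files. The helper names `load_pr_points` and `plot_pr_curve` are hypothetical.

```python
def load_pr_points(lines):
    """Parse (precision, recall) pairs from an iterable of text lines.

    Assumption: each non-empty line starts with precision, then recall,
    separated by whitespace; any further columns are ignored.
    """
    precision, recall = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        precision.append(float(fields[0]))
        recall.append(float(fields[1]))
    return precision, recall


def plot_pr_curve(path, label=None):
    """Plot one PR file; matplotlib is imported lazily and must be installed."""
    import matplotlib.pyplot as plt

    with open(path) as f:
        precision, recall = load_pr_points(f)
    plt.plot(recall, precision, label=label)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    if label:
        plt.legend()
    plt.show()


# Parsing demo on made-up lines (no file or matplotlib needed).
sample = ["1.0 0.1", "0.8 0.2 0.93", "", "0.6 0.3"]
precision, recall = load_pr_points(sample)
```

If the files follow the assumed layout, a call such as `plot_pr_curve("./NRE-master/CNN-ONE/out/pr11.txt", label="pr11")` would draw a single curve.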
@article{qin2018robust,
title={Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning},
author={Qin, Pengda and Xu, Weiran and Wang, William Yang},
journal={arXiv preprint arXiv:1805.09927},
year={2018}
}