This repo contains the code for the paper *On the Robustness of Reading Comprehension Models to Entity Renaming*, accepted to NAACL 2022.
```bash
conda create -n robustness python=3.7
conda activate robustness
conda install pytorch==1.7.1 -c pytorch
pip install transformers==4.10.2
pip install sentencepiece
pip install datasets==1.11.0
pip install spacy==3.1.2
python -m spacy download en_core_web_sm
```
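If you want to confirm the environment before moving on, a minimal check (it only verifies that the pinned packages import and that the spaCy model is available):

```python
import torch, transformers, datasets, spacy

print(torch.__version__)         # expect 1.7.1
print(transformers.__version__)  # expect 4.10.2
print(datasets.__version__)      # expect 1.11.0
spacy.load("en_core_web_sm")     # raises OSError if the model was not downloaded
```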
Go to `./prepare_data/`.

```bash
bash download.sh
python preprocess_mrqa.py
python fix_mrqa_squad.py
python holdout_mrqa.py
```

The MRC datasets will be prepared under `./data/`.
Note that the new train/dev/test sets will be named `train_holdout.jsonl`, `dev_holdout.jsonl`, and `dev.jsonl`, respectively.
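The prepared files follow the MRQA jsonl format: one JSON object per line, each holding a context and its questions. A quick way to inspect them (a sketch; the keys below, such as `context`, `qas`, and `answers`, follow the public MRQA 2019 schema):

```python
import json

# Print the first example of a prepared split (SQuAD used as an example).
with open("data/SQuAD/train_holdout.jsonl") as f:
    for line in f:
        example = json.loads(line)
        if "header" in example:  # MRQA files start with a metadata line
            continue
        print(example["context"][:200])  # passage text
        for qa in example["qas"]:
            print(qa["question"], "->", qa["answers"])
        break
```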
`<DATASET>`: chosen from [SQuAD, NaturalQuestions, HotpotQA, SearchQA, TriviaQA]
Go to `./perturb/`.

```bash
python run_context_ner.py --dataset <DATASET>
python extract_answer_entity.py --dataset <DATASET>
```

This step generates `dev_context_ner.jsonl` and `dev_answer_entity.jsonl` under `./data/<DATASET>/`.
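This step uses spaCy to tag entity mentions in the contexts. As a rough illustration of what gets extracted (a sketch, not the repo's exact logic):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama visited Google headquarters in California.")

# Keep only the entity types targeted by the perturbations.
for ent in doc.ents:
    if ent.label_ in {"PERSON", "ORG", "GPE"}:
        print(ent.text, ent.label_, ent.start_char, ent.end_char)
```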
`<ENTITY_TYPE>`: chosen from [person, org, gpe]
Go to `./perturb/<ENTITY_TYPE>/`.

```bash
python get_subset_with_info.py --dataset <DATASET>
```

This step generates `answer_entity_with_info.jsonl` under `./data/<DATASET>/<ENTITY_TYPE>/`.
Go to `./perturb/<ENTITY_TYPE>/`.

```bash
python perturb.py --dataset <DATASET> --perturbation none
```

This step generates `dev_subset.jsonl` under `./data/<DATASET>/<ENTITY_TYPE>/`.
It's a subset of the original test set that contains all instances where the perturbation for <ENTITY_TYPE> is applicable.
This is to ensure that the evaluation will be done on the same set of instances before and after perturbation.
Go to `./perturb/<ENTITY_TYPE>/`.

```bash
python perturb.py --dataset <DATASET> --perturbation RandStr --seed <SAMPLING_SEED>
```

`<SAMPLING_SEED>`: an `int` for specifying the random seed in sampling.

This step generates `dev_subset_s<SAMPLING_SEED>.jsonl` under `./data/<DATASET>/<ENTITY_TYPE>/RandStr/`.
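The perturbed file is meant to cover exactly the same instances as `dev_subset.jsonl`, which you can verify with a quick check. A sketch, assuming MRQA-style files where each example lists its questions under `qas` with a `qid` key (the dataset, entity type, and seed below are hypothetical):

```python
import json

def qids(path):
    """Collect all question ids from an MRQA-style jsonl file."""
    ids = set()
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            if "header" in example:  # skip the metadata line
                continue
            ids.update(qa["qid"] for qa in example["qas"])
    return ids

orig = qids("data/SQuAD/person/dev_subset.jsonl")
pert = qids("data/SQuAD/person/RandStr/dev_subset_s0.jsonl")
assert orig == pert, "original and perturbed subsets should cover the same questions"
```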
- Sample Candidate Names from `<CANDIDATE_SOURCE>`

  `<CANDIDATE_SOURCE>` for person: chosen from [InDistName, EnName, ChineseName, ArabicName, FrenchName, IndianName]
  `<CANDIDATE_SOURCE>` for org and gpe: chosen from [InDistName, EnName]

  Go to `./perturb/<ENTITY_TYPE>/<CANDIDATE_SOURCE>/`.

  ```bash
  python prepare_candidates.py --dataset <DATASET>
  ```

  This step generates `candidate_names.jsonl` under `./data/<DATASET>/<ENTITY_TYPE>/<CANDIDATE_SOURCE>/`.

- Substitute with Candidate Names from `<CANDIDATE_SOURCE>`

  Go to `./perturb/<ENTITY_TYPE>/`.

  ```bash
  python perturb.py --dataset <DATASET> --perturbation candidates --candidates_folder_name <CANDIDATE_SOURCE> --seed <SAMPLING_SEED>
  ```

  This step generates `dev_subset_s<SAMPLING_SEED>.jsonl` under `./data/<DATASET>/<ENTITY_TYPE>/<CANDIDATE_SOURCE>/` (see the sketch below).
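For intuition, candidate substitution renames entities consistently: every mention of a given name in a context receives the same sampled replacement. A toy sketch of that idea (not the repo's implementation, which also remaps answer spans and handles multi-word mentions):

```python
import random

def rename_entities(context, mentions, candidates, seed=0):
    """Replace every occurrence of each entity name with a distinct sampled candidate."""
    rng = random.Random(seed)
    names = sorted(set(mentions))
    mapping = dict(zip(names, rng.sample(candidates, len(names))))
    for old, new in mapping.items():
        context = context.replace(old, new)
    return context

print(rename_entities(
    "Alice met Bob in Paris. Later, Alice called Bob again.",
    mentions=["Alice", "Bob"],
    candidates=["Carol", "Dave", "Erin"],
))
```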
Go to `./perturb/`.

```bash
python mix_perturbations.py --dataset <DATASET> --perturbation <CANDIDATE_SOURCE> --seed <SAMPLING_SEED>
```

This step merges the original and perturbed data for different entity types into a `mix` type.
Under `./data/<DATASET>/`:

- `person/dev_subset.jsonl` + `org/dev_subset.jsonl` + `gpe/dev_subset.jsonl` → `mix/dev_subset.jsonl`
- `person/<CANDIDATE_SOURCE>/dev_subset_s<SAMPLING_SEED>.jsonl` + `org/<CANDIDATE_SOURCE>/dev_subset_s<SAMPLING_SEED>.jsonl` + `gpe/<CANDIDATE_SOURCE>/dev_subset_s<SAMPLING_SEED>.jsonl` → `mix/<CANDIDATE_SOURCE>/dev_subset_s<SAMPLING_SEED>.jsonl`
`mix` can later be used as a new entity type in evaluation.
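Conceptually, the merge concatenates the per-entity-type files, as in the sketch below (the actual `mix_perturbations.py` may additionally deduplicate instances that carry more than one entity type):

```python
import json
import os

def merge_jsonl(inputs, output):
    """Concatenate jsonl files into one, skipping MRQA header lines."""
    os.makedirs(os.path.dirname(output), exist_ok=True)
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as f:
                for line in f:
                    if "header" in json.loads(line):
                        continue
                    out.write(line)

merge_jsonl(
    [f"data/SQuAD/{t}/dev_subset.jsonl" for t in ("person", "org", "gpe")],
    "data/SQuAD/mix/dev_subset.jsonl",
)
```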
Go to `./`.

```bash
python run_qa.py config/mrqa.json \
    --model_name_or_path <MODEL_FULL_NAME> \
    --train_jsonl data/<DATASET>/train_holdout.jsonl \
    --eval_jsonl data/<DATASET>/dev_holdout.jsonl \
    --output_dir models/<DATASET>/<MODEL_SAVE_NAME>_s<TRAINING_SEED> \
    --output_pred_path models/<DATASET>/<MODEL_SAVE_NAME>_s<TRAINING_SEED>/dev_holdout_pred.jsonl \
    --seed <TRAINING_SEED>
```

- `<MODEL_FULL_NAME>`: chosen from [bert-base-cased, roberta-base, SpanBERT/spanbert-base-cased]
- `<DATASET>`: chosen from [SQuAD, NaturalQuestions, HotpotQA, SearchQA, TriviaQA]
- `<MODEL_SAVE_NAME>`: a `str` for naming the folder that stores the training checkpoints
- `<TRAINING_SEED>`: an `int` for specifying the random seed in training
This step trains a model on the original training set. The model is saved under `./models/<DATASET>/<MODEL_SAVE_NAME>_s<TRAINING_SEED>`.
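The checkpoint directory is in the standard Hugging Face format, so it can also be reloaded outside `run_qa.py`. A sketch using the `transformers` pipeline API (the checkpoint path below is a hypothetical example):

```python
from transformers import pipeline

# Example: <MODEL_SAVE_NAME>=bert, <TRAINING_SEED>=42 on SQuAD.
ckpt = "models/SQuAD/bert_s42"
qa = pipeline("question-answering", model=ckpt, tokenizer=ckpt)
print(qa(question="Who wrote Hamlet?",
         context="Hamlet was written by William Shakespeare."))
```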
Go to `./`.

```bash
python run_qa.py config/mrqa_eval.json \
    --model_name_or_path models/<DATASET>/<MODEL_SAVE_NAME>_s<TRAINING_SEED> \
    --eval_jsonl <EVAL_JSONL_PATH> \
    --output_pred_path <OUTPUT_PRED_PATH>
```

- `<EVAL_JSONL_PATH>`: a `str` for specifying the path to the original or perturbed test set.
  - Path to the original test set: `./data/<DATASET>/<ENTITY_TYPE>/dev_subset.jsonl`
  - Path to the perturbed test set: `./data/<DATASET>/<ENTITY_TYPE>/<CANDIDATE_SOURCE>/dev_subset_s<SAMPLING_SEED>.jsonl`
- `<OUTPUT_PRED_PATH>`: a `str` for specifying the path to save model predictions.
This step evaluates the model on the original or perturbed test set.
The EM/F1 scores are printed at the end of evaluation and recorded in the first line of the prediction file (`<OUTPUT_PRED_PATH>`).
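When sweeping over datasets, entity types, and seeds, the recorded scores can be collected programmatically. A sketch, assuming the first line of the prediction file parses as a JSON object holding the metrics (check your files for the exact key names; the path below is a hypothetical example):

```python
import json

# Hypothetical prediction path written by the evaluation command above.
with open("preds/SQuAD_person_RandStr_pred.jsonl") as f:
    metrics = json.loads(f.readline())  # first line records the EM/F1 scores
print(metrics)
```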
```bibtex
@inproceedings{yan-etal-2022-robustness,
    title = "On the Robustness of Reading Comprehension Models to Entity Renaming",
    author = "Yan, Jun and Xiao, Yang and Mukherjee, Sagnik and Lin, Bill Yuchen and Jia, Robin and Ren, Xiang",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.37",
    doi = "10.18653/v1/2022.naacl-main.37",
    pages = "508--520",
}
```