The scripts were written in python3.10. The packages required to run the scripts are listed in requirements.txt.
Steps:
- Manually downloaded search results from different sources are located here:
search_results/ - Search ACL for "moral":
- unzip:
unzip acl_anthology_31_12_2023.zip - run
python bib2json.py conf/bibjson.confto convert the downloaded anthology+abstracts.bib to a corresponding jsonl format - run
python acl_json_search.py conf/acl_json_search.confto collect ACL papers containing the keyword "moral"
- unzip:
- Deduplicate and merge search results to a single file:
- run
python merge_search_results.py conf/merge_search_results.conf
- run
The file resulting from step 2 is unique_search_results_31_12_2023.jsonl and contains all the files that went through the manual screening process explained below.
Screening was performed using screening.ods. The columns are defined as follows:
- Column A ("ID"): These bibkeys correspond to those in
unique_search_results_31_12_2023.jsonl - Column B ("Title"): Title of the paper
- Column C ("English"): Whether the paper is written in English
- Column D ("Accessible"): Whether the paper is accessible
- Column E ("Text/speech data"): Whether the methodology relies on text or speech data
- Column F ("Checked full-text"): Whether it was necessary to check more than just the title, abstract and keywords in order to decide whether to keep this paper
- Column G ("Passes screening"): Final decision for the paper in the screening process
- Column H ("Reasons to reject"): Why the paper was rejected in the screening process
- Column I ("Comments or contents of the paper"): Contribution(s) of the paper or a comment/note on the paper
The spreadsheet snowballing_candidates.ods was used to collect papers found via backward snowballing.
The columns are defined as follows:
- Column A ("ID"): Theses bibkeys correspond to those in
snowballing_candidates.bib. - Column B ("Title"): Title of the paper
- Column C ("Passes screening"): Final decision for the paper in the screening process
- Column D ("Reasons to reject"): Why the paper was rejected in the screening process
- Column E ("Comments or contents of the paper"): Contribution(s) of the paper or a comment/note on the paper
- Column F ("Found where/how?"): How this paper was found in the backward snowballing process.
Reviews for each paper were collected using the survey form in survey_form/. The survey form is accompanied by a codebook (codebook.pdf) that defines all the variables in the survey form and ensure consistent reviews.
For a given paper, the survey form saves the review to a file in json-format. The json-reviews for all 135 papers can be found in the following folder: all_reviews/.
Automated consistency checks were performed using consistency_checks.py.
Results are written to consistency_checks/.
The following script was used to extract information from the collection of reviews: analyze_reviews.py. Results are written to analysis_results/.
The file checklist.pdf corresponds to the checklist we released along with our paper.