This repository uses large language models (LLMs) to support systematic reviews by automating information extraction. It provides Jupyter notebooks that automate the data extraction (DE) process, develop metaprompts, and evaluate the results. All notebooks are designed to run on Google Colaboratory. The sections below describe each notebook and its functionality.
Install the dependencies locally:

```bash
pip install -r requirement.txt
```

On Google Colab (at the top of each notebook):

```bash
!pip install -r requirement.txt
```

| Source format | Action |
|---|---|
| .docx / .txt | No preprocessing required – these files are consumed directly by the notebooks |
| .pdf | Processed with the Adobe PDF Extract API to split the file into main text, tables, and figures |
Adobe PDF Extract API: https://github.com/adobe/pdfservices-extract-python-sdk-samples
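The linked samples repository shows the full extraction workflow. As a rough orientation, a minimal sketch of the splitting step, assuming the legacy pdfservices-sdk builder interface used in those samples (class and method names differ across SDK versions, so treat this as an assumption rather than the repo's code):

```python
# Sketch: split one PDF into structured text, table spreadsheets, and figure
# renditions with the Adobe PDF Extract API (follows the linked samples).
from adobe.pdfservices.operation.auth.credentials import Credentials
from adobe.pdfservices.operation.execution_context import ExecutionContext
from adobe.pdfservices.operation.io.file_ref import FileRef
from adobe.pdfservices.operation.pdfops.extract_pdf_operation import ExtractPDFOperation
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type import ExtractElementType
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_renditions_element_type import ExtractRenditionsElementType

# Credentials file downloaded from the Adobe Developer Console
credentials = Credentials.service_account_credentials_builder() \
    .from_file("pdfservices-api-credentials.json") \
    .build()
context = ExecutionContext.create(credentials)

operation = ExtractPDFOperation.create_new()
operation.set_input(FileRef.create_from_local_file("Kataoka2024.pdf"))
operation.set_options(
    ExtractPDFOptions.builder()
    .with_elements_to_extract([ExtractElementType.TEXT, ExtractElementType.TABLES])
    .with_elements_to_extract_renditions(
        [ExtractRenditionsElementType.TABLES, ExtractRenditionsElementType.FIGURES])
    .build()
)

# The result is a ZIP containing structuredData.json plus tables/ and figures/;
# unpacking it yields the folder layout described below.
result = operation.execute(context)
result.save_as("IncludedTrials/Kataoka2024.zip")
```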
After processing, each study is placed in its own sub‑folder inside IncludedTrials/:
```
IncludedTrials/
└── Kataoka2024/
    ├── figures/fileoutpartX.png   # Figures (PNG)
    ├── tables/fileoutpartY.xlsx   # Tables (Excel)
    └── structuredData.json        # Structured main text (JSON)
```
The notebooks assume this hierarchy when they load the source files.
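For reference, a minimal, hypothetical loader that walks this hierarchy (the notebooks implement their own loading logic; the study folder name here is illustrative):

```python
import json
from pathlib import Path

import pandas as pd  # used to read the extracted Excel tables


def load_study(study_dir: Path):
    """Load one study folder that follows the IncludedTrials/ layout."""
    # Structured main text produced by the PDF Extract API
    text = json.loads((study_dir / "structuredData.json").read_text())
    # Extracted tables, one Excel file per table
    tables = {p.name: pd.read_excel(p) for p in (study_dir / "tables").glob("*.xlsx")}
    # Paths to the PNG figure renditions
    figures = sorted((study_dir / "figures").glob("*.png"))
    return text, tables, figures


text, tables, figures = load_study(Path("IncludedTrials/Kataoka2024"))
```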
This section contains notebooks for creating original descriptions for variables.
Before running this notebook you must define a Data‑Extraction (DE) manual and embed it in the notebook by assigning the complete manual to the variable protocol, e.g.:
protocol = """
<full DE manual here>
"""The DE manual must include, for every variable you plan to extract:
| Field | What to specify |
|---|---|
| Variable name | Exact label used in downstream analyses |
| Description / definition | A concise clinical or methodological definition |
| Extraction method | Where in each paper to look, how to parse the value, unit conversions, etc. |
| Calculation method (if derived) | Formulae for converting SE → SD, CI → SD, medians to means, etc. |
| Allowed response type | numeric, text, binary, choice list, etc. |
| Choice list (if applicable) | Enumerate every permissible option the model should pick from. |
Example excerpt (truncated for brevity; see your actual manual for the full list):

```
Age_mean           : Mean age in years of participants per arm
Age_sd             : SD of age. Use SD = SE*sqrt(n), or SD = (CI_upper - mean)*sqrt(n)/1.96, etc.
Age_n              : Sample size used to compute Age_mean
Ind_clu            : {individual | cluster}
ICC_for_cRCT       : If cluster RCT and ICC not reported, default 0.05
Insomnia diagnosis : Choose one of {formal_DSM, formal_ICSD, formal_ICD, ...}
... (continue for all variables) ...
```
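The conversion rules in the excerpt translate directly into code. A small sketch of the two SD conversions named above, assuming a symmetric 95% confidence interval (hence z = 1.96):

```python
import math


def sd_from_se(se: float, n: int) -> float:
    # SD = SE * sqrt(n)
    return se * math.sqrt(n)


def sd_from_ci_upper(ci_upper: float, mean: float, n: int, z: float = 1.96) -> float:
    # SD = (CI_upper - mean) * sqrt(n) / z, valid for a symmetric 95% CI
    return (ci_upper - mean) * math.sqrt(n) / z


# Example: SE = 0.8 with n = 25 gives SD = 4.0
print(sd_from_se(0.8, 25))
```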
Place the fully detailed manual in the notebook before executing any cells; the subsequent code reads protocol directly when generating the original variable descriptions.
- create_original_description.ipynb: Generates original meta-prompts for each variable based on the DE manual.

The generated initial meta-prompt is here.
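A minimal sketch of what this generation step might look like, assuming the OpenAI Python client with GPT-4o and the `protocol` variable defined above (the notebook's actual prompt wording and parameters live in the notebook itself):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_description(variable: str, protocol: str) -> str:
    """Draft an original extraction description for one variable from the DE manual."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You write data-extraction instructions for systematic reviews."},
            {"role": "user",
             "content": f"DE manual:\n{protocol}\n\n"
                        f"Write a concise extraction description for the variable '{variable}'."},
        ],
        temperature=0,
    )
    return response.choices[0].message.content


print(generate_description("Age_mean", protocol))
```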
This section includes notebooks for developing metaprompts using different methods.
- development_of_metaprompt_with_chat_prompting.ipynb: Develops a metaprompt using the chat prompting method.
- development_of_metaprompt_with_chat_prompting_modified.ipynb: Develops a metaprompt using the chat prompting method (modified version).
- development_of_metaprompt_with_one_by_one_n_shots.ipynb: Develops a metaprompt using the one-by-one n-shot prompting method.
- development_of_metaprompt_with_conventional_n_shots.ipynb: Develops a metaprompt using the conventional n-shot prompting method.
| Method | Directory |
|---|---|
| Contextual Chat prompting | 2_contextual_chat_prompting |
| Contextual Chat prompting (modified) | 2_contextual_chat_prompting_modified |
| One‑by‑one n‑shots | 2_one_by_one_n_shots |
| Conventional n‑shots | 2_conventional_n_shots |
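As one plausible illustration of how the two n-shot variants differ (a hypothetical sketch, not the notebooks' implementation): conventional n-shot prompting appends all worked examples to the variable description at once, while a one-by-one variant adds them incrementally.

```python
def conventional_n_shots(description: str, examples: list[tuple[str, str]]) -> str:
    """Build a metaprompt by appending all n worked examples at once."""
    shots = "\n\n".join(
        f"Example {i}:\nSource text: {src}\nExpected extraction: {val}"
        for i, (src, val) in enumerate(examples, start=1)
    )
    return f"{description}\n\n{shots}"


def one_by_one_n_shots(description, examples, keep_example):
    """Build a metaprompt by adding examples one at a time.

    `keep_example` is a hypothetical callback that decides whether the
    candidate metaprompt performs acceptably (e.g., by re-running
    extraction on development papers).
    """
    metaprompt = description
    for i, (src, val) in enumerate(examples, start=1):
        candidate = (f"{metaprompt}\n\nExample {i}:\n"
                     f"Source text: {src}\nExpected extraction: {val}")
        if keep_example(candidate):
            metaprompt = candidate
    return metaprompt
```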
This section is dedicated to data extraction processes.
- data_extraction.ipynb: Extracts data for all variables at once (all-in-one data extraction).
- data_extraction_modified.ipynb: Extracts data using modified methods, including re-check-and-re-extract prompting, re-extract prompting, and batch data extraction.
- data_extraction_additional_o3.py: Extracts data using o3-high-based methods.
The extracted data is stored here
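Conceptually, each extraction run pairs a study's structured main text with a developed metaprompt. A hypothetical sketch of that step, assuming the OpenAI Python client (model choice, prompt wording, and file handling are assumptions, not the notebooks' code):

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_from_study(study_dir: str, metaprompt: str) -> str:
    """Run one metaprompt against one study's structuredData.json."""
    structured_text = Path(study_dir, "structuredData.json").read_text()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": metaprompt},      # developed metaprompt
            {"role": "user", "content": structured_text},   # study main text
        ],
        temperature=0,
    )
    return response.choices[0].message.content


value = extract_from_study("IncludedTrials/Kataoka2024", "<metaprompt for Age_mean>")
```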
This section focuses on the evaluation of extracted data.
- arm_matching.ipynb: Matches the names of arms extracted by GPT with those extracted by humans.
- value_checker.ipynb: Checks whether each value extracted by a human matches the value extracted by GPT.
- metric_calculation_with_precision.ipynb: Calculates accuracy, sensitivity, specificity, and precision.
- metric_calculation_with_variable_detection_comprehensiveness.ipynb: Calculates accuracy, sensitivity, specificity, and variable detection comprehensiveness.
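For reference, the standard confusion-matrix definitions behind these metrics (a generic sketch, not the notebooks' code):

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics used in the evaluation step."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall: correct extractions among true values
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
    }


print(metrics(tp=90, fp=5, tn=80, fn=10))
```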
As mentioned in our article, the datasets (1, 2, and 3) include many copyrighted research papers. While data extraction from these works is permissible for research purposes, releasing the full datasets here would constitute copyright infringement. Consequently, the complete datasets are not distributed in this repository. Researchers who wish to access them can contact us.
```bibtex
@misc{kataoka2024automating,
  author = {Kataoka, Yuki},
  title  = {Automating the Data Extraction Process for Systematic Reviews using GPT-4o},
  year   = {2024},
  url    = {https://osf.io/cqg8u},
  note   = {Retrieved October 19, 2024}
}
```