Name	Name	Last commit message	Last commit date
parent directory ..
data	data
src	src
README.md	README.md

Setup and Running Guide

paper2lkg has been tested functional on a 2021 MacBook Pro with an M1 Max chip and a AMD Ryzen 5 5600X 6-Core Processor, 16 GB RAM, and GeForce RTX 3080 12GB running Pop! OS 22.04 LTS. It should be compatible with most Macs and PCs running Linux.

Step 1: OS Environment

For Linux users with a Nvidia GPU:

Install the latest Nvidia driver.
Install cuda toolkit 12.6.

For Mac users:

Install the latest macOS.

It is not recommended to run the pipeline on CPU or GPUs other than the Nvidia ones.

Step 2: Python Environment

Install Python 3.12.8 using Anaconda

Install the following packages:

conda install numpy
conda install nltk
conda install pip
pip install gpt4all
pip install -U FlagEmbedding
pip install openai
pip install rdflib
pip install voyageai (to run the t2_kg_to_text part of paper2lkg-v0-testers)

Step 3: Running paper2lkg

If you want to run paper2lkg using GPT, create a JSON file call api_key.json under ./src/utilities. Then, move the two files llm_response_handler_JSON_list.py and llm_response_handler_JSON.py from ./src/utilities/gpt to ./src/utilities.

If you want to run paper2lkg using LLaMA, move the two files llm_response_handler_JSON_list.py and llm_response_handler_JSON.py from ./src/utilities/llama to ./src/utilities.

If you are using a Mac to run paper2lkg with LLaMA, ensure the file device in the function initialise_llm() in the files llm_response_handler_JSON_list.py and llm_response_handler_JSON.py is set to gpu. If you are using a Nvidia CPU, ensure it is set to cuda.

Prepare the paper your want to transform to local KG in ./data/input. There are already 10 full paper and 100 abstract-only preprocessed into the acceptable format, i.e., the Deep Document Model (DDM) representation. However, if you only have the original paper, e.g., a PDF file, you should turn it into a DDM first. You can either do it automatically or manually. To preprocess documents into DDM automatically, please to the DDM repository for details. To convert a document into DDM manually, go to /data/input and follow the three steps. You should first create a JSON file like this example, where a document should have been decomposed into a hierarchy of sections, paragraphs, and sentences along with its metadata. To do so, you should use the tool in /data/input/step_1_building_DDM. This tool leverage LLM to convert any unstructured section of a paper into an hierarchy paragraphs and sentence. However, you should manually construct each section of the DDM as well as the metadata. After constructing the initial DDM, run ./data/input/step_2_assigning_IRI/IRI_assigner.ipynb to assign each node in the DDM an IRI. After this step, you paper should be ready to input into the pipeline.

To start paper2lkg, run ./src/main.py or ./src/main_batch.py to process multiple papers.

The output local KG of the paper is in ./data/input. It is in either the JSON or TTL formats.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

Setup and Running Guide

Step 1: OS Environment

Step 2: Python Environment

Step 3: Running paper2lkg

Uh oh!

FilesExpand file tree

paper2lkg-v0

Directory actions

More options

Directory actions

More options

Latest commit

History

paper2lkg-v0

Folders and files

parent directory

README.md

Setup and Running Guide

Step 1: OS Environment

Step 2: Python Environment

Step 3: Running paper2lkg