Goldfranks GROBID Reference Extraction

A Python pipeline for extracting and analyzing references from academic PDFs using GROBID.

Features

Start GROBID Docker:

docker run --rm --init --ulimit core=0 -p 8070:8070 grobid/grobid:0.8.2
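The container can take a little while to load its models before it accepts requests. As a minimal sketch (not part of this repository), the check below polls GROBID's `/api/isalive` health endpoint on the port mapped above. The function names `grobid_is_alive` and `wait_for_grobid` are hypothetical helpers, and the retry count and delay are arbitrary assumptions. It uses only the standard library.

```python
import time
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if the GROBID service answers its health check.

    Hypothetical helper: base_url assumes the -p 8070:8070 mapping
    from the docker command above.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            body = resp.read().decode("utf-8").strip()
            return resp.status == 200 and body == "true"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready.
        return False


def wait_for_grobid(base_url="http://localhost:8070", attempts=30, delay=2.0):
    """Poll the health check until it succeeds or attempts run out."""
    for _ in range(attempts):
        if grobid_is_alive(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_grobid()` before starting the pipeline returns `True` once the service responds, or `False` if it never comes up within the allotted attempts.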

Install dependencies:

pip install requests pandas tqdm lxml PyMuPDF

Run the pipeline:

cd reference_extraction/scripts
python master_pipeline.py --test  # Test mode
python master_pipeline.py         # Process all PDFs
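GROBID returns extracted references as TEI XML, for example from its `/api/processReferences` endpoint, with one `biblStruct` element per reference. How `master_pipeline.py` handles that output is not shown here, so the following is only an illustrative sketch of pulling a few fields out of such a response. The function name `parse_references` and the choice of fields are assumptions. It uses the standard library's `xml.etree.ElementTree`; the `lxml` dependency listed above offers a similar interface.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def parse_references(tei_xml):
    """Extract title, authors, and year from each biblStruct element.

    Hypothetical helper: returns a list of dicts, one per reference.
    Missing fields are returned as None (or an empty author list).
    """
    root = ET.fromstring(tei_xml)
    references = []
    for bibl in root.iter("{http://www.tei-c.org/ns/1.0}biblStruct"):
        # First title in document order (the article title when present).
        title_el = bibl.find(".//tei:title", TEI_NS)
        title = title_el.text.strip() if title_el is not None and title_el.text else None

        # Join forename and surname parts into one display name per author.
        authors = []
        for pers in bibl.findall(".//tei:author/tei:persName", TEI_NS):
            parts = [el.text.strip() for el in pers if el.text and el.text.strip()]
            if parts:
                authors.append(" ".join(parts))

        # The publication date is carried in the 'when' attribute.
        date_el = bibl.find(".//tei:date", TEI_NS)
        when = date_el.get("when") if date_el is not None else None
        year = when[:4] if when else None

        references.append({"title": title, "authors": authors, "year": year})
    return references
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(...)` for analysis.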

This project is for personal research use only.
