Findpapers: A tool for helping researchers who are looking for related works

Findpapers is a Python library that gives researchers unified access to hundreds of millions of academic papers from different databases - all through a single query. Instead of searching the databases one by one, each with its own interface and query language, Findpapers lets you write one boolean expression and run it everywhere at once, automatically merging and deduplicating the results.

Findpapers searches for papers through arXiv, IEEE Xplore, OpenAlex, PubMed, Scopus, and Semantic Scholar - together covering virtually every peer-reviewed paper, preprint, and conference proceeding published across all fields of science. It also supports paper enrichment, PDF downloading, citation graph building (snowballing), and export to multiple formats.

Key Features

Massive coverage - access hundreds of millions of papers across six databases that together span every scientific discipline
Multi-database search - query all databases in parallel with one boolean search expression - no need to learn six different query syntaxes
Smart deduplication - automatically merges duplicate papers found across different databases
Paper enrichment - fetch additional metadata (abstracts, keywords, citations) via CrossRef and web scraping
PDF downloading - download PDFs with automatic URL resolution for major publishers
Citation snowballing - build citation graphs by traversing references and citations (forward and backward)
Flexible export - save results as JSON, BibTeX, or CSV
Filter codes - restrict search terms to specific fields (title, abstract, keywords, author, source, affiliation)
Parallel execution - speed up searches and downloads using multiple worker threads
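To make the deduplication idea concrete, here is a minimal sketch of how records returned by different databases could be merged. It is an illustration only, not Findpapers' actual implementation: the `Record` class, `dedup_key`, and `merge_duplicates` are hypothetical names, and the matching rule (DOI first, normalized title otherwise) is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical paper record (not the Findpapers paper class)."""
    title: str
    doi: Optional[str] = None
    abstract: Optional[str] = None
    databases: set = field(default_factory=set)


def dedup_key(record: Record) -> str:
    """Assumed matching rule: use the DOI if present, else a normalized title."""
    if record.doi:
        return "doi:" + record.doi.strip().lower()
    normalized = re.sub(r"[^a-z0-9]+", " ", record.title.lower()).strip()
    return "title:" + normalized


def merge_duplicates(records: list) -> list:
    """Collapse records that share a key, keeping the first-seen order."""
    merged = {}
    for record in records:
        key = dedup_key(record)
        kept = merged.get(key)
        if kept is None:
            # Copy so the caller's input records are not mutated.
            merged[key] = Record(
                record.title, record.doi, record.abstract, set(record.databases)
            )
            continue
        # Fill gaps from the duplicate and remember every source database.
        kept.abstract = kept.abstract or record.abstract
        kept.databases |= record.databases
    return list(merged.values())
```

Note that this simple rule does not match a record that has a DOI against a copy of the same paper that lacks one; a real implementation would need a fallback comparison for that case.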

Requirements

Python 3.11+

Installation

pip install git+https://github.com/jonatasgrosman/findpapers.git

Quick Start

import findpapers
import datetime

engine = findpapers.Engine()

# Search for papers across all databases
result = engine.search(
    "[machine learning] AND [healthcare]",
    since=datetime.date(2022, 1, 1),
)

# Enrich papers with additional metadata (abstracts, keywords, citations)
engine.enrich(result.papers)

# Download PDFs
engine.download(result.papers, "./pdfs")

# Build a citation graph from the top results
graph = engine.snowball(result.papers[:5], max_depth=1, direction="both")

# Save results
findpapers.save_to_json(result, "results.json")
findpapers.save_to_bibtex(result.papers, "references.bib")
findpapers.save_to_json(graph, "citation_graph.json")
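For readers unfamiliar with snowballing, the sketch below shows the general technique behind a call like `engine.snowball(..., max_depth=1, direction="both")`: a breadth-first traversal over reference and citation links, bounded by depth. It is not the library's code. The function name `snowball_sketch`, the in-memory `references` and `citations` mappings, and the `"forward"` and `"backward"` direction values are assumptions made for illustration.

```python
from collections import deque


def snowball_sketch(seeds, references, citations, max_depth=1, direction="both"):
    """Breadth-first snowballing over an in-memory citation graph.

    references: paper id -> ids of the papers it cites (backward links)
    citations:  paper id -> ids of the papers that cite it (forward links)
    Returns (depth_of, edges), where edges holds (citing, cited) pairs.
    """
    depth_of = {seed: 0 for seed in seeds}
    edges = set()
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        depth = depth_of[paper]
        if depth >= max_depth:
            continue  # do not expand beyond the requested depth
        links = []
        if direction in ("backward", "both"):
            links += [(paper, cited) for cited in references.get(paper, [])]
        if direction in ("forward", "both"):
            links += [(citing, paper) for citing in citations.get(paper, [])]
        for citing, cited in links:
            edges.add((citing, cited))
            other = cited if citing == paper else citing
            if other not in depth_of:
                depth_of[other] = depth + 1
                queue.append(other)
    return depth_of, edges
```

With `max_depth=1`, only the direct references and citations of the seed papers are collected; each extra level of depth expands the newly found papers in turn, so the graph can grow quickly.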

Supported Databases

The table below summarizes each supported database - for full details on authentication, rate limits, and per-database quirks, see the Databases documentation.

Database	Size (papers)	API Key	Coverage
arXiv	3M+ ¹	Not required	Open-access preprints in physics, math, CS, biology, economics, and more
IEEE Xplore	7M+ ²	Required	Journals, conferences, and standards in electrical engineering and CS
OpenAlex	243M+ ³	Optional	The largest open catalog of scholarly works across all disciplines
PubMed	40M+ ⁴	Optional	Biomedical and life sciences literature (MEDLINE, PMC, and more)
Scopus	100M+ ⁵	Required	Peer-reviewed literature in science, technology, medicine, social sciences, and humanities
Semantic Scholar	214M+ ⁶	Optional	AI-powered academic graph covering all fields of science

Paper counts are estimates taken from each database's official website in March 2026; the superscripts mark the original sources. These numbers grow continuously.

Every API key from the databases listed above can be obtained at no cost - just create an account on each provider’s website. We strongly recommend getting all of them before using Findpapers, as they unlock additional databases (IEEE, Scopus) and dramatically improve rate limits and reliability on the others (OpenAlex, PubMed, Semantic Scholar). See Databases for more details on how to get these API keys, and Configuration for how to set them up.

Documentation

Document	Description
Getting Started	Installation, configuration, and first search
Databases	Supported databases, authentication, and per-database details
Query Syntax	How to write search queries, boolean operators, wildcards, and filter codes
Configuration	Environment variables, proxy, SSL, and API keys
Search	Multi-database search with boolean queries
Enrich	Enrich papers with additional metadata from CrossRef and web scraping
Download	Download PDFs for papers
Snowball	Build citation graphs via forward and backward snowballing
Fetch by DOI	Look up a single paper by DOI
Save/Load	JSON, BibTeX, and CSV persistence details
API Reference	Public classes, functions, enums, and exceptions

Want to help?

See the contribution guidelines if you'd like to contribute to the project, and please follow our Code of Conduct. You don't need to know how to code to contribute: even improving the documentation is a valuable contribution.

If this project has been useful for you, please share it with your friends and give us a star on GitHub to help others discover it. You can also sponsor me to support the development of Findpapers.

Citation

If you use Findpapers in your research, please cite it:

@misc{grosman2020findpapers,
  title={{Findpapers: A tool for helping researchers who are looking for related works}},
  author={Grosman, Jonatas},
  howpublished={\url{https://github.com/jonatasgrosman/findpapers}},
  year={2020}
}

