Findpapers is a Python library that gives researchers unified access to hundreds of millions of academic papers from different databases - all through a single query. Instead of searching the databases one by one, each with its own interface and query language, Findpapers lets you write one boolean expression and run it everywhere at once, automatically merging and deduplicating the results.
Findpapers searches for papers through arXiv, IEEE Xplore, OpenAlex, PubMed, Scopus, and Semantic Scholar - together covering virtually every peer-reviewed paper, preprint, and conference proceeding published across all fields of science. It also supports paper enrichment, PDF downloading, citation graph building (snowballing), and export to multiple formats.
- Massive coverage - access hundreds of millions of papers across six databases that together span every scientific discipline
- Multi-database search - query all databases in parallel with one boolean search expression - no need to learn six different query syntaxes
- Smart deduplication - automatically merges duplicate papers found across different databases
- Paper enrichment - fetch additional metadata (abstracts, keywords, citations) via CrossRef and web scraping
- PDF downloading - download PDFs with automatic URL resolution for major publishers
- Citation snowballing - build citation graphs by traversing references and citations (forward and backward)
- Flexible export - save results as JSON, BibTeX, or CSV
- Filter codes - restrict search terms to specific fields (title, abstract, keywords, author, source, affiliation)
- Parallel execution - speed up searches and downloads using multiple worker threads
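Citation snowballing is essentially a breadth-first traversal of the citation graph. The `engine.snowball(..., max_depth=1, direction="both")` call in the usage example further down uses it. The sketch below illustrates the idea on hypothetical in-memory data. It is not Findpapers' implementation, which retrieves references and citations from the databases. Here, "backward" follows a paper's references, "forward" follows the papers that cite it, and `max_depth` bounds how many hops from the seed papers are expanded. These semantics are assumptions about how the library's parameters behave.

```python
from collections import deque
from typing import Literal

Direction = Literal["backward", "forward", "both"]


def invert(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn 'paper -> papers it cites' into 'paper -> papers citing it'."""
    citations: dict[str, list[str]] = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            citations.setdefault(cited, []).append(citing)
    return citations


def snowball(
    seeds: list[str],
    references: dict[str, list[str]],
    max_depth: int = 1,
    direction: Direction = "both",
) -> tuple[set[str], set[tuple[str, str]]]:
    """Breadth-first snowball; returns (papers, citing->cited edges)."""
    citations = invert(references)
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    edges: set[tuple[str, str]] = set()
    while queue:
        paper = queue.popleft()
        if depth[paper] >= max_depth:
            continue  # do not expand beyond the depth limit
        neighbors: list[str] = []
        if direction in ("backward", "both"):  # follow the paper's references
            for cited in references.get(paper, []):
                edges.add((paper, cited))
                neighbors.append(cited)
        if direction in ("forward", "both"):  # follow papers that cite it
            for citing in citations.get(paper, []):
                edges.add((citing, paper))
                neighbors.append(citing)
        for nxt in neighbors:
            if nxt not in depth:  # visit each paper once, at its shallowest depth
                depth[nxt] = depth[paper] + 1
                queue.append(nxt)
    return set(depth), edges


# Hypothetical toy data: paper id -> ids of the papers it cites.
REFERENCES = {"A": ["B", "C"], "B": ["D"], "C": [], "D": [], "E": ["A"]}

nodes, edges = snowball(["A"], REFERENCES, max_depth=1, direction="both")
print(sorted(nodes), sorted(edges))
# ['A', 'B', 'C', 'E'] [('A', 'B'), ('A', 'C'), ('E', 'A')]
```

With `max_depth=1`, only the seed's direct references (B, C) and direct citers (E) are collected. Raising the depth to 2 would also reach D through B, and the graph grows quickly with each extra level.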
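Findpapers requires Python 3.11+. Install it directly from GitHub:

```bash
pip install git+https://github.com/jonatasgrosman/findpapers.git
```

A typical workflow searches all databases, enriches the results, downloads their PDFs, builds a citation graph, and saves everything to disk:

```python
import datetime

import findpapers

engine = findpapers.Engine()

# Search all supported databases with a single boolean query,
# restricted to papers published since 2022-01-01
result = engine.search(
    "[machine learning] AND [healthcare]",
    since=datetime.date(2022, 1, 1),
)

# Enrich papers with additional metadata (abstracts, keywords, citations)
engine.enrich(result.papers)

# Download the available PDFs into ./pdfs
engine.download(result.papers, "./pdfs")

# Build a citation graph from the first five results, following both
# references (backward) and citations (forward) one level deep
graph = engine.snowball(result.papers[:5], max_depth=1, direction="both")

# Save the search results, a BibTeX bibliography, and the citation graph
findpapers.save_to_json(result, "results.json")
findpapers.save_to_bibtex(result.papers, "references.bib")
findpapers.save_to_json(graph, "citation_graph.json")
```

The table below summarizes each supported database. For full details on authentication, rate limits, and per-database quirks, see the Databases documentation.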
| Database | Size (papers) | API Key | Coverage |
|---|---|---|---|
| arXiv | 3M+ ¹ | Not required | Open-access preprints in physics, math, CS, biology, economics, and more |
| IEEE Xplore | 7M+ ² | Required | Journals, conferences, and standards in electrical engineering and CS |
| OpenAlex | 243M+ ³ | Optional | The largest open catalog of scholarly works across all disciplines |
| PubMed | 40M+ ⁴ | Optional | Biomedical and life sciences literature (MEDLINE, PMC, and more) |
| Scopus | 100M+ ⁵ | Required | Peer-reviewed literature in science, technology, medicine, social sciences, and humanities |
| Semantic Scholar | 214M+ ⁶ | Optional | AI-powered academic graph covering all fields of science |
Paper counts are estimates taken from each database's official website in March 2026. Click the superscript links for the original sources. These numbers grow continuously.
Every API key listed above can be obtained at no cost: just create an account on each provider's website. We strongly recommend getting all of them before using Findpapers. Keys are required to search IEEE Xplore and Scopus, and they substantially improve rate limits and reliability for OpenAlex, PubMed, and Semantic Scholar. See Databases for how to obtain each key, and Configuration for how to set them up.
| Document | Description |
|---|---|
| Getting Started | Installation, configuration, and first search |
| Databases | Supported databases, authentication, and per-database details |
| Query Syntax | How to write search queries, boolean operators, wildcards, and filter codes |
| Configuration | Environment variables, proxy, SSL, and API keys |
| Search | Multi-database search with boolean queries |
| Enrich | Enrich papers with additional metadata from CrossRef and web scraping |
| Download | Download PDFs for papers |
| Snowball | Build citation graphs via forward and backward snowballing |
| Fetch by DOI | Look up a single paper by DOI |
| Save/Load | JSON, BibTeX, and CSV persistence details |
| API Reference | Public classes, functions, enums, and exceptions |
See the contribution guidelines if you'd like to contribute to the project, and please follow our Code of Conduct. You don't need to know how to code to contribute; even improving the documentation is a valuable contribution.
If this project has been useful for you, please share it with your friends and give us a star on GitHub to help others discover it. You can also sponsor me to support the development of Findpapers.
If you use Findpapers in your research, please cite it:
```bibtex
@misc{grosman2020findpapers,
  title={{Findpapers: A tool for helping researchers who are looking for related works}},
  author={Grosman, Jonatas},
  howpublished={\url{https://github.com/jonatasgrosman/findpapers}},
  year={2020}
}
```
