ResearchClaw

A self-hosted AI research assistant for academic paper reading and analysis. Talk to it in chat to search, download, and parse papers from arXiv and OpenAlex, generate structured reading reports, ask follow-up questions about specific papers, and build a persistent knowledge base with long-term memory. Accessible from a web console or Feishu.


Features

  • Conversational paper workflow — search papers by title, arXiv ID, or keyword; confirm candidates; trigger download and parsing; generate reading reports — all through natural chat
  • Structured reading reports — the analysis subagent reads the full paper sequentially and produces a three-part report covering Story & Method, Experiments & Findings, and Summary & Critique
  • Paper Q&A — ask follow-up questions about an already-analyzed paper; the Q&A subagent answers using the existing report and targeted chunk retrieval without re-reading the full paper
  • TeX-first parsing pipeline — attempts arXiv TeX source first, falls back to local OCR (OpenAI-compatible multimodal VLM) or LlamaParse; all parsed content is chunked and embedded for retrieval
  • Persistent memory — the agent stores and recalls user preferences, project context, and domain knowledge across sessions
  • Paper library — all ingested papers, parse jobs, artifacts, and reports are stored locally and browsable in the web console
  • Web + Feishu — the same backend serves both a web console and a Feishu bot; each channel maintains its own conversation thread over a shared paper knowledge base and memory store
  • OpenAI-compatible providers — any chat and embedding provider that implements the OpenAI API (OpenAI, DeepSeek, local vLLM / Ollama, etc.) can be configured without code changes; see the sketch below
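
For example, a minimal .env sketch pointing chat at DeepSeek and embeddings at a local Ollama instance (model names and key values are illustrative):

RESEARCHCLAW_CHAT_BASE_URL=https://api.deepseek.com/v1
RESEARCHCLAW_CHAT_API_KEY=sk-xxx
RESEARCHCLAW_CHAT_MODEL=deepseek-chat
RESEARCHCLAW_EMBEDDING_BASE_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
RESEARCHCLAW_EMBEDDING_API_KEY=ollama                       # Ollama accepts any non-empty key
RESEARCHCLAW_EMBEDDING_MODEL=nomic-embed-text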

Quick Start

1. Database

ResearchClaw requires PostgreSQL 16+ with the pgvector extension. Choose either Docker or a native installation.


Option A — Docker (recommended)

The repository includes a docker-compose.yml for a one-command setup:

docker compose up -d postgres
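
To confirm the container is up before continuing:

docker compose ps postgres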

Option B — Native PostgreSQL installation

If you prefer not to use Docker, install PostgreSQL and the pgvector extension directly.

Ubuntu / Debian:

# Install PostgreSQL 16
sudo apt install -y postgresql-16 postgresql-contrib-16

# Install pgvector
sudo apt install -y postgresql-16-pgvector

# Start the service
sudo systemctl enable --now postgresql

macOS (Homebrew):

brew install postgresql@16
brew services start postgresql@16

# Install pgvector
brew install pgvector

Create the database user and database:

sudo -u postgres psql <<'SQL'
CREATE USER researchclaw WITH PASSWORD 'researchclaw';
CREATE DATABASE researchclaw OWNER researchclaw;
\c researchclaw
CREATE EXTENSION IF NOT EXISTS vector;
SQL
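
As an optional sanity check, confirm the extension is installed:

sudo -u postgres psql -d researchclaw -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';"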

Set the connection URL in .env (the default value matches the Docker setup; change it to match your native install if different):

RESEARCHCLAW_DATABASE_URL=postgresql://researchclaw:researchclaw@localhost:5432/researchclaw
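
If the psql client is installed on the host, you can test the exact URL the app will use; the vector extension should appear in the output:

psql "postgresql://researchclaw:researchclaw@localhost:5432/researchclaw" -c '\dx'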

Run database migrations (after starting PostgreSQL for the first time, or after pulling updates):

cd backend
uv run alembic upgrade head
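
To confirm the schema is at the latest revision afterwards:

uv run alembic current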

Reset to a clean state (drops all data and re-runs migrations):

# Docker
docker compose exec postgres psql -U researchclaw -d postgres -c "DROP DATABASE researchclaw;"
docker compose exec postgres psql -U researchclaw -d postgres -c "CREATE DATABASE researchclaw;"
# re-create the pgvector extension (it is dropped together with the database)
docker compose exec postgres psql -U researchclaw -d researchclaw -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Native (the researchclaw role was created without CREATEDB, so use the postgres superuser)
sudo -u postgres psql -c "DROP DATABASE researchclaw;"
sudo -u postgres psql -c "CREATE DATABASE researchclaw OWNER researchclaw;"
sudo -u postgres psql -d researchclaw -c "CREATE EXTENSION IF NOT EXISTS vector;"

cd backend && uv run alembic upgrade head && cd ..
rm -rf data/files/papers/*

2. Environment variables

Copy the example file and fill in the required values:

cp .env.example .env

Required — no defaults:

Variable                          Description
RESEARCHCLAW_CHAT_BASE_URL        Base URL of your OpenAI-compatible chat provider (e.g. https://api.openai.com/v1)
RESEARCHCLAW_CHAT_API_KEY         API key for the chat provider
RESEARCHCLAW_CHAT_MODEL           Chat model name (e.g. gpt-4o, deepseek-chat)
RESEARCHCLAW_EMBEDDING_BASE_URL   Base URL of your OpenAI-compatible embedding provider
RESEARCHCLAW_EMBEDDING_API_KEY    API key for the embedding provider
RESEARCHCLAW_EMBEDDING_MODEL      Embedding model name (e.g. text-embedding-3-small)

The chat and embedding providers can point at the same endpoint if your provider serves both APIs.
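
For example, a sketch using OpenAI for both (key values are placeholders):

RESEARCHCLAW_CHAT_BASE_URL=https://api.openai.com/v1
RESEARCHCLAW_CHAT_API_KEY=sk-xxx
RESEARCHCLAW_CHAT_MODEL=gpt-4o
RESEARCHCLAW_EMBEDDING_BASE_URL=https://api.openai.com/v1
RESEARCHCLAW_EMBEDDING_API_KEY=sk-xxx
RESEARCHCLAW_EMBEDDING_MODEL=text-embedding-3-small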


3. Install uv and pnpm

uv (Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

pnpm (Node package manager):

npm install -g pnpm
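
Verify both tools are on your PATH (you may need to open a new shell after installing uv):

uv --version
pnpm --version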

4. Install dependencies and start

Install all dependencies at once:

npm run setup:all

Start backend and frontend together:

npm run dev
  • Frontend: http://127.0.0.1:5173
  • Backend: http://127.0.0.1:8000

Ctrl+C stops both.
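
As a quick smoke test, assuming the backend exposes FastAPI's default interactive docs route, this should return HTTP 200:

curl -I http://127.0.0.1:8000/docs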

Or start individually:

# Backend
cd backend && uv run uvicorn researchclaw.main:app --host 127.0.0.1 --port 8000 --reload

# Frontend
cd frontend && pnpm dev

Optional Configuration

Feishu bot

To use ResearchClaw through Feishu, create a Feishu app and set:

RESEARCHCLAW_FEISHU_ENABLED=true
RESEARCHCLAW_FEISHU_APP_ID=cli_xxx
RESEARCHCLAW_FEISHU_APP_SECRET=xxx
RESEARCHCLAW_FEISHU_VERIFICATION_TOKEN=xxx   # not needed in long_connection mode
RESEARCHCLAW_FEISHU_RECEIVE_MODE=long_connection   # recommended for local deployment; no public URL needed

long_connection mode connects outbound to Feishu's WebSocket endpoint and works on localhost without any port forwarding. Set RESEARCHCLAW_FEISHU_RECEIVE_MODE=webhook if you are running on a server with a public URL.


Local OCR parser (Logics-Parsing-v2)

For papers where arXiv TeX source is unavailable, ResearchClaw can use a locally deployed or cloud-hosted OpenAI-compatible multimodal VLM to perform OCR page by page. The backend renders each PDF page as an image and sends it to the configured endpoint.

RESEARCHCLAW_LOCAL_OCR_BASE_URL=http://127.0.0.1:8001/v1   # your VLM endpoint
RESEARCHCLAW_LOCAL_OCR_API_KEY=EMPTY
RESEARCHCLAW_LOCAL_OCR_MODEL=Logics-Parsing

The endpoint must accept OpenAI-compatible multimodal chat requests (image_url content). Any QwenVL-compatible service works.
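
For reference, a sketch of the kind of request the endpoint must accept (the prompt text and base64 payload are illustrative):

curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "Logics-Parsing",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Transcribe this page to Markdown."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }]
  }'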


LlamaParse

LlamaParse is LlamaIndex's cloud-based PDF parsing service; ResearchClaw uses it as a fallback after local OCR:

RESEARCHCLAW_LLAMA_PARSE_API_KEY=llx-xxx

Get a key at cloud.llamaindex.ai.


OpenAlex API key

OpenAlex is used for paper metadata and citation lookup. Anonymous access works but is rate-limited; providing an email address (free) gets the higher-quota "polite pool":

RESEARCHCLAW_OPENALEX_API_KEY=mailto:you@example.com
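
OpenAlex grants the polite-pool quota to requests that include an email address; for reference, a raw API call of that form looks like this (the query is illustrative):

curl "https://api.openalex.org/works?search=retrieval%20augmented%20generation&mailto=you@example.com"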

License

MIT
