A self-hosted AI research assistant for academic paper reading and analysis. Talk to it in chat to search, download, and parse papers from arXiv and OpenAlex, generate structured reading reports, ask follow-up questions about specific papers, and build a persistent knowledge base with long-term memory. Accessible from a web console or Feishu.
- Conversational paper workflow — search papers by title, arXiv ID, or keyword; confirm candidates; trigger download and parsing; generate reading reports — all through natural chat
- Structured reading reports — the analysis subagent reads the full paper sequentially and produces a three-part report covering Story & Method, Experiments & Findings, and Summary & Critique
- Paper Q&A — ask follow-up questions about an already-analyzed paper; the Q&A subagent answers using the existing report and targeted chunk retrieval without re-reading the full paper
- TeX-first parsing pipeline — attempts arXiv TeX source first, falls back to local OCR (OpenAI-compatible multimodal VLM) or LlamaParse; all parsed content is chunked and embedded for retrieval
- Persistent memory — the agent stores and recalls user preferences, project context, and domain knowledge across sessions
- Paper library — all ingested papers, parse jobs, artifacts, and reports are stored locally and browsable in the web console
- Web + Feishu — the same backend serves both a web console and a Feishu bot; each channel maintains its own conversation thread over a shared paper knowledge base and memory store
- OpenAI-compatible providers — any chat and embedding provider that implements the OpenAI API (OpenAI, DeepSeek, local vLLM / Ollama, etc.) can be configured without code changes
ResearchClaw requires PostgreSQL 16+ with the pgvector extension. Choose either the Docker method or the native installation method.
The repository includes a docker-compose.yml for a one-command setup:
```bash
docker compose up -d postgres
```
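To confirm the database is ready before continuing, you can probe it with `pg_isready` (this assumes the compose service is named `postgres`, as in the command above; the check itself is optional):

```bash
# Check container status, then wait for PostgreSQL to accept connections
docker compose ps postgres
docker compose exec postgres pg_isready -U researchclaw
```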
If you prefer not to use Docker, install PostgreSQL and the pgvector extension directly.

Ubuntu / Debian:

```bash
# Install PostgreSQL 16
sudo apt install -y postgresql-16 postgresql-contrib-16
# Install pgvector
sudo apt install -y postgresql-16-pgvector
# Start the service
sudo systemctl enable --now postgresql
```

macOS (Homebrew):

```bash
brew install postgresql@16
brew services start postgresql@16
# Install pgvector
brew install pgvector
```

Create the database user and database:

```bash
sudo -u postgres psql <<'SQL'
CREATE USER researchclaw WITH PASSWORD 'researchclaw';
CREATE DATABASE researchclaw OWNER researchclaw;
\c researchclaw
CREATE EXTENSION IF NOT EXISTS vector;
SQL
```
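To sanity-check that the extension is enabled, you can query `pg_extension` (optional; on macOS/Homebrew, run `psql -d researchclaw` without the `sudo -u postgres` prefix):

```bash
# Should print the installed pgvector version
sudo -u postgres psql -d researchclaw -c \
  "SELECT extversion FROM pg_extension WHERE extname = 'vector';"
```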
Set the connection URL in `.env` (the default value matches the Docker setup; change it to match your native install if different):

```
RESEARCHCLAW_DATABASE_URL=postgresql://researchclaw:researchclaw@localhost:5432/researchclaw
```
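A quick way to confirm the URL is correct is to connect with `psql` using the same string (an optional check, not part of the setup scripts):

```bash
# Connects with the same credentials the backend will use
psql "postgresql://researchclaw:researchclaw@localhost:5432/researchclaw" -c '\conninfo'
```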
Run database migrations (after starting PostgreSQL for the first time, or after pulling updates):

```bash
cd backend
uv run alembic upgrade head
```
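To confirm the schema is up to date, `alembic current` prints the revision the database sits at; after a successful upgrade it should be marked `(head)`:

```bash
cd backend
uv run alembic current
```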
Reset to a clean state (drops all data and re-runs migrations):

```bash
# Docker
docker compose exec postgres psql -U researchclaw -d postgres -c "DROP DATABASE researchclaw;"
docker compose exec postgres psql -U researchclaw -d postgres -c "CREATE DATABASE researchclaw;"
# Native
psql -U researchclaw -d postgres -c "DROP DATABASE researchclaw;"
psql -U researchclaw -d postgres -c "CREATE DATABASE researchclaw;"
cd backend && uv run alembic upgrade head && cd ..
rm -rf data/files/papers/*
```

Copy the example file and fill in the required values:

```bash
cp .env.example .env
```

Required (no defaults):
| Variable | Description |
|---|---|
| `RESEARCHCLAW_CHAT_BASE_URL` | Base URL of your OpenAI-compatible chat provider (e.g. `https://api.openai.com/v1`) |
| `RESEARCHCLAW_CHAT_API_KEY` | API key for the chat provider |
| `RESEARCHCLAW_CHAT_MODEL` | Model name to use (e.g. `gpt-4o`, `deepseek-chat`) |
| `RESEARCHCLAW_EMBEDDING_BASE_URL` | Base URL of your OpenAI-compatible embedding provider |
| `RESEARCHCLAW_EMBEDDING_API_KEY` | API key for the embedding provider |
| `RESEARCHCLAW_EMBEDDING_MODEL` | Embedding model name (e.g. `text-embedding-3-small`) |
The chat and embedding providers can point to the same endpoint if your provider supports both.
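For example, pointing both at a single OpenAI-compatible endpoint looks like this (keys and model names are illustrative):

```
RESEARCHCLAW_CHAT_BASE_URL=https://api.openai.com/v1
RESEARCHCLAW_CHAT_API_KEY=sk-...
RESEARCHCLAW_CHAT_MODEL=gpt-4o
RESEARCHCLAW_EMBEDDING_BASE_URL=https://api.openai.com/v1
RESEARCHCLAW_EMBEDDING_API_KEY=sk-...
RESEARCHCLAW_EMBEDDING_MODEL=text-embedding-3-small
```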
uv (Python package manager):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

pnpm (Node package manager):

```bash
npm install -g pnpm
```
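A quick check that both tools landed on your `PATH`:

```bash
uv --version && pnpm --version
```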
Install all dependencies at once:

```bash
npm run setup:all
```

Start backend and frontend together:

```bash
npm run dev
```

- Frontend: http://127.0.0.1:5173
- Backend: http://127.0.0.1:8000
Ctrl+C stops both.
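Once both are up, a quick smoke test (this assumes the backend is FastAPI serving its default interactive docs at `/docs`; skip if your build differs):

```bash
# Expect "200" if the backend is responding
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8000/docs
```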
Or start individually:

```bash
# Backend
cd backend && uv run uvicorn researchclaw.main:app --host 127.0.0.1 --port 8000 --reload
# Frontend
cd frontend && pnpm dev
```

To use ResearchClaw through Feishu, create a Feishu app and set:

```
RESEARCHCLAW_FEISHU_ENABLED=true
RESEARCHCLAW_FEISHU_APP_ID=cli_xxx
RESEARCHCLAW_FEISHU_APP_SECRET=xxx
RESEARCHCLAW_FEISHU_VERIFICATION_TOKEN=xxx  # not needed in long_connection mode
RESEARCHCLAW_FEISHU_RECEIVE_MODE=long_connection  # recommended for local deployment; no public URL needed
```

`long_connection` mode connects outbound to Feishu's WebSocket endpoint and works on localhost without any port forwarding. Set `RESEARCHCLAW_FEISHU_RECEIVE_MODE=webhook` if you are running on a server with a public URL.
For papers where arXiv TeX source is unavailable, ResearchClaw can use a locally deployed or cloud-hosted OpenAI-compatible multimodal VLM to perform OCR page by page. The backend renders each PDF page as an image and sends it to the configured endpoint.
```
RESEARCHCLAW_LOCAL_OCR_BASE_URL=http://127.0.0.1:8001/v1  # your VLM endpoint
RESEARCHCLAW_LOCAL_OCR_API_KEY=EMPTY
RESEARCHCLAW_LOCAL_OCR_MODEL=Logics-Parsing
```

The endpoint must accept OpenAI-compatible multimodal chat requests (`image_url` content). Any QwenVL-compatible service works.
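For reference, a minimal sketch with `curl` of the kind of request the backend sends (the prompt text and base64 payload are illustrative placeholders, not ResearchClaw's actual prompt):

```bash
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "Logics-Parsing",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Transcribe this page to Markdown."},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,<base64-encoded page>"}}
      ]
    }]
  }'
```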
LlamaParse, LlamaIndex's cloud-based PDF parser, serves as a fallback after local OCR:
```
RESEARCHCLAW_LLAMA_PARSE_API_KEY=llx-xxx
```

Get a key at cloud.llamaindex.ai.
OpenAlex is used for paper metadata and citation lookup. Anonymous access works but is rate-limited. Providing an email-based API key (free) gets a higher quota:
```
RESEARCHCLAW_OPENALEX_API_KEY=mailto:you@example.com
```
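Since the key is just a `mailto` address, you can verify access directly against the OpenAlex API, which accepts the email as a query parameter (the search term is illustrative):

```bash
curl "https://api.openalex.org/works?search=attention&mailto=you@example.com"
```

MIT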