Skip to content

haodong2000/arXivClaw

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

27 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

arXivClaw: Free, Fully Automated arXiv Recommender ๐Ÿค–

๐Ÿ‘ค Authors: GPT-5.3-Codex, Haodong Li

arXivClaw is a daily arXiv recommender that fetches new papers, scores relevance, and sends digest emails.

In default settings (fetching 500 latest arXiv papers per weekday, generally sufficient for about 3 categories; LLM_MODEL=gemini-3.1-flash-lite-preview), arXivClaw is free.

โœจ What this tool does

  • Fetches new papers from arXiv for your selected categories/query
  • Scores each paper using keywords + title + abstract
  • Sorts papers by score (high to low)
  • Uses threshold-first delivery with fallback minimum count:
    • if papers above MIN_RELEVANCE_SCORE are greater than MIN_DAILY_PUSH_COUNT, sends all above-threshold papers (no upper limit)
    • otherwise sends top MIN_DAILY_PUSH_COUNT papers by score
  • Runs automatically at 2:00 PM (by default) Los Angeles time on weekdays
  • Sends one startup/init email when the process starts, including a brief explanation of key runtime settings

๐Ÿ“ฎ Startup Confirmation Email (an example) โฌ‡๏ธ start

๐Ÿ“ฎ Daily Digest Email (an example) โฌ‡๏ธ start

0) ๐Ÿงพ Understand .env.example vs .env

  • .env.example: template file with explanations and placeholder values (never put real secrets here)
  • .env: your real local config with API keys and SMTP password (ignored by Git)

First-time setup:

cp .env.example .env

Then edit only .env and replace values marked as required.

1) ๐Ÿ› ๏ธ Clone and install

git clone https://github.com/haodong2000/arXivClaw.git
cd arXivClaw

Then install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2) ๐Ÿ“ง Configure email delivery (e.g., Brevo)

2.1 ๐Ÿ”‘ Create SMTP credentials

  1. Sign up on Brevo.
  2. Go to SMTP & API and create an SMTP key.
  3. Fill these values in .env:
    • SMTP_HOST=smtp-relay.brevo.com
    • SMTP_PORT=587
    • SMTP_USER=<your Brevo SMTP login> (usually *@smtp-brevo.com)
    • SMTP_PASSWORD=<your Brevo SMTP key>

2.2 ๐Ÿ“จ Set sender and recipient

  • EMAIL_FROM: must be a valid/verified sender in Brevo
  • EMAIL_TO: where you want to receive the digest

Recommended for easier delivery:

  • Use a personal inbox as recipient first (for example *@gmail.com) instead of school/work email systems.
  • Add your Brevo sender address (the actual From address shown in Brevo logs) to your Contacts/Safe Senders list.

Note: Free-plan limits and policies may change. Always check the latest Brevo dashboard information.

Quota reminder: Before large runs, verify your Brevo sending limits and remaining quota.

3) ๐Ÿง  Configure LLM (e.g., Google Gemini)

  1. Create an API key in Google AI Studio.
  2. Set these values in .env:
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
LLM_API_KEY=YOUR_GEMINI_API_KEY
LLM_MODEL=gemini-3.1-flash-lite-preview
  1. Set your interests:
KEYWORDS=agent, video generation, world model, LLM, VLM
MIN_RELEVANCE_SCORE=50
MIN_DAILY_PUSH_COUNT=50

Quota reminder: Check your Gemini API quota/rate limits in Google AI Studio before increasing ARXIV_MAX_RESULTS.

4) ๐Ÿงช First test run (recommended)

In .env, set:

  • RUN_ONCE=true
  • ARXIV_MAX_RESULTS=5 (small and cheap test)
  • ARXIV_TIMEOUT_SECONDS=30 (recommended if your network is slow)
  • ARXIV_MAX_RETRIES=3

Run:

PYTHONPATH=src python main.py

When RUN_ONCE=true, the app will:

  • Enable verbose debug logs automatically
  • Ignore state.db (no dedup and no run persistence)

5) โฐ Production scheduled mode

In .env, set:

  • RUN_ONCE=false
  • RUN_HOUR=14 (24-hour format)
  • RUN_MINUTE=0
  • TIMEZONE=America/Los_Angeles
  • INIT_EMAIL_ON_STARTUP=true (set false if you do not want startup confirmation email)

Run:

PYTHONPATH=src python main.py

6) โœ… Mail allowlist (if emails are not arriving)

If logs say sent but inbox is empty, delivery is often blocked on the receiver side.

Try these steps:

  1. Check Spam/Junk folders.
  2. Add the Brevo sender address to Contacts and Safe Senders.
  3. Add sender domain allowlist when available: brevosend.com.
  4. For school/work mail systems, contact IT to allowlist at gateway level.

7) โ“ FAQ

  • Digest email sent appears but no email received:

    • App-side sending usually succeeded.
    • Check Brevo Transactional Logs for final status (delivered, blocked, bounced).
  • No scoring logs appear:

    • In normal mode (RUN_ONCE=false), already-processed papers are deduplicated.
    • Use RUN_ONCE=true when debugging.
  • ReadTimeout appears when fetching arXiv:

    • Increase ARXIV_TIMEOUT_SECONDS (for example 60 or 90).
    • Reduce ARXIV_MAX_RESULTS (for example 100 for daily runs).
    • Keep ARXIV_MAX_RETRIES at 3 or higher for unstable networks.

8) ๐Ÿ—‚๏ธ Project structure

.
โ”œโ”€โ”€ main.py
โ”œโ”€โ”€ requirements.txt
โ”œโ”€โ”€ .env.example
โ””โ”€โ”€ src/arxivclaw
        โ”œโ”€โ”€ config.py
        โ”œโ”€โ”€ models.py
        โ”œโ”€โ”€ pipeline.py
        โ”œโ”€โ”€ clients
        โ”‚   โ”œโ”€โ”€ arxiv_client.py
        โ”‚   โ”œโ”€โ”€ llm_client.py
        โ”‚   โ””โ”€โ”€ email_client.py
        โ””โ”€โ”€ storage
                โ””โ”€โ”€ state_store.py

About

Free, Fully Automated arXiv Recommender ๐Ÿค–

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages