# Monthly Recommender Agent

This Python script acts as a personalized literature recommender, designed to keep researchers updated with new and relevant papers from Semantic Scholar based on their past work. It leverages semantic embeddings and large language models (LLMs) to identify, rank, and summarize papers, delivering a concise HTML digest.
## Features

- **Personalized Paper Discovery**: Automatically fetches recent papers related to your research interests by analyzing your past publications on Semantic Scholar.
- **Semantic Ranking**: Ranks candidate papers by the cosine similarity of their abstracts to a "centroid" embedding of your previous work, ensuring high relevance.
- **AI-Powered Summarization**: Uses an LLM (via LiteLLM) to generate a concise summary, key findings, and a relevance rating for each top-ranked paper.
- **Elegant HTML Digest**: Produces a beautifully styled HTML file (`monthly_digest.html`) for easy reading and sharing.
- **Embedding Caching**: Employs ChromaDB to cache paper embeddings, speeding up subsequent runs and reducing API calls.
- **Robust API Handling**: Retries failed API calls and handles Semantic Scholar rate limits with fallbacks (a minimal sketch of such a retry wrapper follows this list).
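For illustration, a retry wrapper of this kind can look roughly like the following; the `get_with_retries` name and its parameters are illustrative, not the script's actual API:

```python
import time
import requests

def get_with_retries(url, params=None, max_retries=3, backoff=2.0):
    """GET a JSON endpoint, retrying with increasing backoff and
    waiting extra on HTTP 429 rate-limit responses."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, params=params, timeout=30)
            if resp.status_code == 429:  # rate limited: wait, then retry
                time.sleep(backoff * (attempt + 1))
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))
    raise RuntimeError("Exhausted retries for " + url)
```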
## How It Works

- **Fetch Seed Papers**: The script first identifies a set of your most recent papers on Semantic Scholar using your `S2_AUTHOR_ID`.
- **Discover Candidate Papers**: For each seed paper, it queries Semantic Scholar's recommendations API (falling back to the citations API) to find related papers published within a specified recency window.
- **Build Author Centroid**: It computes a "centroid" vector in embedding space: the average of the embeddings of your past paper abstracts, representing your core research interests.
- **Rank Candidates**: Each candidate paper's abstract is embedded, and its cosine similarity to your author centroid is computed. Papers are then ranked by this similarity score (see the sketch after this list).
- **Summarize Top Papers**: The top-ranked papers are passed to an LLM (e.g., `openai/gpt5mini` via LiteLLM), which generates a concise summary, extracts up to 5 key findings, and assigns a relevance rating (0-5).
- **Generate HTML Digest**: Finally, the summarized papers are compiled into a single, well-formatted HTML file, `monthly_digest.html`, ready for review.
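For concreteness, here is a minimal sketch of the seed-fetching and centroid-ranking steps. It assumes the `HuggingFaceEmbeddings` wrapper from `langchain-community` (which requires `sentence-transformers` to be installed); the function names `fetch_author_abstracts` and `rank_by_centroid` are illustrative, not the script's actual API:

```python
import numpy as np
import requests
from langchain_community.embeddings import HuggingFaceEmbeddings

SEMANTIC_URL = "https://api.semanticscholar.org/graph/v1"

def fetch_author_abstracts(author_id, limit=10):
    """Fetch papers (with abstracts) for an author from the
    Semantic Scholar Graph API."""
    url = f"{SEMANTIC_URL}/author/{author_id}/papers"
    resp = requests.get(url, params={"fields": "title,abstract", "limit": limit}, timeout=30)
    resp.raise_for_status()
    return [p["abstract"] for p in resp.json().get("data", []) if p.get("abstract")]

def rank_by_centroid(seed_abstracts, candidate_abstracts):
    """Average the seed embeddings into a centroid, then rank candidates
    by cosine similarity to it (highest first)."""
    embedder = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")
    seeds = np.array(embedder.embed_documents(seed_abstracts))
    centroid = seeds.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    cands = np.array(embedder.embed_documents(candidate_abstracts))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    sims = cands @ centroid  # cosine similarity of each candidate to the centroid
    order = np.argsort(sims)[::-1]
    return [(candidate_abstracts[i], float(sims[i])) for i in order]
```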
## Prerequisites

- Python 3.8+
- `pip` package manager
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/monthly-recommender-agent.git
  cd monthly-recommender-agent
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  If you don't have a `requirements.txt`, create one with the following content and then run the command above:

  ```
  requests
  numpy
  tqdm
  openai
  chromadb
  python-dateutil
  litellm
  langchain-community
  ```
## Configuration

Set the following environment variables. You can do this by creating a `.env` file in the project root (an example follows this list) or by setting them directly in your shell.

- `S2_AUTHOR_ID`: Your Semantic Scholar Author ID. You can find it by opening your profile on Semantic Scholar and extracting the ID from the URL (e.g., `https://www.semanticscholar.org/author/Your-Name/YOUR_AUTHOR_ID`).
- `OPENAI_API_KEY`: Your OpenAI API key, used by LiteLLM to access the LLM for summarization.
- `S2_API_KEY` (optional): Your Semantic Scholar API key. Not strictly required for basic usage, but providing one raises API rate limits and improves reliability for larger fetches.
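For example, a `.env` file might look like this (all values are placeholders):

```
S2_AUTHOR_ID=1234567
OPENAI_API_KEY=sk-your-openai-key
S2_API_KEY=your-semantic-scholar-key
```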
## Usage

Simply execute the Python script:

```bash
python monthly_recommender_agent.py
```

## Customization

The script's behavior is controlled by constants defined near the top of the file (a sketch of how `LLM_MODEL` is used for summarization follows the block):

```python
AUTHOR_ID = os.environ["S2_AUTHOR_ID"]  # Semantic Scholar id
NUM_CANDIDATES = 10   # pull this many candidate papers
NUM_FINAL = 10        # keep this many for the final digest
NUM_SEED_PAPERS = 10  # number of your recent papers to use as seeds for finding related work
RECENCY_MONTHS = 1    # only consider candidate papers from the last N months
EMB_MODEL = "all-mpnet-base-v2"  # embedding model used by HuggingFaceEmbeddings
LLM_MODEL = "openai/gpt5mini"    # LLM model used via LiteLLM for summarization
SEMANTIC_URL = "https://api.semanticscholar.org/graph/v1"  # Semantic Scholar API base URL
CHROMA_DB_PATH = "./chroma_db"   # directory for persistent ChromaDB storage
```
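As an illustration of how `LLM_MODEL` is used, a summarization call via LiteLLM's OpenAI-compatible `completion` API can look roughly like this (the prompt wording and the `summarize_paper` helper are illustrative, not the script's exact implementation):

```python
from litellm import completion

LLM_MODEL = "openai/gpt5mini"

def summarize_paper(title, abstract):
    """Ask the LLM for a short summary, key findings, and a 0-5
    relevance rating for a single paper."""
    prompt = (
        f"Paper title: {title}\n\nAbstract: {abstract}\n\n"
        "Write a concise summary, list up to 5 key findings, "
        "and rate the paper's relevance from 0 to 5."
    )
    response = completion(
        model=LLM_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```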
Each constant in more detail:

- `NUM_CANDIDATES`: The maximum number of candidate papers to fetch per seed paper.
- `NUM_FINAL`: The number of top-ranked papers to include in the final digest.
- `NUM_SEED_PAPERS`: The number of your own recent papers used to establish your research centroid and find related work.
- `RECENCY_MONTHS`: Only papers published within this many months of the current date are considered as candidates.
- `EMB_MODEL`: The HuggingFace embedding model to use (`all-mpnet-base-v2` is a good default).
- `LLM_MODEL`: The model identifier passed to LiteLLM for summarization (e.g., `openai/gpt5mini`, `openai/gpt-4-turbo`, `ollama/llama3`).
- `CHROMA_DB_PATH`: The local directory where ChromaDB stores cached embeddings (a minimal caching sketch follows this list).
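For reference, embedding caching with ChromaDB's persistent client can look roughly like this (the collection name and the `get_embedding` helper are assumptions for illustration):

```python
import chromadb
from langchain_community.embeddings import HuggingFaceEmbeddings

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("paper_embeddings")
embedder = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")

def get_embedding(paper_id, abstract):
    """Return a cached embedding if present, otherwise compute and cache it."""
    hit = collection.get(ids=[paper_id], include=["embeddings"])
    if hit["ids"]:  # already cached: reuse the stored vector
        return hit["embeddings"][0]
    vec = embedder.embed_query(abstract)
    collection.add(ids=[paper_id], embeddings=[vec], documents=[abstract])
    return vec
```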
## Output

The script generates a file named `monthly_digest.html` in the directory it is run from. Open this file in any web browser to view your personalized literature digest.
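The digest itself is plain HTML, so generating it amounts to assembling per-paper fragments and writing them to disk. A rough sketch follows; the paper dictionary keys and markup here are assumptions, not the script's actual structure:

```python
import html

def write_digest(papers, path="monthly_digest.html"):
    """Render summarized papers into a single HTML file."""
    items = "\n".join(
        f"<article><h2>{html.escape(p['title'])}</h2>"
        f"<p><b>Relevance:</b> {p['rating']}/5</p>"
        f"<p>{html.escape(p['summary'])}</p></article>"
        for p in papers
    )
    doc = f"<!DOCTYPE html>\n<html><body><h1>Monthly Digest</h1>\n{items}\n</body></html>"
    with open(path, "w", encoding="utf-8") as f:
        f.write(doc)
```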
## Contributing

Contributions are welcome! Feel free to open issues or pull requests for bug fixes, new features, or improvements.
## License

This project is open source and available under the MIT License.
## Acknowledgments

- Semantic Scholar: For providing the academic paper data and APIs.
- OpenAI: For powerful language models used in summarization.
- LiteLLM: For simplifying LLM API calls across various providers.
- HuggingFace: For providing robust embedding models.
- ChromaDB: For efficient and persistent vector database capabilities.