An interactive Telegram bot that searches arXiv for research papers, generates summaries, and delivers comprehensive research digests directly to your Telegram. This project is a research automation tool designed to streamline the discovery and analysis of academic literature. By interfacing with the arXiv API, the parser automates the retrieval of metadata and PDFs, transforming unstructured search results into a structured dataset for downstream NLP tasks, literature reviews, or trend analysis.
- Intelligent Search: Uses an LLM to generate optimized arXiv search queries
- Paper Parsing: Extracts and parses full paper content from arXiv
- AI Summaries: Generates section-by-section and general paper summaries
- Research Digest: Creates a comprehensive digest of all relevant papers
- Interactive Bot: Conversational interface via Telegram
- Continuous Running: Always available, handles multiple users
- Deduplication: Tracks processed papers to avoid redundant work
- Access Control: Restrict bot to specific Telegram user IDs
pip install -r requirements.txt

Create a .env file in the parser/ directory:
# Telegram Configuration
TELEGRAM_BOT_TOKEN=your_bot_token_here
UID=your_default_user_id_here
# Admin User ID (user with admin privileges to manage other users)
ADMIN_USER_ID=123456789
# Bot Access Control (comma-separated list of allowed Telegram user IDs)
# Leave empty to allow everyone, or specify user IDs to restrict access
ALLOWED_USER_IDS=123456789,987654321
# AI Model Configuration (for Gemini, optional)
API_KEY=your_gemini_api_key_here

To get a Telegram Bot Token:
- Open Telegram and search for @BotFather
- Send /newbot and follow the instructions
- Copy the token provided
To get your User ID:
- Search for @userinfobot on Telegram
- Start a chat and it will show your user ID
To Restrict Bot Access (Recommended):
- Get the Telegram user IDs of all authorized users (using @userinfobot)
- Add them to ALLOWED_USER_IDS in .env as a comma-separated list
- Example: ALLOWED_USER_IDS=123456789,987654321,555666777
- Leave empty or omit to allow anyone to use the bot (not recommended for production)
Edit parser/settings.py to choose your model:
# For Gemini (requires API_KEY in .env)
model_name = 'gemini-2.0-flash-001'
# OR for local Ollama (requires Ollama running)
model_name = "llama3.1:latest"

If using local models:
# Install Ollama from https://ollama.ai
# Then pull your desired model:
ollama pull llama3.1:latest

cd parser
python telegram_bot.py

The bot will start and display:
🤖 Bot is starting...
Send /start to your bot to begin!
- Start a Search: Send /start to your bot in Telegram
- Enter Topic: Type your research topic (e.g., "RAG", "Transformer models")
- Enter Start Date: Provide the start date in format: YYYY.MM.DD
  - Example: 2024.10.20 (October 20, 2024)
- Enter End Date: Provide the end date in format: YYYY.MM.DD
  - Example: 2024.10.24 (October 24, 2024)
- Wait for Results: The bot will process papers and send you a digest
- Repeat: Send /start anytime for a new search
Date Format Features:
- Simple, readable format: YYYY.MM.DD
- Automatic validation with helpful error messages
- Checks that the end date is not before the start date
- Automatically converts to arXiv format (midnight to end-of-day)
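The validation and conversion described above can be sketched as follows. This is a minimal illustration, not the project's date_parser.py: the function name `to_arxiv_range` is hypothetical, and the sketch assumes that a start date equal to the end date is accepted, which the bot's "cannot be before" error message suggests.

```python
from datetime import datetime

DATE_FORMAT = "%Y.%m.%d"  # the bot's YYYY.MM.DD input format


def to_arxiv_range(start: str, end: str) -> str:
    """Convert two YYYY.MM.DD strings into an arXiv-style date range (hypothetical helper)."""
    try:
        start_day = datetime.strptime(start, DATE_FORMAT)
        end_day = datetime.strptime(end, DATE_FORMAT)
    except ValueError as exc:
        raise ValueError("Invalid date format! Please use: YYYY.MM.DD") from exc
    if end_day < start_day:
        raise ValueError("Invalid date range! End date cannot be before start date.")
    # Start at midnight (0000) and stop at the last minute of the end day (2359).
    return f"[{start_day:%Y%m%d}0000+TO+{end_day:%Y%m%d}2359]"


print(to_arxiv_range("2025.08.01", "2025.08.02"))
# prints [202508010000+TO+202508022359]
```

Raising `ValueError` for both failure modes keeps the caller simple: the bot only needs one `except` branch to send the error text back to the user.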
User Commands:
- /start - Begin a new research search
- /cancel - Cancel current search setup
- /help - Show help message
Admin Commands (only available to user with ADMIN_USER_ID):
- /add_user - Add a new authorized user
- /remove_user - Remove an authorized user
- /list_users - List all authorized users
You: /start
Bot: Welcome to arXiv Research Assistant!
What research topic are you interested in?
You: RAG
Bot: ✅ Topic received: RAG
📅 Now let's set the date range.
Enter the START date in format: YYYY.MM.DD
You: 2025.08.01
Bot: ✅ Start date received: 2025.08.01
Now enter the END date in format: YYYY.MM.DD
You: 2025.08.02
Bot: Starting research process!
Topic: RAG
📅 Date Range: 2025.08.01 to 2025.08.02
Searching: [202508010000+TO+202508022359]
⏳ This may take a few minutes...
[Bot processes papers and sends digest...]
Bot: ✅ Research complete!
Send /start to begin a new search!
Date Validation Examples:
Invalid format:
You: 2025/08/01
Bot: ❌ Invalid date format! Please use: YYYY.MM.DD
Invalid date range:
You: (start) 2025.08.05
(end) 2025.08.01
Bot: ❌ Invalid date range! End date cannot be before start date.
- Query Construction: LLM generates an optimized arXiv search query from your topic
- Paper Discovery: Searches arXiv API for matching papers in the time range
- Content Extraction: Fetches and parses full paper text from arXiv HTML
- Section Summaries: Generates AI summaries for each paper section
- General Summary: Creates a comprehensive summary of each paper
- Digest Generation: Synthesizes all papers into a cohesive research digest
- Telegram Delivery: Sends the digest with clickable paper links
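To make the Paper Discovery step concrete, here is a rough sketch of how a topic and a date range could be combined into an arXiv API request URL. It is an assumption about the shape of the request, not the code in feed_parser.py: a hand-written `all:` query stands in for the LLM-generated one, and `build_arxiv_url` and its `max_results` default are hypothetical.

```python
from urllib.parse import quote

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(topic: str, time_range: str, max_results: int = 50) -> str:
    """Build an arXiv API URL for a topic limited to a submission-date range."""
    # Wrap the topic in URL-encoded double quotes so a multi-word topic is one phrase.
    term = quote(f'"{topic}"')
    # The range is passed through unchanged, e.g. [202508010000+TO+202508022359].
    search_query = f"all:{term}+AND+submittedDate:{time_range}"
    return f"{ARXIV_API}?search_query={search_query}&start=0&max_results={max_results}"


print(build_arxiv_url("RAG", "[202508010000+TO+202508022359]"))
```

The URL is assembled by hand rather than with `urllib.parse.urlencode` because the latter would percent-encode the `+` separators that the date-range syntax relies on.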
parser/
├── telegram_bot.py      # Interactive bot interface (NEW!)
├── main.py              # Core processing pipeline
├── feed_parser.py       # arXiv API integration
├── text_parser.py       # Paper content extraction
├── summaries.py         # AI summary generation
├── telegram_notify.py   # Telegram messaging utilities
├── llm.py               # LLM integration
├── prompt_library.py    # AI prompts
├── date_parser.py       # Date parsing utilities
├── settings.py          # Configuration
└── papers.json          # Paper database
To run a one-time search without the bot:
cd parser
python main.py

Edit the hardcoded values in main.py:
user_prompt = "RAG" # Your research topic
time_range = "[202508010000+TO+202508020000]" # Date range

- Check that TELEGRAM_BOT_TOKEN is correct in .env
- Verify the bot is running: python telegram_bot.py
- Check the console for error messages
- For Gemini: Verify API_KEY is set and valid
- For Ollama: Ensure Ollama is running: ollama serve
- Check that the model specified in settings.py is available
- The bot automatically retries with a fallback query if the LLM-generated query fails
- Check your internet connection
- ArXiv API may have rate limits
- Some papers may not have HTML versions available
- The bot will use the abstract as a fallback
Access to the bot can be restricted to specific Telegram users:
Configuration:
# In .env file
ADMIN_USER_ID=123456789
ALLOWED_USER_IDS=123456789,987654321,555666777

Behavior:
- If ALLOWED_USER_IDS is set: Only listed users can use the bot
- If ALLOWED_USER_IDS is empty/not set: Anyone can use the bot (⚠️ not recommended)
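The allow-list behavior above can be expressed in a few lines. This is only a sketch of the rule as documented; the helper names `parse_allowed_ids` and `is_authorized` are hypothetical and may not match the bot's actual code.

```python
from typing import Optional, Set


def parse_allowed_ids(raw: Optional[str]) -> Set[int]:
    """Parse a comma-separated ALLOWED_USER_IDS value into a set of integer IDs."""
    if not raw:
        return set()
    return {int(part) for part in raw.split(",") if part.strip()}


def is_authorized(user_id: int, allowed: Set[int]) -> bool:
    """Apply the documented rule: an empty allow-list means anyone may use the bot."""
    return not allowed or user_id in allowed


allowed = parse_allowed_ids("123456789,987654321,555666777")
print(is_authorized(123456789, allowed))  # listed user
print(is_authorized(999888777, allowed))  # unlisted user
```

In practice the raw value would come from the environment, for example `os.getenv("ALLOWED_USER_IDS")`, which returns `None` when the variable is unset and therefore leaves the bot open to everyone.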
What happens when an unauthorized user tries:
Unauthorized User: /start
Bot: 🚫 Access Denied
Sorry, you are not authorized to use this bot.
Your user ID: 999888777
Please contact the bot administrator.
Logs:
- Authorized access: INFO: Authorized user started session: 123456789 (@username)
- Unauthorized attempts: WARNING: Unauthorized access attempt by user_id: 999888777
How to find User IDs:
- Send a message to @userinfobot on Telegram
- It will reply with your user ID
- Add that ID to ALLOWED_USER_IDS in .env
The bot includes admin commands to manage user access dynamically without editing the .env file:
Setup:
- Set ADMIN_USER_ID in your .env file to your Telegram user ID
- Ensure you're in the ALLOWED_USER_IDS list as well
Admin Commands:
Adding a user:
Admin: /add_user
Bot: 👤 Add New User
Please enter the Telegram User ID you want to authorize.
💡 Tip: Users can find their ID by sending any message to @userinfobot
Enter the user ID (numbers only):
Admin: 987654321
Bot: ✅ User Added Successfully
User ID 987654321 has been authorized.
Total authorized users: 3
Removing a user:
Admin: /remove_user
Bot: 👤 Remove User
Current authorized users:
• 123456789
• 555666777
• 987654321
Please enter the User ID you want to remove:
Admin: 555666777
Bot: ✅ User Removed Successfully
User ID 555666777 has been removed from authorized users.
Total authorized users: 2
Listing users:
Admin: /list_users
Bot: 👥 Authorized Users List
Total: 2 user(s)
1. 123456789
2. 987654321
Features:
- Changes are written to the .env file immediately
- Bot updates its internal user list in real-time (no restart needed)
- Admin cannot remove themselves from the authorized list
- All admin actions are logged for security
Security Notes:
- Only the user with the ADMIN_USER_ID can execute admin commands
- Non-admin users attempting admin commands will receive an access denied message
- User additions/removals are logged with timestamps
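As an illustration of how a removal could be applied and persisted, the sketch below enforces the "admin cannot remove themselves" rule and rewrites the ALLOWED_USER_IDS line in the text of a .env file. Both helpers (`remove_user`, `set_env_value`) are hypothetical; how the bot actually edits .env is not shown in this README.

```python
import re
from typing import Set


def remove_user(allowed: Set[int], admin_id: int, user_id: int) -> Set[int]:
    """Return a new allow-list without user_id; the admin cannot be removed."""
    if user_id == admin_id:
        raise ValueError("The admin cannot remove themselves.")
    return allowed - {user_id}


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Replace the KEY=... line in env_text, or append it if the key is absent."""
    line = f"{key}={value}"
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    if pattern.search(env_text):
        # A function replacement avoids backslash interpretation in the value.
        return pattern.sub(lambda _match: line, env_text, count=1)
    separator = "" if not env_text or env_text.endswith("\n") else "\n"
    return f"{env_text}{separator}{line}\n"


env_text = "ADMIN_USER_ID=123456789\nALLOWED_USER_IDS=123456789,555666777,987654321\n"
remaining = remove_user({123456789, 555666777, 987654321}, 123456789, 555666777)
print(set_env_value(env_text, "ALLOWED_USER_IDS", ",".join(map(str, sorted(remaining)))))
```

Keeping the in-memory set and the file text as separate steps mirrors the documented behavior: the bot can update its internal list immediately and write the same state to disk.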
Edit settings.py:
SEMAPHORE_LIMIT = 20 # Concurrent LLM requests

Telegram limits messages to 4096 characters. The bot automatically splits long messages:
tg_notify_multiple(text, max_length=4000)

Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
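For contributors curious about the message splitting mentioned under the Telegram length limit, one simple way to do it is sketched here. This is not the implementation of tg_notify_multiple; `split_message` is a hypothetical helper that assumes breaking at line boundaries is preferred and that a hard cut is acceptable when no line break is available.

```python
from typing import List


def split_message(text: str, max_length: int = 4000) -> List[str]:
    """Split text into chunks of at most max_length characters, preferring line breaks."""
    chunks = []
    while len(text) > max_length:
        # Prefer the last newline inside the window; fall back to a hard cut.
        cut = text.rfind("\n", 0, max_length)
        if cut <= 0:
            cut = max_length
        chunks.append(text[:cut])
        # Drop the newline(s) at the split point so the next chunk starts cleanly.
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks


digest = "\n".join(f"Paper {i}: summary line" for i in range(1000))
parts = split_message(digest)
print(len(parts), max(len(part) for part in parts))
```

Using 4000 rather than the full 4096 leaves a small margin for any prefix or formatting added to each chunk before sending.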
- Papers are stored in papers.json to avoid reprocessing
- The bot can handle multiple users simultaneously
- Each user's session is tracked independently
- Processing time depends on the number of papers and LLM speed
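The deduplication noted above can be pictured with a small sketch. It assumes, purely for illustration, that the database file holds a flat JSON list of arXiv IDs; the real papers.json schema is not described here, and `load_seen_ids` and `filter_new_papers` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_seen_ids(db_path: Path) -> Set[str]:
    """Read the set of already-processed paper IDs (empty if the file is missing)."""
    if not db_path.exists():
        return set()
    return set(json.loads(db_path.read_text(encoding="utf-8")))


def filter_new_papers(paper_ids: Iterable[str], db_path: Path) -> List[str]:
    """Return IDs not seen before, in input order, and record them as processed."""
    seen = load_seen_ids(db_path)
    # dict.fromkeys drops duplicates within the batch while keeping order.
    new_ids = [pid for pid in dict.fromkeys(paper_ids) if pid not in seen]
    db_path.write_text(json.dumps(sorted(seen.union(new_ids))), encoding="utf-8")
    return new_ids
```

Calling `filter_new_papers` twice with overlapping ID lists returns only the unseen IDs on the second call, so repeated searches over the same date range skip papers that were already summarized.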
- Natural language date input ("last week", "yesterday")
- Custom arXiv categories filtering
- Export to PDF/Markdown
- Scheduled automatic searches
- Knowledge graph generation
Built with ❤️ for researchers who want to stay current with arXiv